Department of Bioinformatics and Computational Biology

CanDrA

From MD Anderson Bioinformatics

CanDrA: cancer-specific driver missense mutation annotation with optimized features
Overview
Description: CanDrA is a computer program that predicts cancer-type-specific driver missense mutations.
Development Information
Language: Perl
Current Version: Plus
Platforms: Tested on CentOS 5.5 (x86_64) and Ubuntu 10.04 LTS (x86, 32-bit). It is expected to work on most Unix-like systems after compilation.
License: GPL v3
Status: Active
Last Updated: 2013/04/05
References
Citations: CanDrA: Cancer-Specific Driver Missense Mutation Annotation with Optimized Features

Yong Mao, Han Chen, Han Liang, Funda Meric-Bernstam, Gordon B. Mills, Ken Chen

PLOS ONE, published October 30, 2013. http://dx.doi.org/10.1371/journal.pone.0077945

News: Version Plus is available.
Help and Support
Contact: Ken Chen


CanDrA is a machine learning program that predicts cancer-type-specific driver missense mutations from 96 structural, evolutionary and gene features computed by more than 10 other functional prediction algorithms. The CanDrA training set was collected by mining the COSMIC database. Version Plus can analyze 15 cancer types (Bladder Cancer, Breast Cancer, Colorectal Cancer, Cervical Squamous Cell Carcinoma, Endometrioid Carcinoma, Glioblastoma Multiforme (GBM), Kidney Cancer, Lung Adenocarcinoma, Lung Small Cell Carcinoma, Lung Squamous Cell Carcinoma, Medulloblastoma, Malignant Melanoma, Ovarian Cancer, Prostate Cancer, Squamous Cell Skin Cancer). It also provides an additional model that predicts whether a mutation is a driver for cancer in general. CanDrA Plus uses COSMIC version 62.




Download

The CanDrA package has two parts: the executable files and an annotation data file for each cancer type. To run CanDrA, download the executable files and at least one annotation data file.

Release History

Version 1.0

CanDrA v1.0 used 95 structural and evolutionary features to build its predictive models and can analyze 6 cancer types: Glioblastoma Multiforme (GBM), Ovarian Cancer, Breast Cancer, Colorectal Cancer, Malignant Melanoma and Squamous Cell Skin Cancer. Its training set was collected from COSMIC version 58.
Executable Package
* CanDrA.v1.0
Annotation Data Files for Specific Cancers
* Breast Cancer
* Colorectal Cancer
* Glioblastoma Multiforme
* Malignant Melanoma
* Ovarian Cancer
* Squamous Cell Skin Cancer
Genome Biology paper comparing CanDrA with several other cancer driver predictors:
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0484-1

Version Plus (obsolete; variant significance weighted by gene-level significance)

Executable Package
* CanDrA.v+
Annotation Data Files for Specific Cancers
* Bladder Cancer
* Breast Cancer
* Colorectal Cancer
* Cervical Squamous Cell Carcinoma
* Endometrioid Carcinoma
* Glioblastoma Multiforme
* Kidney Cancer
* Lung Adenocarcinoma
* Lung Small Cell Carcinoma
* Lung Squamous Cell Carcinoma
* Medulloblastoma
* Malignant Melanoma
* Ovarian Cancer
* Prostate Cancer
* Squamous Cell Skin Cancer
* Cancer-in-General

Configuration

How to set up CanDrA?
    (1). Download and unpack the CanDrA package.
    (2). Compile the tabix toolkit in ./tabix-0.2.6 under the top directory of CanDrA: "cd ./tabix-0.2.6; make". For help setting up tabix, see http://samtools.sourceforge.net/tabix.shtml.
    (3). Download the annotation data file for the cancer type you need, e.g. OVC.tar.bz2, unpack it with "tar -vxjf OVC.tar.bz2", and move the unpacked directory ./OVC into the ./database directory.
    (4). Run "perl open_candra.pl <cancer_type> input_file > output_file" to test the installation.
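After step (3), it can help to confirm that the files are where the setup steps expect them. The following is a minimal Python sketch, not part of the CanDrA package: the helper name `check_candra_layout` is hypothetical, and it assumes that `open_candra.pl` sits in the top directory of the unpacked package and that `make` leaves the `tabix` binary in `./tabix-0.2.6` (the path used in the FAQ).

```python
from pathlib import Path


def check_candra_layout(candra_root, cancer_type):
    """Return a list of setup problems; an empty list means the layout looks complete.

    Hypothetical helper. Directory names follow the setup steps; the location
    of open_candra.pl in the top directory is an assumption.
    """
    root = Path(candra_root)
    problems = []
    # Step (1): the driver script is assumed to be in the unpacked top directory.
    if not (root / "open_candra.pl").is_file():
        problems.append("missing open_candra.pl")
    # Step (2): "cd ./tabix-0.2.6; make" should leave a tabix binary there.
    if not (root / "tabix-0.2.6" / "tabix").is_file():
        problems.append("tabix not compiled in tabix-0.2.6/")
    # Step (3): the unpacked annotation directory (e.g. OVC) goes under ./database.
    if not (root / "database" / cancer_type).is_dir():
        problems.append(f"database/{cancer_type} not found")
    return problems
```

For example, `check_candra_layout(".", "OVC")` run from the CanDrA top directory returns an empty list once all three steps have been completed for ovarian cancer.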
What is the format of an input file?
An input file has the following columns:
(1). chromosome number;
(2). genomic_coordinate;
(3). ref_allele;
(4). mutated_allele;
(5). strand.
The input file must be tab-delimited. See demo_input.txt in the package for an example.
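An input file can be generated from a list of mutations with a short Python sketch like the one below. The function name `write_candra_input` is hypothetical, and writing no header line is an assumption; check demo_input.txt for the exact conventions (chromosome naming, strand symbols) before relying on it.

```python
import csv


def write_candra_input(path, mutations):
    """Write mutations as a five-column, tab-delimited CanDrA input file.

    `mutations` is an iterable of (chromosome, genomic_coordinate,
    ref_allele, mutated_allele, strand) tuples. No header row is written
    (an assumption; compare with demo_input.txt).
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for row in mutations:
            if len(row) != 5:
                raise ValueError(f"expected 5 columns, got {len(row)}: {row!r}")
            chrom, pos, ref, alt, strand = row
            # Coordinates must be integers; int() also rejects strings like "12,345".
            writer.writerow([chrom, int(pos), ref, alt, strand])
```

A call such as `write_candra_input("my_input.txt", [("1", 1000, "A", "G", "+")])` writes one line; the coordinate and alleles here are placeholder values, not real mutations.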
What is the format of an output file?
An output file is tab-delimited, with the following columns:
(1). chromosome number;
(2). genomic_coordinate;
(3). ref_allele;
(4). mutated_allele;
(5). strand;
(6). HGNC symbol;
(7). RefSeq mRNA;
(8). qualitative mutation consequence;
(9). AAS - Amino Acid Substitution;
(10). AAS Location;
(11). CanDrA score;
(12). CanDrA output category;
(13). Significance of CanDrA score.
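The 13 output columns can be loaded for downstream filtering with a Python sketch like the one below. It is not part of CanDrA: the dictionary keys are our own shorthand for the documented columns, and it assumes the output has no header line. Skipping lines with the wrong number of fields is a defensive choice, because messages such as the tabix warning described in the FAQ can appear in the results.

```python
import csv

# Our own labels for the 13 documented output columns, in order.
OUTPUT_COLUMNS = [
    "chromosome", "genomic_coordinate", "ref_allele", "mutated_allele", "strand",
    "hgnc_symbol", "refseq_mrna", "consequence", "aas", "aas_location",
    "candra_score", "candra_category", "score_significance",
]


def read_candra_output(path):
    """Parse a CanDrA output file into a list of dicts keyed by OUTPUT_COLUMNS.

    Lines without exactly 13 tab-separated fields (blank lines, warnings
    printed into the results) are skipped. All values are kept as strings.
    """
    records = []
    with open(path, newline="") as fh:
        # QUOTE_NONE keeps any quote characters in the data as literal text.
        for fields in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(fields) != len(OUTPUT_COLUMNS):
                continue
            records.append(dict(zip(OUTPUT_COLUMNS, fields)))
    return records
```

Values stay as strings, so convert `candra_score` or `score_significance` with `float()` only where a numeric value is actually present.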
How to use the CanDrA program?
     Version:       Plus
     Usage:         perl open_candra.pl <cancer_type> input_file > output_file
     Options:       Supported <cancer_type>: BLCA, BRCA, CRC, CSCC, EC, GBM, GENERAL, KIRC, LUAD, LUSMACC, LUSQUCC, MDB, MEL, OVC, PRCA, SCSC
     Example:       perl open_candra.pl OVC demo_input.txt > demo.annotated
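To run CanDrA from a larger pipeline, the command above can be wrapped in Python. This is a minimal sketch under stated assumptions: `candra_command` and `run_candra` are hypothetical names, the cancer-type check is case-sensitive by choice, and running from the CanDrA top directory (`cwd=candra_root`) assumes the script locates `./database` relative to the working directory.

```python
import os
import subprocess

# Cancer-type codes as listed in the usage message for version Plus.
SUPPORTED_CANCER_TYPES = {
    "BLCA", "BRCA", "CRC", "CSCC", "EC", "GBM", "GENERAL", "KIRC",
    "LUAD", "LUSMACC", "LUSQUCC", "MDB", "MEL", "OVC", "PRCA", "SCSC",
}


def candra_command(cancer_type, input_file):
    """Build the argument list for `perl open_candra.pl <cancer_type> input_file`."""
    if cancer_type not in SUPPORTED_CANCER_TYPES:
        raise ValueError(f"unsupported cancer type: {cancer_type!r}")
    # Absolute input path, so the command still works after changing directory.
    return ["perl", "open_candra.pl", cancer_type, os.path.abspath(input_file)]


def run_candra(cancer_type, input_file, output_file, candra_root="."):
    """Run CanDrA from `candra_root`, sending its stdout to `output_file`."""
    with open(output_file, "w") as out:
        # check=True raises CalledProcessError if perl exits with a non-zero status.
        subprocess.run(candra_command(cancer_type, input_file),
                       cwd=candra_root, stdout=out, check=True)
```

The equivalent of the example above would be `run_candra("OVC", "demo_input.txt", "demo.annotated")`, called from the CanDrA top directory.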

System Requirements

To annotate missense mutations for a specific cancer type, the package needs about 1 GB of storage for the corresponding annotation database file.

FAQ

Q: Which reference genome does CanDrA support?
A: Version 1.0 supports only hg19 (NCBI Build 37).
Q: When I ran a query, the results contained the message "[tabix] the index file either does not exist or is older than the vcf file. Please reindex." What is wrong?
A: It means the tabix index file (*.tbi) does not work on your platform, which differs from our test platforms. To fix this, run '/Path/to/CanDrA*/tabix-0.2.6/tabix -f -s 1 -b 2 -e 2 sorted_CanDrA_*.gz' in the /Path/to/CanDrA*/database/* directory, where * is the cancer type you are using. This generates a new sorted_CanDrA_*.gz.tbi that replaces the old one.
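If several annotation databases are installed, the reindexing command from the answer above can be repeated for each one. The Python sketch below is illustrative only: `reindex_commands` and `reindex_all` are hypothetical helpers, and it assumes each `database/<cancer_type>` directory holds its `sorted_CanDrA_*.gz` file directly, as the FAQ answer implies.

```python
import glob
import os
import subprocess


def reindex_commands(candra_root):
    """List (argv, working_dir) pairs that rebuild every sorted_CanDrA_*.gz.tbi index."""
    root = os.path.abspath(candra_root)
    tabix = os.path.join(root, "tabix-0.2.6", "tabix")
    pattern = os.path.join(root, "database", "*", "sorted_CanDrA_*.gz")
    commands = []
    for gz_path in sorted(glob.glob(pattern)):
        # Same flags as the FAQ: -f overwrites the old index, -s 1 is the
        # sequence column, -b 2 and -e 2 put start and end in column 2.
        argv = [tabix, "-f", "-s", "1", "-b", "2", "-e", "2", os.path.basename(gz_path)]
        commands.append((argv, os.path.dirname(gz_path)))
    return commands


def reindex_all(candra_root):
    """Run tabix inside each database directory, stopping on the first failure."""
    for argv, workdir in reindex_commands(candra_root):
        subprocess.run(argv, cwd=workdir, check=True)
```

Listing the commands before running them (`reindex_commands(".")`) is a cheap way to confirm that the expected databases were found.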