CanDrA
Overview | |
Description | Cancer-specific Driver missense mutation Annotation with optimized features |
Development Information | |
Language | Perl |
Current version | Plus |
Platforms | Tested on CentOS 5.5 (x86_64) and Ubuntu 10.04 LTS (x86 32-bit). It is expected to work on most *NIX systems after compilation. |
License | GPL v3 |
Status | Active |
Last updated | 2013/04/05 |
News | Version Plus is available |
References | |
Citation | Mao, Y., Chen, H., Liang, H. et al., CanDrA: Cancer-Specific Driver Missense Mutation Annotation with Optimized Features, PLOS ONE (2013) https://doi.org/10.1371/journal.pone.0077945 |
Help and Support | |
Contact | Ken Chen |
CanDrA is a machine learning program that predicts cancer-type-specific driver missense mutations based on 96 structural, evolutionary and gene features computed by over 10 other functional prediction algorithms. The CanDrA training set was collected by mining the COSMIC database. Version Plus can analyze 15 cancer types (Bladder Cancer, Breast Cancer, Colorectal Cancer, Cervical Squamous Cell Carcinoma, Endometrioid Carcinoma, GBM, Kidney Cancer, Lung Adenocarcinoma, Lung Small Cell Carcinoma, Lung Squamous Cell Carcinoma, Medulloblastoma, Malignant Melanoma, Ovarian Cancer, Prostate Cancer, Squamous Cell Skin Cancer). It also provides an extra model for predicting whether a mutation is a driver across all cancers. COSMIC version 62 was used for CanDrA Plus.
The CanDrA package has two parts: executable files and annotation data files for each specific cancer. To run it, users need to download the executable files and at least one annotation data file.
Version 1.0
CanDrA v1.0 used 95 structural and evolutionary features to build predictive models and can analyze 6 cancer types: Glioblastoma multiforme (GBM), Ovarian Cancer, Breast Cancer, Colorectal Cancer, Malignant Melanoma, Squamous Cell Skin Cancer. The training set was collected from COSMIC version 58.
Executable Package
Annotation Data Files for Specific Cancer
Genome Biology paper that compared CanDrA and several other Cancer Driver predictors https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0484-1
Version (Plus), obsolete, variant significance weighted by gene level significance
Executable Package
Annotation Data Files for Specific Cancer
How to set up CanDrA?
What is the format for an input file?
Columns of an input file are
An input file should be in tab-delimited format. For details, please refer to demo_input.txt in the package.
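Before running CanDrA, it can help to confirm that a file really is tab-delimited. The following is a minimal sketch, not part of the CanDrA package: `check_tab_delimited` is a hypothetical helper name, and the rule it applies (every non-empty line has the same number of tab-separated fields as the first, and at least two) is an assumption, since the exact column layout is defined by demo_input.txt.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with CanDrA).
# Succeeds only if every non-empty line of file $1 has the same number of
# tab-separated fields as the first non-empty line, and that number is >= 2.
# Mismatching lines are reported on standard output.
check_tab_delimited() {
    awk -F '\t' '
        NF == 0    { next }                    # ignore empty lines
        !seen      { seen = 1; want = NF }     # field count of the first line
        NF != want { printf "line %d: %d fields, expected %d\n", NR, NF, want; bad = 1 }
        END        { exit (bad || want < 2) ? 1 : 0 }
    ' "$1"
}

# Example:
#   check_tab_delimited demo_input.txt && echo "looks tab-delimited"
```

A file that uses spaces instead of tabs fails this check because each line then counts as a single field.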
What is the format of an output file?
An output file is in tab-delimited format. Columns of an output file are
How to use CanDrA program?
Version: | Plus | |
Usage: | perl open_candra.pl <cancer_type> input_file > output_file | |
Options: | Supported <cancer_type>: BLCA, BRCA, CRC, CSCC, EC, GBM, GENERAL, KIRC, LUAD, LUSMACC, LUSQUCC, MDB, MEL, OVC, PRCA, SCSC | |
Example: | perl open_candra.pl OVC demo_input.txt > demo.annotated |
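To annotate the same input under several cancer types, the usage line above can be wrapped in a small loop. This is a sketch under stated assumptions: `is_supported_cancer_type` and `annotate_all` are hypothetical helper names, the output naming scheme `<input>.<TYPE>.annotated` is invented for illustration, and the script is assumed to be run from the directory that contains open_candra.pl. The list of codes is copied from the supported <cancer_type> values above.

```sh
#!/bin/sh
# Cancer type codes copied from the "Supported <cancer_type>" list.
CANDRA_TYPES="BLCA BRCA CRC CSCC EC GBM GENERAL KIRC LUAD LUSMACC LUSQUCC MDB MEL OVC PRCA SCSC"

# Hypothetical helper: return 0 if $1 is one of the supported codes.
is_supported_cancer_type() {
    for t in $CANDRA_TYPES; do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: $1 = input file, remaining arguments = cancer type codes.
# Writes one output file per type; unsupported codes are skipped with a warning.
annotate_all() {
    input=$1
    shift
    for type in "$@"; do
        if ! is_supported_cancer_type "$type"; then
            echo "skipping unsupported cancer type: $type" >&2
            continue
        fi
        perl open_candra.pl "$type" "$input" > "$input.$type.annotated"
    done
}

# Example:
#   annotate_all demo_input.txt OVC BRCA GENERAL
```

Each cancer type used this way needs its own annotation data file to be downloaded first.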
To annotate missense mutations for a specific cancer type, the package needs about 1 GB of storage for the corresponding annotation database file.
Q: What type of reference genome does CanDrA support?
A: Version 1.0 supports only hg19 (NCBI Build 37).
Q: When I ran a query, I saw the message “[tabix] the index file either does not exist or is older than the vcf file. Please reindex.” in the results. What is wrong?
A: It means the tabix index file (*.tbi) did not work properly on your platform, which differs from our testing platforms.
The solution is to run ‘/Path/to/CanDrA*/tabix-0.2.6/tabix -f -s 1 -b 2 -e 2 sorted_CanDrA_*.gz’
in the /Path/to/CanDrA*/database/* directory, where * is the specific cancer type you are using. This generates a new sorted_CanDrA_*.gz.tbi that replaces the old one.
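If several annotation databases are installed, the reindexing step can be scripted. The sketch below is not part of CanDrA: `index_is_stale` and `reindex_if_stale` are hypothetical helper names, and the staleness test (index missing, or older than the data file) only mirrors the condition named in the warning text. The tabix options are the ones given in the command above. If the warning persists, run that command unconditionally.

```sh
#!/bin/sh
# Hypothetical helper: return 0 when the index for data file $1 is missing
# or has an older modification time than the data file itself.
index_is_stale() {
    data=$1
    index=$1.tbi
    [ ! -e "$index" ] && return 0
    [ "$index" -ot "$data" ] && return 0
    return 1
}

# Hypothetical wrapper: $1 = path to the tabix binary,
# remaining arguments = sorted_CanDrA_*.gz files to check.
reindex_if_stale() {
    tabix_bin=$1
    shift
    for gz in "$@"; do
        if index_is_stale "$gz"; then
            # Same options as the manual fix: force overwrite, sequence name
            # in column 1, start and end position in column 2.
            "$tabix_bin" -f -s 1 -b 2 -e 2 "$gz"
        fi
    done
}

# Example (paths are placeholders, as in the answer above):
#   reindex_if_stale /Path/to/CanDrA*/tabix-0.2.6/tabix sorted_CanDrA_*.gz
```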