Department of Bioinformatics and Computational Biology


CanDrA

Overview
Description: Cancer-specific Driver missense mutation Annotation with optimized features
Development Information
Language: Perl
Current version: Plus
Platforms: Tested on CentOS 5.5 (x86_64) and Ubuntu 10.04 LTS (x86 32-bit). It is assumed to work on most Unix-like systems after compilation.
License: GPL v3
Status: Active
Last updated: 2013/04/05
News: Version Plus is available
References
Citation: Mao, Y., Chen, H., Liang, H. et al., CanDrA: Cancer-Specific Driver Missense Mutation Annotation with Optimized Features, PLOS ONE (2013) https://doi.org/10.1371/journal.pone.0077945
Help and Support
Contact: Ken Chen

CanDrA

CanDrA is a machine learning program that predicts cancer-type-specific driver missense mutations from 96 structural, evolutionary and gene features computed by more than 10 other functional prediction algorithms. The CanDrA training set was collected by mining the COSMIC database. Version Plus can analyze 15 cancer types (Bladder Cancer, Breast Cancer, Colorectal Cancer, Cervical Squamous Cell Carcinoma, Endometrioid Carcinoma, GBM, Kidney Cancer, Lung Adenocarcinoma, Lung Small Cell Carcinoma, Lung Squamous Cell Carcinoma, Medulloblastoma, Malignant Melanoma, Ovarian Cancer, Prostate Cancer, Squamous Cell Skin Cancer). It also provides an extra model for predicting whether a mutation is a driver across all cancers. COSMIC version 62 was used for CanDrA Plus.

Download

The CanDrA package has two parts: executable files and annotation data files for each specific cancer. To run it, users need to download the executable files and at least one annotation data file.

Release History

Version 1.0

CanDrA v1.0 used 95 structural and evolutionary features to build its predictive models and can analyze 6 cancer types: Glioblastoma multiforme (GBM), Ovarian Cancer, Breast Cancer, Colorectal Cancer, Malignant Melanoma, and Squamous Cell Skin Cancer. Its training set was collected from COSMIC version 58.

Executable Package

Annotation Data Files for Specific Cancer

Genome Biology paper that compared CanDrA and several other Cancer Driver predictors https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0484-1

Version (Plus), obsolete, variant significance weighted by gene level significance

Executable Package

Annotation Data Files for Specific Cancer

Configuration

How to set up CanDrA?

  1. Download and unpack the CanDrA package.
  2. Compile the tabix toolkit in ./tabix-0.2.6 under the top directory of CanDrA: “cd ./tabix-0.2.6; make”. For questions about setting up tabix, please refer to http://samtools.sourceforge.net/tabix.shtml.
  3. Download the corresponding cancer-type-specific annotation data file, e.g. OVC.tar.bz2, unpack it with “tar -vxjf OVC.tar.bz2”, and place the unpacked directory, ./OVC, into the ./database directory.
  4. Run “perl open_candra.pl option input_file > output_file” to test the installation, where option is a cancer type code such as OVC.
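Before the test run in step 4, it can help to confirm that the files produced by steps 1 to 3 are where CanDrA expects them. The following is a minimal sketch, not part of the CanDrA package: the function name missing_setup_paths is hypothetical, and it assumes that open_candra.pl sits in the top directory of the unpacked package and that “make” leaves the tabix binary at ./tabix-0.2.6/tabix.

```python
from pathlib import Path


def missing_setup_paths(candra_dir, cancer_type):
    """Return the expected paths that do not exist yet (hypothetical helper).

    Assumed layout: open_candra.pl in the top directory, the compiled
    tabix binary in tabix-0.2.6/, and one unpacked annotation directory
    per cancer type under database/.
    """
    root = Path(candra_dir)
    expected = [
        root / "open_candra.pl",          # main script (step 1)
        root / "tabix-0.2.6" / "tabix",   # binary built by "make" (step 2)
        root / "database" / cancer_type,  # unpacked annotation data (step 3)
    ]
    return [str(path) for path in expected if not path.exists()]
```

An empty list means all three paths are present; for example, missing_setup_paths(".", "OVC") run from the top directory lists whatever still needs to be downloaded, compiled or unpacked.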

What is the format for an input file?

Columns of an input file are

  1. chromosome number;
  2. genomic_coordinate;
  3. ref_allele;
  4. mutated_allele;
  5. strand.

An input file should be in tab-delimited format. For more details, please refer to demo_input.txt in the package.
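As an illustration of the five-column layout, here is a minimal sketch that serializes mutation records into tab-delimited text. The helper name format_input_rows and the column labels are hypothetical, the example values are placeholders rather than real variants, and the “+” strand encoding is an assumption; demo_input.txt remains the authority on the exact conventions.

```python
import csv
import io

# Column order follows the list above. The labels are used only inside
# this sketch; the input file is assumed to carry no header row.
INPUT_COLUMNS = [
    "chromosome",
    "genomic_coordinate",
    "ref_allele",
    "mutated_allele",
    "strand",
]


def format_input_rows(mutations):
    """Serialize a list of dicts as tab-delimited lines, one per mutation."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for mutation in mutations:
        writer.writerow([mutation[column] for column in INPUT_COLUMNS])
    return buffer.getvalue()


# Placeholder record: the values and the "+" strand notation are assumptions.
example = [
    {
        "chromosome": "1",
        "genomic_coordinate": 12345,
        "ref_allele": "A",
        "mutated_allele": "G",
        "strand": "+",
    }
]
input_text = format_input_rows(example)
```

Writing input_text to a file gives one line per mutation with the five fields separated by tabs.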

What is the format of an output file?

An output file is in tab-delimited format. Columns of an output file are

  1. chromosome number;
  2. genomic_coordinate;
  3. ref_allele;
  4. mutated_allele;
  5. strand;
  6. HGNC symbol;
  7. Refseq mRNA;
  8. qualitative mutation consequence;
  9. AAS - Amino Acid Substitution;
  10. AAS Location;
  11. CanDrA score;
  12. CanDrA output category;
  13. Significance of CanDrA score.
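The 13 columns above can be read back into named fields for downstream filtering. This is a minimal sketch under stated assumptions: the field names and the function parse_output are hypothetical labels chosen here, all values are kept as strings because the source does not specify their types, and any line that does not have exactly 13 tab-separated fields (a blank line or a diagnostic message, for instance) is skipped.

```python
# Hypothetical field names, in the column order listed above.
OUTPUT_COLUMNS = [
    "chromosome",
    "genomic_coordinate",
    "ref_allele",
    "mutated_allele",
    "strand",
    "hgnc_symbol",
    "refseq_mrna",
    "mutation_consequence",
    "aas",
    "aas_location",
    "candra_score",
    "candra_category",
    "candra_significance",
]


def parse_output(lines):
    """Turn output lines into dicts keyed by OUTPUT_COLUMNS.

    Lines whose field count is not 13 are ignored rather than guessed at.
    """
    records = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(OUTPUT_COLUMNS):
            continue
        records.append(dict(zip(OUTPUT_COLUMNS, fields)))
    return records
```

For example, parse_output(open("demo.annotated")) would return one dict per annotated mutation, from which record["candra_category"] can be inspected.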

How to use the CanDrA program?

Version: Plus
Usage: perl open_candra.pl <cancer_type> input_file > output_file
Options: Supported <cancer_type>: BLCA, BRCA, CRC, CSCC, EC, GBM, GENERAL, KIRC, LUAD, LUSMACC, LUSQUCC, MDB, MEL, OVC, PRCA, SCSC
Example: perl open_candra.pl OVC demo_input.txt > demo.annotated
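When CanDrA is called from a script, the cancer type can be checked against the supported codes before the program starts. The sketch below wraps the usage line shown above; build_command and run_candra are hypothetical helpers, not part of the package, and run_candra assumes that perl is on the PATH and that open_candra.pl is in candra_dir.

```python
import subprocess

# Cancer type codes copied from the Options line above.
SUPPORTED_CANCER_TYPES = {
    "BLCA", "BRCA", "CRC", "CSCC", "EC", "GBM", "GENERAL", "KIRC",
    "LUAD", "LUSMACC", "LUSQUCC", "MDB", "MEL", "OVC", "PRCA", "SCSC",
}


def build_command(cancer_type, input_file):
    """Return the argument list for one CanDrA run, rejecting unknown codes."""
    if cancer_type not in SUPPORTED_CANCER_TYPES:
        raise ValueError("unsupported cancer type: %s" % cancer_type)
    return ["perl", "open_candra.pl", cancer_type, input_file]


def run_candra(cancer_type, input_file, output_file, candra_dir="."):
    """Run CanDrA and redirect standard output to output_file.

    input_file is resolved relative to candra_dir (the working directory
    of the child process); output_file is resolved relative to the
    caller's working directory.
    """
    command = build_command(cancer_type, input_file)
    with open(output_file, "w") as out:
        subprocess.run(command, cwd=candra_dir, stdout=out, check=True)
```

Calling run_candra("OVC", "demo_input.txt", "demo.annotated") from the top directory corresponds to the example command above.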

System Requirements

To annotate missense mutations for a specific cancer type, the package needs about 1 gigabyte of storage for the corresponding annotation database file.

FAQ

Q: What type of reference genome does CanDrA support?
A: Version 1.0 supports only hg19 (NCBI Build 37).

Q: When I ran a query, I saw messages such as “[tabix] the index file either does not exist or is older than the vcf file. Please reindex.” in the results. What is wrong?
A: The tabix index file (*.tbi) did not work properly on your platform because it differs from our testing platforms. The solution is to run ‘/Path/to/CanDrA*/tabix-0.2.6/tabix -f -s 1 -b 2 -e 2 sorted_CanDrA_*.gz’ in the /Path/to/CanDrA*/database/* directory, where * is the specific cancer type you are using. This generates a new sorted_CanDrA_*.gz.tbi that replaces the old one.
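If more than one sorted_CanDrA_*.gz file needs reindexing, the command from the answer above can be generated for each file. This is a minimal sketch: reindex_commands and reindex are hypothetical helpers, the tabix flags are copied from the answer above, and the directory layout (tabix-0.2.6/ and database/<cancer_type>/ under the CanDrA top directory) is assumed from the setup steps.

```python
import subprocess
from pathlib import Path


def reindex_commands(candra_dir, cancer_type):
    """Return (database directory, list of tabix reindex commands).

    Each command names the data file relative to the database directory,
    so it is meant to be run with that directory as the working directory.
    """
    root = Path(candra_dir).resolve()
    tabix = root / "tabix-0.2.6" / "tabix"
    db_dir = root / "database" / cancer_type
    data_files = sorted(db_dir.glob("sorted_CanDrA_*.gz"))
    commands = [
        [str(tabix), "-f", "-s", "1", "-b", "2", "-e", "2", data_file.name]
        for data_file in data_files
    ]
    return db_dir, commands


def reindex(candra_dir, cancer_type):
    """Run every reindex command inside the cancer type's database directory."""
    db_dir, commands = reindex_commands(candra_dir, cancer_type)
    for command in commands:
        subprocess.run(command, cwd=db_dir, check=True)
```

The glob pattern matches only names ending in .gz, so existing .gz.tbi index files are not passed to tabix as data files.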