This project is archived and no longer maintained.
PRADA
Overview | |
Description | PRADA is a pipeline that analyzes paired-end RNA-seq data to generate gene expression values (RPKM) and gene fusion candidates. |
Development Information | |
Language | Python |
Current version | 1.1 |
Platforms | Unix (OpenPBS) |
License | MIT |
Status | Archived |
Last updated | April 2013 |
References | |
Citation | No Formal Publications |
Help and Support | |
Contact | MDACC-Bioinfo-IT-Admin@mdanderson.org |
Massively parallel sequencing of cDNA reverse transcribed from RNA (RNA-seq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. The BAM files generated by the pipeline are compatible with a range of mutation-calling tools and can be used to obtain read counts for downstream analysis.
PRADA currently supports six modules to process RNA-seq data and identify abnormalities:
preprocess | : Generates aligned and recalibrated BAM files. |
fusion | : Identifies candidate gene fusions. |
guess-ft | : Supervised search for fusion transcripts. |
guess-if | : Supervised search for intragenic rearrangements. |
homology | : Calculates homology between two given genes. |
frame | : Predicts the functional consequence of a fusion transcript. |
A detailed description of the installation steps and the usage of each module, with examples, is available in the documentation.
PRADA is written in the Python programming language and is intended to run in a command-line environment on UNIX or Linux operating systems. To run pyPRADA, download the pre-compiled package and unzip it to your preferred installation location. Combined genome and transcriptome reference files are available for download:
HG19
A sample FASTQ file and the resulting BAM file are also available: Sample files.
Once the reference files are downloaded and extracted, generate index files for all the FASTA files in the reference folder:
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.formatted.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.plus.genome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Homo_sapiens_assembly19.fasta
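The four indexing commands above differ only in the FASTA file name, so they can be generated from a list. The following is a minimal Python sketch, not part of PRADA itself. The helper names `build_index_commands` and `run_index_commands` are hypothetical, and `prada_dir` and `ref_dir` stand in for the `[pyPRADA_DIR]` and `[HG19]` placeholders.

```python
import subprocess
from pathlib import Path

# FASTA file names copied from the indexing commands above.
FASTA_FILES = [
    "Ensembl64.transcriptome.fasta",
    "Ensembl64.transcriptome.formatted.fasta",
    "Ensembl64.transcriptome.plus.genome.fasta",
    "Homo_sapiens_assembly19.fasta",
]


def build_index_commands(prada_dir, ref_dir):
    """Return one `bwa index -a bwtsw` argument list per FASTA file.

    Hypothetical helper: prada_dir and ref_dir correspond to the
    [pyPRADA_DIR] and [HG19] placeholders used in this README.
    """
    bwa = str(Path(prada_dir) / "tools" / "bwa-0.5.7-mh" / "bwa")
    return [
        [bwa, "index", "-a", "bwtsw", str(Path(ref_dir) / name)]
        for name in FASTA_FILES
    ]


def run_index_commands(prada_dir, ref_dir):
    """Run the commands sequentially; raise on the first non-zero exit code."""
    for cmd in build_index_commands(prada_dir, ref_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_index_commands("/path/to/pyPRADA", "/path/to/hg19")` would then index each file in turn. Building the argument lists separately from running them makes it possible to print and inspect the commands before starting a long indexing job.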
Set the configuration file (ref.txt):
#reference files
compdb_fasta [HG19_REF]/Ensembl64.transcriptome.plus.genome.fasta
compdb_fai [HG19_REF]/Ensembl64.transcriptome.plus.genome.fasta.fai
compdb_map [HG19_REF]/Ensembl64.transcriptome.plus.genome.map
genome_fasta [HG19_REF]/Homo_sapiens_assembly19.fasta
genome_gtf [HG19_REF]/Homo_sapiens.GRCh37.64.gtf
dbsnp_vcf [HG19_REF]/dbsnp_135.b37.vcf
select_tx [HG19_REF]/Ensembl64.selected.transcripts
feature_file [HG19_REF]/Ensembl64.canonical.gene.exons.tab.txt
tx_seq_file [HG19_REF]/Ensembl64.transcriptome.fasta
ref_anno [HG19_REF]/Ensembl64.transcriptome.annotations
ref_map [HG19_REF]/Ensembl64.transcriptome.formatted.map
ref_fasta [HG19_REF]/Ensembl64.transcriptome.formatted.fasta
cds_file [HG19_REF]/ensembl.hg19.cds.txt
txcat_file [HG19_REF]/Ensembl64_primary_transcript.txt
#Preprocess step parameters
pbs_queue long #queue name, for preprocessing module
pbs_email userid@mdanderson.org #email used in PBS for notification
parallel_n_threads 24 #number of cores used in alignment and recalibration
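Because most entries in ref.txt are file paths, a quick check before launching a job can catch a mistyped or missing reference file. The sketch below is an illustration only. It assumes each line has the form `key value`, that `#` starts a comment, and that paths contain no `#`; PRADA's own parser may behave differently. The function names `parse_ref_config` and `missing_files` are hypothetical.

```python
import os

# Keys from the example above whose values are not file paths.
NON_PATH_KEYS = ("pbs_queue", "pbs_email", "parallel_n_threads")


def parse_ref_config(text):
    """Parse 'key value' lines into a dict (assumed format, see lead-in).

    Everything after '#' is treated as a comment; blank lines are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected 'key value', got: %r" % raw)
        key, value = parts
        settings[key] = value.strip()
    return settings


def missing_files(settings, non_path_keys=NON_PATH_KEYS):
    """Return the sorted keys whose path values do not exist on disk."""
    return sorted(
        key
        for key, value in settings.items()
        if key not in non_path_keys and not os.path.exists(value)
    )
```

A typical use would be `missing_files(parse_ref_config(open("ref.txt").read()))`, which returns an empty list when every referenced file is present.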