Department of Bioinformatics and Computational Biology


This project is archived and no longer maintained.


Description: PRADA is a pipeline to analyze paired end RNA-Seq data to generate gene expression values (RPKM) and gene-fusion candidates.

Development Information
Current version: 1.1
Platforms: Unix (OpenPBS)
Last updated: April 2013
Citation: No Formal Publications
Help and Support


Massively parallel sequencing of cDNA reverse transcribed from RNA (RNA-Seq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-Seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. The BAM files generated by the pipeline are compatible with tools for mutation calling and for obtaining read counts for downstream analysis.


PRADA currently supports six modules to process RNA-Seq data and identify abnormalities in it:

preprocess: Generates aligned and recalibrated BAM files.
fusion: Identifies candidate gene fusions.
guess-ft: Supervised search for fusion transcripts.
guess-if: Supervised search for intragenic rearrangements.
homology: Calculates homology between two given genes.
frame: Predicts the functional consequence of a fusion transcript.


A detailed description of the installation steps and the usage of each module, with examples, is available in the documentation.


PRADA is written in the Python programming language and is intended to run in a command-line environment on UNIX or Linux operating systems. To run pyPRADA, download the pre-compiled package and unzip it to your preferred installation location. Combined genome and transcriptome reference files are available for download:


A sample FASTQ file and the resulting BAM file are also available: Sample files.

Once the reference files are downloaded and extracted, generate index files for all the FASTA files in the reference folder:

[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.formatted.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Homo_sapiens_assembly19.fasta
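
Because the same bwa index command is repeated for every FASTA file, the step can be scripted. The following is a minimal sketch, not part of PRADA: the function names bwa_index_commands and run_all are hypothetical, and it assumes every file to be indexed ends in ".fasta" and that the bundled bwa binary lives at tools/bwa-0.5.7-mh/bwa under the installation directory, as in the commands above.

```python
from pathlib import Path
import subprocess


def bwa_index_commands(prada_dir, ref_dir, suffix=".fasta"):
    """Build one `bwa index -a bwtsw <file>` command per FASTA file in ref_dir.

    Hypothetical helper: only files whose names end in `suffix` are picked up,
    so index files that bwa writes next to them (e.g. *.fasta.bwt) are skipped.
    """
    bwa = str(Path(prada_dir) / "tools" / "bwa-0.5.7-mh" / "bwa")
    fastas = sorted(
        p for p in Path(ref_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    return [[bwa, "index", "-a", "bwtsw", str(p)] for p in fastas]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failed indexing run
            subprocess.run(cmd, check=True)
```

Calling run_all(bwa_index_commands(prada_dir, ref_dir)) with your own installation and reference paths prints the commands for review; pass dry_run=False to execute them.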

Set the configuration file (ref.txt):

#reference files
compdb_fasta    [HG19_REF]/
compdb_fai  [HG19_REF]/
compdb_map  [HG19_REF]/
genome_fasta    [HG19_REF]/Homo_sapiens_assembly19.fasta
genome_gtf  [HG19_REF]/Homo_sapiens.GRCh37.64.gtf
dbsnp_vcf   [HG19_REF]/dbsnp_135.b37.vcf
select_tx   [HG19_REF]/Ensembl64.selected.transcripts
feature_file    [HG19_REF]/
tx_seq_file [HG19_REF]/Ensembl64.transcriptome.fasta
ref_anno    [HG19_REF]/Ensembl64.transcriptome.annotations
ref_map [HG19_REF]/
ref_fasta   [HG19_REF]/Ensembl64.transcriptome.formatted.fasta
cds_file    [HG19_REF]/ensembl.hg19.cds.txt
txcat_file  [HG19_REF]/Ensembl64_primary_transcript.txt

#Preprocess step parameters
pbs_queue   long                        #queue name, for preprocessing module
pbs_email       #email used in PBS for notification
parallel_n_threads  24                  #number of cores used in alignment and recalibration
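
Before launching a run, it can help to check that no entry in ref.txt has been left without a value. The sketch below is an illustration only and is not PRADA's own configuration parser: the function names parse_ref_config and missing_values are hypothetical, and the file format (a key, whitespace, a value, and an optional "#" comment) is inferred from the listing above. It also assumes that "#" never appears inside a value.

```python
def parse_ref_config(text):
    """Parse 'key  value  #comment' lines into a dict.

    Keys that have no value after the comment is removed map to ''.
    """
    config = {}
    for raw in text.splitlines():
        # Drop full-line and trailing comments, then surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        config[key] = value
    return config


def missing_values(config):
    """Return the sorted keys whose value is empty."""
    return sorted(key for key, value in config.items() if not value)
```

For example, missing_values(parse_ref_config(open("ref.txt").read())) lists the keys that still need a value, such as pbs_email when no address has been filled in.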