Department of Bioinformatics and Computational Biology


This project is archived and no longer maintained.

PRADA

Overview
Description: PRADA is a pipeline to analyze paired end RNA-Seq data to generate gene expression values (RPKM) and gene-fusion candidates.
Development Information
Language: Python
Current version: 1.1
Platforms: Unix (OpenPBS)
License: MIT
Status: Archived
Last updated: April 2013
References
Citation: No formal publications
Help and Support
Contact: MDACC-Bioinfo-IT-Admin@mdanderson.org

PRADA

Massively parallel sequencing of cDNA reverse transcribed from RNA (RNA-seq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. The BAM files generated by the pipeline are readily compatible with different tools for mutation calling and for obtaining read counts for further downstream analysis.
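
PRADA reports gene expression as RPKM (reads per kilobase of transcript per million mapped reads). This page does not describe how PRADA computes the value internally, so the following is only a minimal sketch of the standard RPKM definition, not PRADA's own code. The function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    Illustrative only; PRADA's own implementation may differ in detail.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    # simplifies to reads * 1e9 / (length in bp * mapped reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 20 million mapped reads
print(rpkm(500, 2000, 20_000_000))  # 12.5
```

Normalizing by both gene length and library size is what makes RPKM values comparable between genes of different lengths and between samples sequenced to different depths.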

Modules

PRADA currently provides six modules for processing RNA-seq data and identifying abnormalities:

preprocess: Generates aligned and recalibrated BAM files.
fusion: Identifies candidate gene fusions.
guess-ft: Supervised search for fusion transcripts.
guess-if: Supervised search for intragenic rearrangements.
homology: Calculates homology between two given genes.
frame: Predicts the functional consequence of a fusion transcript.

Documentation

A detailed description of the installation steps and the usage of each module, with examples, is available in the documentation.

Installation

PRADA is written in the Python programming language and is intended to run in a command-line environment on UNIX or Linux operating systems. To run pyPRADA, download the pre-compiled package and unzip it to your preferred installation location. Combined genome and transcriptome reference files are available for download:

HG19

A sample FASTQ file and the resulting BAM file are also available (Sample files).

Once the reference files are downloaded and extracted, generate index files for all the FASTA files in the reference folder:

[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.formatted.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.plus.genome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Homo_sapiens_assembly19.fasta
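
The four indexing commands above differ only in the FASTA file name, so they can be generated from a list. The following is a minimal sketch of that idea, not part of PRADA itself: the helper names `build_index_commands` and `run_index_commands` and the example paths are hypothetical, while the bwa location, arguments, and file names are copied from the commands above.

```python
import os
import subprocess

# FASTA files listed in the indexing commands above
FASTA_FILES = [
    "Ensembl64.transcriptome.fasta",
    "Ensembl64.transcriptome.formatted.fasta",
    "Ensembl64.transcriptome.plus.genome.fasta",
    "Homo_sapiens_assembly19.fasta",
]


def build_index_commands(prada_dir, ref_dir):
    """Return one 'bwa index -a bwtsw <fasta>' argument list per FASTA file."""
    bwa = os.path.join(prada_dir, "tools", "bwa-0.5.7-mh", "bwa")
    return [
        [bwa, "index", "-a", "bwtsw", os.path.join(ref_dir, name)]
        for name in FASTA_FILES
    ]


def run_index_commands(prada_dir, ref_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_index_commands(prada_dir, ref_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first indexing failure
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical locations; replace with your own [pyPRADA_DIR] and [HG19]
    run_index_commands("/opt/pyPRADA", "/data/hg19", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a convenient way to check the paths before starting the indexing.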


Set up the configuration file (ref.txt), replacing [HG19_REF] with the path to the reference folder:

#reference files
compdb_fasta    [HG19_REF]/Ensembl64.transcriptome.plus.genome.fasta
compdb_fai  [HG19_REF]/Ensembl64.transcriptome.plus.genome.fasta.fai
compdb_map  [HG19_REF]/Ensembl64.transcriptome.plus.genome.map
genome_fasta    [HG19_REF]/Homo_sapiens_assembly19.fasta
genome_gtf  [HG19_REF]/Homo_sapiens.GRCh37.64.gtf
dbsnp_vcf   [HG19_REF]/dbsnp_135.b37.vcf
select_tx   [HG19_REF]/Ensembl64.selected.transcripts
feature_file    [HG19_REF]/Ensembl64.canonical.gene.exons.tab.txt
tx_seq_file [HG19_REF]/Ensembl64.transcriptome.fasta
ref_anno    [HG19_REF]/Ensembl64.transcriptome.annotations
ref_map [HG19_REF]/Ensembl64.transcriptome.formatted.map
ref_fasta   [HG19_REF]/Ensembl64.transcriptome.formatted.fasta
cds_file    [HG19_REF]/ensembl.hg19.cds.txt
txcat_file  [HG19_REF]/Ensembl64_primary_transcript.txt

#Preprocess step parameters
pbs_queue   long                        #queue name, for preprocessing module
pbs_email   userid@mdanderson.org       #email used in PBS for notification
parallel_n_threads  24                  #number of cores used in alignment and recalibration