PRADA

This project is archived and no longer maintained.

Overview
Description	PRADA is a pipeline to analyze paired end RNA-Seq data to generate gene expression values (RPKM) and gene-fusion candidates.
Development Information
Language	Python
Current version	1.1
Platforms	Unix (OpenPBS)
License	MIT
Status	Archived
Last updated	April 2013
References
Citation	No Formal Publications
Help and Support
Contact	MDACC-Bioinfo-IT-Admin@mdanderson.org

PRADA

Massively parallel sequencing of cDNA reverse transcribed from RNA (RNA-Seq) provides an accurate estimate of the quantity and composition of mRNAs. We developed PRADA to characterize the transcriptome from RNA-Seq data. PRADA covers the generation of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. The BAM files generated by the pipeline are compatible with tools for mutation calling and for obtaining read counts for downstream analysis.
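The expression values reported by the pipeline are RPKM (reads per kilobase of transcript per million mapped reads). As a minimal sketch of the standard RPKM formula, and not PRADA's own code, the calculation for one gene looks like this. The function name and arguments are hypothetical and chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    Illustrative helper only; PRADA's internal implementation may differ.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length in kb) / (total reads in millions)
    # = reads * 1e9 / (length in bp * total reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Example: 1,000 reads on a 2 kb gene in a library of 10 million mapped reads.
example_value = rpkm(1000, 2000, 10_000_000)
```

Normalizing by both gene length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.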

Modules

PRADA currently supports six modules for processing RNA-Seq data and identifying abnormalities:

preprocess	: Generates aligned and recalibrated BAM files.
fusion	: Identifies candidate gene fusions.
guess-ft	: Supervised search for fusion transcripts.
guess-if	: Supervised search for intragenic rearrangements.
homology	: Calculates homology between two given genes.
frame	: Predicts the functional consequence of a fusion transcript.

Documentation

A detailed description of the installation steps and of each module's usage, with examples, is available in the documentation.

Installation

PRADA is written in the Python programming language and is intended to run in a command-line environment on UNIX or Linux operating systems. To run pyPRADA, download the pre-compiled package and unzip it to your preferred installation location. Combined genome and transcriptome reference files are available for download:

HG19

A sample FASTQ file and the resulting BAM file are also available under Sample files.

Once the reference files are downloaded and extracted, generate index files for all the FASTA files in the reference folder:

[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.formatted.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Ensembl64.transcriptome.plus.genome.fasta
[pyPRADA_DIR]/tools/bwa-0.5.7-mh/bwa index -a bwtsw [HG19]/Homo_sapiens_assembly19.fasta
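The four commands above differ only in the FASTA file name, so they can be generated from the contents of the reference folder. The following is a minimal sketch, assuming the bundled bwa binary sits at tools/bwa-0.5.7-mh/bwa inside the PRADA directory and that every file ending in .fasta in the reference folder should be indexed. The helper names bwa_index_commands and index_all are hypothetical and not part of PRADA.

```python
import subprocess
from pathlib import Path


def bwa_index_commands(prada_dir, ref_dir):
    """Build one `bwa index -a bwtsw` argument list per FASTA file in ref_dir."""
    # Path to the bwa binary shipped with PRADA (assumed layout, taken from the commands above).
    bwa = str(Path(prada_dir) / "tools" / "bwa-0.5.7-mh" / "bwa")
    # Sort the files so the commands are produced in a stable order.
    fastas = sorted(Path(ref_dir).glob("*.fasta"))
    return [[bwa, "index", "-a", "bwtsw", str(fasta)] for fasta in fastas]


def index_all(prada_dir, ref_dir):
    """Run the index commands one after another, stopping at the first failure."""
    for cmd in bwa_index_commands(prada_dir, ref_dir):
        subprocess.run(cmd, check=True)
```

Calling index_all with the PRADA installation directory and the HG19 reference directory would run the same commands as listed above. Indexing the full genome with bwtsw can take a long time, so the commands are kept sequential and fail fast.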

Then set the reference file paths and preprocessing parameters in the configuration file (ref.txt):

#reference files
compdb_fasta    [HG19_REF]/Ensembl64.transcriptome.plus.genome.fasta
compdb_fai  [HG19_REF]/Ensembl64.transcriptome.plus.genome.fasta.fai
compdb_map  [HG19_REF]/Ensembl64.transcriptome.plus.genome.map
genome_fasta    [HG19_REF]/Homo_sapiens_assembly19.fasta
genome_gtf  [HG19_REF]/Homo_sapiens.GRCh37.64.gtf
dbsnp_vcf   [HG19_REF]/dbsnp_135.b37.vcf
select_tx   [HG19_REF]/Ensembl64.selected.transcripts
feature_file    [HG19_REF]/Ensembl64.canonical.gene.exons.tab.txt
tx_seq_file [HG19_REF]/Ensembl64.transcriptome.fasta
ref_anno    [HG19_REF]/Ensembl64.transcriptome.annotations
ref_map [HG19_REF]/Ensembl64.transcriptome.formatted.map
ref_fasta   [HG19_REF]/Ensembl64.transcriptome.formatted.fasta
cds_file    [HG19_REF]/ensembl.hg19.cds.txt
txcat_file  [HG19_REF]/Ensembl64_primary_transcript.txt

#Preprocess step parameters
pbs_queue   long                        #queue name, for preprocessing module
pbs_email   userid@mdanderson.org       #email used in PBS for notification
parallel_n_threads  24                  #number of cores used in alignment and recalibration
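The configuration format shown above is a key followed by whitespace and a value, with # starting a comment. As a sketch of how such a file could be read and sanity-checked before a run, assuming exactly that format (this is not PRADA's own parser, and parse_ref_config is a hypothetical name):

```python
def parse_ref_config(text):
    """Parse 'key<whitespace>value' lines; '#' starts a comment.

    Assumption: values never contain '#', since everything after it is dropped.
    """
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # strip full-line and trailing comments
        if not line:
            continue  # blank line or comment-only line
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError("line %d: expected 'key value', got %r" % (lineno, raw))
        key, value = parts
        config[key] = value.strip()
    return config


sample = """#reference files
genome_fasta    /ref/Homo_sapiens_assembly19.fasta
pbs_queue   long                        #queue name, for preprocessing module
parallel_n_threads  24                  #number of cores
"""
settings = parse_ref_config(sample)
```

All values come back as strings, so a numeric setting such as parallel_n_threads would still need an explicit int() conversion. The /ref path in the sample is a placeholder for the local reference directory.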

Department of Bioinformatics and Computational Biology
