From MD Anderson Bioinformatics
Bioinformatics Software and Services
Here is a list of all software and services provided by the Department of Bioinformatics and Computational Biology. They are copyrighted by the University of Texas MD Anderson Cancer Center and by the individual employees of the cancer center who helped develop them. They are freely available for personal use in research projects; however, anyone wishing to use them or modify them for use in a commercial project should contact the Office of Technology Commercialization.
A to G
- An NGS genomic loci mapping refiner that improves the mapping of multireads (reads mapped to more than one genomic location with similar fidelity) as a refinement step after general read alignment is completed.
- A tool for performing model-based inference for differential gene expression, using a non-parametric Bayesian mixture probability model for the distribution of gene intensities under different conditions.
- Cromwell is a set of MATLAB scripts for low-level processing of mass spectrometry proteomics data. Cromwell relies on the undecimated discrete wavelet transform for denoising spectra.
- eFISMIC is a comprehensive database of experimental-evidence-based functional impact of somatic mutations in cancer.
- A tool for exploratory analysis of gene expression microarray data.
- geneSmash is a mash-up of various sources of information about human genes, including the NCBI Entrez Gene FTP site, the UCSC Genome Browser, miRBase, and human gene expression array annotations extracted from manufacturers' websites. Use geneSmash
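The Cromwell entry above mentions denoising spectra with the undecimated discrete wavelet transform. The following is a minimal Python sketch of that general technique, not Cromwell's MATLAB code: it assumes a Haar wavelet, periodic boundaries, and hard thresholding, and the function names, the number of levels, and the threshold are all hypothetical choices made for illustration.

```python
def udwt_haar(signal, levels):
    """Undecimated (a trous) Haar transform with periodic boundaries.

    Returns the final approximation and one detail list per level.
    Every list keeps the full signal length (no downsampling).
    """
    approx = list(signal)
    n = len(approx)
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps spread apart at each level
        nxt = [(approx[i] + approx[(i + step) % n]) / 2 for i in range(n)]
        det = [(approx[i] - approx[(i + step) % n]) / 2 for i in range(n)]
        details.append(det)
        approx = nxt
    return approx, details


def inverse_udwt_haar(approx, details):
    """Invert udwt_haar by averaging the two redundant reconstructions."""
    n = len(approx)
    current = list(approx)
    for level in reversed(range(len(details))):
        step = 2 ** level
        det = details[level]
        current = [
            ((current[i] + det[i])
             + (current[(i - step) % n] - det[(i - step) % n])) / 2
            for i in range(n)
        ]
    return current


def denoise(signal, levels=3, threshold=1.0):
    """Hard-threshold the detail coefficients, then reconstruct."""
    approx, details = udwt_haar(signal, levels)
    kept = [[d if abs(d) > threshold else 0.0 for d in det] for det in details]
    return inverse_udwt_haar(approx, kept)


if __name__ == "__main__":
    # Hypothetical toy "spectrum": small ripple around a flat baseline.
    spectrum = [1.0, 1.2] * 4
    print(denoise(spectrum, levels=3, threshold=0.5))
```

Because no coefficients are discarded by downsampling, the result does not depend on where the signal starts, which is the usual reason to prefer the undecimated transform for denoising. A real pipeline would choose the wavelet and threshold from the noise characteristics of the instrument.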
H to R
- An interactive tool for mapping microarray gene expression data onto the GeneOntology directed acyclic graph.
- Zoomable (clustered) heat maps with links to statistical information, databases, and other related analyses.
- OOMPA is an object-oriented microarray and proteomics analysis library implemented in R using S4 classes and compatible with BioConductor.
- A tool to quantify Affymetrix microarrays. Quantifications are based on the "Position Dependent Nearest Neighbor" model, which makes explicit use of probe sequence information.
- The Report and Analysis Template Builder (RATb) uses XML and R to create reusable, structured Sweave files for the statistical and bioinformatic analysis of high-throughput 'omics' datasets. Use RAT Builder
- This website provides a tool that performs RefSeq matching for pairs of Affymetrix probe sets and cDNA sequences that correspond to the same UniGene cluster, using the BLAST search program 'bl2seq'. Use Refseq Verifier
- Rocket is a set of Perl scripts and modules used to confirm that the clones sequenced in our Core Laboratory match the annotations provided by the supplier. (More precisely, Rocket provides a graphical interface using Perl/Tk to a command-line Perl script.) Rocket will require some customization before you can use it. It assumes the existence of a local database containing the supplier's clone annotations; you'll have to create such a database separately and make sure the names of the fields in the database match those used by Rocket.
S to Z
- A tool to compute the number of samples needed to detect expression changes for a microarray experiment. Use Sample Sizes
- This service provides a tool to predict whether sequences represent real genes with functional products or are possibly contamination or transcriptional noise. Input is either a single sequence or a list of sequences in FASTA format. Use Sequence Quality Check
- SpliceSeq provides a quick, easy method of investigating alternative mRNA splicing in next generation mRNA sequence data. The tool may be used on a single mRNA-Seq sample to identify genes with multiple spliceforms or on a pair of samples to identify differential splicing between the samples. Sequence reads are mapped to splice graphs that unambiguously quantify the inclusion level of each exon and splice junction. The graphs are then traversed to predict the protein isoforms that are likely to result from the observed exon and splice junction reads. UniProt annotations are mapped to each protein isoform to identify potential functional impacts of alternative splicing.
- SuperCurve is a stand-alone package, bundled with OOMPA, that provides tools for the analysis of reverse phase protein arrays.
- This service provides a bioinformatic web app for identifying network-based biomarkers that most correlate with patient survival data. Use SurvNet
- Tissue microarrays are increasingly important tools that bring high-throughput technology to traditional pathology laboratories. In many cases, each spot on a tissue microarray is scored by a skilled pathologist and recorded manually. TAD consists of an Active Server Page web interface to a relational SQL database that automates recording scores and linking them with clinical data for future interpretation.
- TCPA provides a comprehensive resource for accessing, visualizing, and analyzing cancer proteomics.
- The Cancer Genome Atlas (TCGA) project studies different types of cancer by obtaining datasets from different tissue source sites, using different sequencing centers and technologies. Data sets can sometimes be biased depending on the batch from which they came. The TCGA Batch Effects Tool provides pre-computed graphical annotations of different TCGA data sets that allow users to screen for batch biases in the dataset. For information on the first version of the tool, please click here.
- Code to obtain MCMC samples for the wavelet-based functional mixed model method of Morris and Carroll (2006). Obtains posterior samples of the model parameters in the functional mixed model. Original Site. Download Package.
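The SpliceSeq entry above describes mapping reads to splice graphs in order to quantify the inclusion level of each exon. The sketch below illustrates one simplified way to express such a quantity in Python. It is an assumption made for illustration, not SpliceSeq's actual algorithm: the function name, the junction dictionary layout, and the ratio used as the "inclusion level" are all hypothetical.

```python
def exon_inclusion_level(junction_reads, exon):
    """Estimate how often an exon is included, from junction read counts.

    junction_reads maps (from_exon, to_exon) -> read count, with exons
    numbered in transcript order. This is a hypothetical, simplified
    measure: reads on junctions that end at the exon count as inclusion,
    and reads on junctions that jump over the exon count as skipping.
    Returns None when no informative reads are present.
    """
    inclusion = sum(
        count for (src, dst), count in junction_reads.items() if dst == exon
    )
    skipping = sum(
        count for (src, dst), count in junction_reads.items() if src < exon < dst
    )
    total = inclusion + skipping
    return inclusion / total if total else None


if __name__ == "__main__":
    # Hypothetical three-exon gene: exon 2 is sometimes skipped.
    reads = {(1, 2): 30, (2, 3): 28, (1, 3): 10}
    print(exon_inclusion_level(reads, 2))
```

Comparing this value for the same exon across a pair of samples gives a rough picture of differential splicing, in the spirit of the two-sample use described in the entry.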
Mirrored External Databases
In this section, you will find mirrors of externally developed services hosted at MD Anderson.
- GeneCards® is a database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others. GeneCards was developed at the Crown Human Genome Center and the Weizmann Institute of Science. It is a compilation of material gathered from disparate publicly available databases and placed in a coherent organizational framework. GeneCards at MD Anderson is a mirror of the primary site, hosted by the Bioinformatics Section to facilitate faster access to this material by MD Anderson researchers. Because the actual location of the search page may change depending on the demand for various services, please bookmark the current page as your starting point for GeneCards searches.