Public Datasets

This site is a repository for selected datasets that have been collected and analyzed by investigators at MD Anderson. Each dataset is accompanied by a brief explanation. Certain tools used to analyze these data are also posted under Software.

Please note that supplementary datasets for published papers are found on the Supplements page.

Standardized TCGA Data

We provide standardized, versioned snapshots of the TCGA Level 3 data from the “Open Access HTTP Directories” found at the TCGA data site. The TCGA data have been put through a “standardization” process and converted to a standard format: a matrix with samples as columns and “gene equivalents” (such as gene symbols, probe ids, and miRNA ids) as row labels.
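As a rough illustration of the matrix layout described above, the following Python sketch reads a matrix with samples as columns and gene equivalents as row labels. The tab delimiter, the "NA" marker for missing values, the function name load_matrix, and the sample names are all assumptions made for this example; they are not taken from the published files, so check an actual snapshot before relying on them.

```python
import csv
import io


def load_matrix(handle, delimiter="\t"):
    """Parse a matrix with samples as columns and gene equivalents as row labels.

    Returns (sample_ids, values), where values maps each row label to a list
    of floats, one per sample (None where the cell is empty or "NA").
    The delimiter and the "NA" convention are assumptions, not documented facts.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    sample_ids = header[1:]  # the first header cell sits above the row labels
    values = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, cells = row[0], row[1:]
        values[label] = [
            float(cell) if cell not in ("", "NA") else None for cell in cells
        ]
    return sample_ids, values


# Made-up data for illustration only: hypothetical sample names and values.
example = (
    "gene\tSAMPLE_A\tSAMPLE_B\n"
    "TP53\t1.5\t2.0\n"
    "hsa-mir-21\tNA\t7.25\n"
)

sample_ids, values = load_matrix(io.StringIO(example))
print(sample_ids)
print(values["TP53"])
```

To read a downloaded file instead of the inline example, pass an open file handle, for instance open(path, newline=""), in place of the io.StringIO object.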

Older Public Datasets

The following data are obsolete. We provide them for historical reasons only.