Publicly Available Data

This site is a repository for selected datasets collected and analyzed by investigators at MD Anderson. We have tried to provide a reasonable amount of explanation for each. Certain tools used to analyze these data are also posted under Software. Pages for the individual datasets are linked below.


Testing Response to Chemotherapy in Breast Cancer, Pusztai et al 2004
This dataset consists of 620 sample and QC SELDI spectra used in Pusztai et al, "Pharmacoproteomic Analysis of Prechemotherapy and Postchemotherapy Plasma Samples from Patients Receiving Neoadjuvant or Adjuvant Chemotherapy for Breast Carcinoma", Cancer 2004; 100:1814-1822.
Summary of Study: Plasma samples were taken before and after paclitaxel or FAC (5-fluorouracil, doxorubicin, and cyclophosphamide) chemotherapy from patients with Stage I - III breast carcinoma to measure proteomic changes in response to the chemotherapy. Samples from healthy women were also taken to help identify breast carcinoma-associated protein markers. Full Abstract

An Example Analysis Using Cromwell, Coombes et al 2005
To show how to use Cromwell, one of our current analysis packages, we've created an example using serum quality control (QC) data derived from the Pusztai et al 2004 dataset. The Cromwell package is described in Coombes et al, Proteomics 2005; to appear. An earlier version of this paper is available as a Technical Report (UTMDABTR-001-04).

Quality Control Study for Proteomics of Nipple Aspirate Fluid
The Cromwell package, described in Coombes et al (Proteomics 2005; to appear; see also the preliminary Technical Report UTMDABTR-001-04), was developed using a set of 24 SELDI spectra collected from a pooled (quality control) sample of nipple aspirate fluid from breast cancer patients and healthy controls.

Simulated Proteomics Spectra for Method Development and Comparison, Morris et al.
In our paper on using the mean spectrum for peak finding and quantification, we simulated hundreds of proteomics data sets. We used the simulated data to compare the results of two different processing algorithms. The data sets are available here so other people can compare their algorithms to ours on a standard data set where the truth is known about what peaks are in each spectrum.
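As a rough illustration of the idea, the sketch below averages a set of aligned spectra, finds local maxima in the mean spectrum, and scores the result against a known list of true peak positions. It is a minimal sketch and not the method from the paper: the function names, the height threshold, and the matching tolerance are all assumptions made for this example, and it omits the denoising, baseline correction, and normalization that real spectra would need.

```python
from statistics import fmean


def mean_spectrum(spectra):
    """Average aligned spectra point by point (all must share one grid)."""
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must have the same number of points")
    return [fmean(column) for column in zip(*spectra)]


def find_peaks(intensities, min_height):
    """Return indices of strict local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_height and y > intensities[i - 1] and y > intensities[i + 1]:
            peaks.append(i)
    return peaks


def score_against_truth(found, true_peaks, tolerance=1):
    """Compare detected peak indices with the known true peak indices."""
    if not true_peaks:
        raise ValueError("true_peaks must not be empty")
    # A true peak counts as matched if any detection lies within `tolerance`.
    matched = sum(any(abs(f - t) <= tolerance for f in found) for t in true_peaks)
    # A detection is false if no true peak lies within `tolerance` of it.
    false_hits = sum(all(abs(f - t) > tolerance for t in true_peaks) for f in found)
    return {"sensitivity": matched / len(true_peaks), "false_discoveries": false_hits}


# Toy data, made up for illustration: two short "spectra" on the same grid.
toy_spectra = [
    [0, 1, 5, 1, 0, 0, 2, 0],
    [0, 1, 3, 1, 0, 0, 4, 0],
]
toy_mean = mean_spectrum(toy_spectra)
toy_peaks = find_peaks(toy_mean, min_height=2)
print(toy_peaks, score_against_truth(toy_peaks, true_peaks=[2, 5]))
```

Because the simulated data sets come with the true peak locations, a scoring function of this kind is what makes a like-for-like comparison between algorithms possible.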

Additional Proteomics Resources

For analyzing proteomic data, we currently use:

- The Cromwell Package (MATLAB scripts developed here).
- The Rice Wavelet Toolbox, which Cromwell uses (MATLAB; precompiled binaries exist for many platforms).


Microarray Data
Supplement to: Wang J, Coombes KR, Highsmith WE, Keating MJ, Abruzzo LV. Differences in gene expression between B-cell chronic lymphocytic leukemia and normal B cells: a meta-analysis of three microarray studies. Bioinformatics. 2004; 20:3166-78.
MDA133: Clinical Data and dChip MBEI value Files
Supplement to: Hess et al., Pharmacogenomic Predictor of Sensitivity to Preoperative Chemotherapy With Paclitaxel and 5-Fluorouracil, Doxorubicin, and Cyclophosphamide in Breast Cancer, Journal of Clinical Oncology, 24 (26), 2006. The latest version of this file includes "molecular class" information on a subset of 82 cases; the methods used to derive these classes are described in this paper.
MDA133: CEL files for Predictor Training and Validation Data Sets
Supplement to: Hess et al., Pharmacogenomic Predictor of Sensitivity to Preoperative Chemotherapy With Paclitaxel and 5-Fluorouracil, Doxorubicin, and Cyclophosphamide in Breast Cancer, Journal of Clinical Oncology, 24 (26), 2006.
Normalizer Array and Probe Sensitivity Index File
This zip file has a digital standard Affymetrix U133A v1 array, a dChip Probe Sensitivity Index file, and instructions for using dChip as a common normalizing method for Breast Cancer Samples.
The code for performing diagonal linear discriminant analysis on this data set is also available.
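For readers unfamiliar with the technique, the following is a minimal sketch of diagonal linear discriminant analysis in general, not the posted code. It assumes equal class priors, estimates one pooled within-class variance per feature, and assigns a sample to the class whose mean is nearest in variance-scaled distance. The function names, class labels, and toy values are hypothetical.

```python
def fit_dlda(samples, labels):
    """Estimate per-class feature means and pooled per-feature variances."""
    classes = sorted(set(labels))
    n_features = len(samples[0])
    means = {}
    pooled_ss = [0.0] * n_features
    for c in classes:
        rows = [x for x, y in zip(samples, labels) if y == c]
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        means[c] = mu
        # Accumulate within-class sums of squares, feature by feature.
        for row in rows:
            for j, value in enumerate(row):
                pooled_ss[j] += (value - mu[j]) ** 2
    dof = len(samples) - len(classes)
    if dof <= 0:
        raise ValueError("need more samples than classes")
    variances = [ss / dof for ss in pooled_ss]
    if any(v == 0 for v in variances):
        raise ValueError("a feature has zero within-class variance")
    return means, variances


def predict_dlda(model, x):
    """Return the class with the smallest variance-scaled squared distance."""
    means, variances = model

    def distance(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances))

    return min(means, key=distance)


# Toy data, made up for illustration: four samples, two features each.
toy_samples = [[1.0, 10.0], [2.0, 12.0], [8.0, 11.0], [9.0, 13.0]]
toy_labels = ["non_responder", "non_responder", "responder", "responder"]
toy_model = fit_dlda(toy_samples, toy_labels)
print(predict_dlda(toy_model, [2.0, 11.0]))
print(predict_dlda(toy_model, [8.0, 12.5]))
```

Ignoring the off-diagonal covariances keeps the number of estimated parameters small, which is why the method is often used when there are far more genes than samples.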
Replicate RNA hybridizations
Supplementary data for: Anderson K, Hess KR, Kapoor M, Tirrell S, Courtemanche J, Wang B, Wu Y, Gong Y, Hortobagyi GN, Symmans WF, Pusztai L. Reproducibility of gene expression signature-based predictions in replicate experiments. Clin Cancer Res 2006;12:1721-7.
CEL files for MDACC-FNA-CBX-74
This zip file contains CEL files and sample matching information for: Bianchini, G., Qi, Y., Alvarez, R.H., Iwamoto, T., Coutant, C., Ibrahim, N.K., Valero, V., Cristofanilli, M., Green, M.C., Radvanyi, L., Hatzis, C., Hortobagyi, G.N., Andre, F., Gianni, L., Symmans, W.F. and Pusztai, L. Molecular Anatomy of Breast Cancer Stroma and Its Prognostic Value in Estrogen Receptor-Positive and -Negative Cancers, Journal of Clinical Oncology, Published online before print August 30, 2010.
CEL files for 19 breast cancer cell lines
Description pending.

©2003-2010 The University of Texas MD Anderson Cancer Center
1515 Holcombe Blvd, Houston, TX 77030
1-800-392-1611 (USA) / 1-713-792-6161