This is meant to be a simple example illustrating the use of the Cromwell package (Coombes et al., Proteomics 2005) for analyzing SELDI/MALDI proteomic spectra.

***** WARNING. THIS EXAMPLE IS NOT SELF-CONTAINED. *****
To use this example, you will need (in addition to the files supplied here) a working version of MATLAB (we have tested it only on version 6.5 and higher) and the Rice Wavelet Toolbox, available from www-dsp.rice.edu/software/rwt.shtml. The toolbox should be installed in the bin folder produced when the RawData file is unzipped; we have already included a copy of Cromwell there.
*** END WARNING. THIS EXAMPLE IS NOT SELF-CONTAINED. ***

The spectra used here are a subset of those used in Pusztai et al. (2004), Cancer 100:1814-22. The full dataset is available from http://bioinformatics.mdanderson.org. The spectra here are 20 low-mass scans of quality-control (QC) samples derived from a common serum pool, so all of them should be telling the same story. Every spectrum has intensities at 33885 m/z values, and the vector of m/z values is identical across spectra. The machine was calibrated to a set of known peaks shortly before the entire set of spectra was run, and the spectra were run in randomized order on a series of chips.

The files are provided in two formats. RawBinary contains "Low_mass_serum_QC.xpt", which is in the binary format used by the Ciphergen software and holds all of the spectra used here. RawXML contains all of the spectra exported from that .xpt file using the Ciphergen software (version 3.1.1). The XML format contains the spectral intensities both before any processing (the integer counts in tofDataSamples) and after the application of various correction factors (the m/z, intensity pairs in processedDataSamples).
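To make the difference between these two blocks concrete, the following Python sketch builds a raw two-column spectrum. It keeps the m/z values from processedDataSamples but takes the intensities from the raw integer counts in tofDataSamples. The element layout it assumes is a hypothetical simplification, not the documented Ciphergen schema, and real exports may use namespaces or different nesting. Specifically, it assumes that both elements hold whitespace- or comma-separated numbers, that processedDataSamples alternates m/z and intensity, and that there is one raw count per m/z value in the same order. The function names are likewise illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def raw_spectrum_from_xml(xml_text: str) -> List[Tuple[float, int]]:
    """Pair the m/z values from processedDataSamples with the raw integer
    counts from tofDataSamples, discarding the processed intensities.

    ASSUMED layout (hypothetical, not the documented Ciphergen schema):
    both elements hold whitespace- or comma-separated numbers,
    processedDataSamples alternates m/z and intensity, and there is exactly
    one raw count per processed m/z value, in the same order.
    """
    root = ET.fromstring(xml_text)
    tof = root.find(".//tofDataSamples")
    processed = root.find(".//processedDataSamples")
    if tof is None or processed is None:
        raise ValueError("missing tofDataSamples or processedDataSamples")
    counts = [int(tok) for tok in (tof.text or "").replace(",", " ").split()]
    numbers = [float(tok) for tok in (processed.text or "").replace(",", " ").split()]
    mz = numbers[0::2]  # keep only the m/z half of each (m/z, intensity) pair
    if len(mz) != len(counts):
        raise ValueError(f"{len(mz)} m/z values but {len(counts)} raw counts")
    return list(zip(mz, counts))


def to_two_column_text(pairs: List[Tuple[float, int]]) -> str:
    """Format (m/z, intensity) pairs as two comma-separated columns, one pair per line."""
    return "".join(f"{mz},{count}\n" for mz, count in pairs)
```

The length check is deliberate. If the raw and processed blocks do not line up one-to-one, silently zipping them would shift every intensity onto the wrong m/z value.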
The XML files also contain all of the parameter settings used in processing the data, including the run times of the spectra. In general, these can be used to check whether the run order was random with respect to sample group.

For historical reasons, the Cromwell scripts were written to read two-column .csv files, with the first column giving M/Z and the second giving intensity. We extract such files from the XML files using the admittedly kludgy script xml2txt.pl, which takes about 5-10 seconds on my laptop. Note that this script does not simply copy the last half of the XML data file, which holds the M/Z, Intensity pairs returned by the Ciphergen software. Instead, it keeps those M/Z values but draws the intensities from the raw integer counts in tofDataSamples, thus capturing the data before any preprocessing has been applied. Running the script places .txt versions of the spectra in the folder RawSpectra/.

Finally, we process the raw spectra to produce baseline-corrected, smoothed spectra together with matrices of peak intensities. Most of this procedure is described in far more detail in our m-file, processSpectra.m, and one or two illustrative pictures will be stored in Figs/. MATLAB binary .mat files (CorrectedSpectra.mat and Peaks.mat) will be produced for later analysis. Hope that helps!
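The general shape of this processing step can be illustrated with a deliberately simplified Python sketch. It smooths each spectrum with a moving average and subtracts a moving-minimum baseline. It then detects peaks once on the mean corrected spectrum and builds a peak-intensity matrix from the largest corrected value near each peak. This is not what processSpectra.m or Cromwell does. Cromwell's own smoothing is wavelet based (hence the Rice Wavelet Toolbox requirement), and a real pipeline would typically threshold on a signal-to-noise estimate rather than a fixed intensity. The function names, window widths, fixed threshold, and mean-spectrum peak detection are all illustrative assumptions.

```python
from typing import List, Sequence


def moving_average(y: Sequence[float], half_width: int) -> List[float]:
    """Smooth y with a centred moving average; the window shrinks at the ends."""
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def moving_minimum(y: Sequence[float], half_width: int) -> List[float]:
    """Baseline estimate (assumed method): the minimum of y within +/- half_width points."""
    n = len(y)
    return [min(y[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def local_maxima(y: Sequence[float], threshold: float) -> List[int]:
    """Indices above the left neighbour, not below the right one, and above threshold."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] > threshold]


def process(spectra, smooth_hw=2, base_hw=50, threshold=5.0, quant_hw=3):
    """Return (corrected spectra, peak indices, peak-intensity matrix).

    Matrix rows are spectra and columns are peaks; each entry is the largest
    corrected intensity within +/- quant_hw points of the peak index.
    """
    corrected = []
    for y in spectra:
        smooth = moving_average(y, smooth_hw)
        baseline = moving_minimum(smooth, base_hw)
        # The window always contains the point itself, so this is never negative.
        corrected.append([s - b for s, b in zip(smooth, baseline)])
    # Peaks are found once, on the mean corrected spectrum (an assumption),
    # so every spectrum is quantified at the same locations.
    mean = [sum(col) / len(corrected) for col in zip(*corrected)]
    peaks = local_maxima(mean, threshold)
    matrix = [[max(c[max(0, p - quant_hw):p + quant_hw + 1]) for p in peaks]
              for c in corrected]
    return corrected, peaks, matrix
```

For the 20 QC spectra, each input list would hold the 33885 intensities of one spectrum. Because all spectra share the same m/z vector, each returned peak index maps directly to a single m/z value.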