This is meant to be a simple example illustrating the use
of the Cromwell package (Coombes et al, Proteomics 2005)
for analyzing SELDI/MALDI proteomic spectra. 

***** WARNING. THIS EXAMPLE IS NOT SELF-CONTAINED. *****

To use this example, you will need (in addition to the files
supplied here) a working version of MATLAB (we have tested this
only on version 6.5 and higher) and the Rice Wavelet Toolbox,
available from www-dsp.rice.edu/software/rwt.shtml. The toolbox
should be installed in the bin folder produced when the RawData
file is unzipped. We have included a copy of Cromwell there
already.

*** END WARNING. THIS EXAMPLE IS NOT SELF-CONTAINED. ***


The spectra used here are a subset of those used in Pusztai
et al (2004), Cancer 100:1814-22. The full dataset is available
from http://bioinformatics.mdanderson.org. The spectra here are
20 low-mass scans of QC samples derived from a common serum pool,
so all of the spectra should be telling the same story.
All of the spectra have intensities at 33885 m/z values,
and the vector of m/z values is the same for all spectra.
The machine was calibrated to a set of known peaks shortly
before the entire set of spectra was run.  
The spectra were run in randomized order on a series of
chips. 
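
The claim that every spectrum shares one m/z vector of length 33885
is easy to verify once the two-column text files described further
down exist. The following is a minimal Python sketch, not part of
Cromwell. It assumes one (m/z, intensity) pair per line, separated
by a comma or whitespace, with no header row; the function names
are ours.

```python
from pathlib import Path


def read_two_column(path):
    """Read one spectrum as parallel lists of m/z and intensity values.

    Assumption: one pair per line, comma- or whitespace-separated,
    no header row. This has not been checked against the real files.
    """
    mz, intensity = [], []
    for line in Path(path).read_text().splitlines():
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mz.append(float(fields[0]))
        intensity.append(float(fields[1]))
    return mz, intensity


def check_common_mz(paths, expected_length=None):
    """Return names of files whose m/z vector differs from the first file's.

    If expected_length is given, files with a different number of
    m/z values are reported as well.
    """
    paths = list(paths)
    if not paths:
        return []
    reference, _ = read_two_column(paths[0])
    mismatched = []
    for path in paths:
        mz, _ = read_two_column(path)
        wrong_length = expected_length is not None and len(mz) != expected_length
        if mz != reference or wrong_length:
            mismatched.append(Path(path).name)
    return mismatched
```

A call such as
check_common_mz(sorted(Path("RawSpectra").glob("*.txt")), 33885)
should return an empty list if the description above holds.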

The files have been provided in 2 formats.

RawBinary contains "Low_mass_serum_QC.xpt" which is the 
binary format used by the Ciphergen software. This file 
contains all of the spectra used here. 

RawXML contains all of the spectra exported from the above
xpt file using the Ciphergen software (version 3.1.1). This 
format contains the spectra intensities both before any 
processing (the integer counts in tofDataSamples) and after 
application of various correction factors (the m/z, intensity 
pairs in processedDataSamples). The XML files also contain all 
of the setting parameters used in processing the data, including 
run times of the spectra (which in general can be used to 
confirm the randomness or lack thereof of the run order with
respect to sample group). 
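
As a rough illustration of the run-order check mentioned above,
here is a small Python sketch. It is not part of Cromwell, and it
assumes the run times have already been pulled out of the XML files
as ISO 8601 strings and paired with a group label; the actual
timestamp format in the Ciphergen XML is not assumed here. For the
QC spectra in this example there is only one sample type, so the
check matters mainly for datasets with several groups.

```python
from datetime import datetime
from itertools import groupby


def group_sequence(records):
    """Return the group labels ordered by run time.

    records: iterable of (timestamp_string, group_label) pairs, with
    timestamps in ISO 8601 form (an assumption for this sketch).
    """
    ordered = sorted(records, key=lambda rec: datetime.fromisoformat(rec[0]))
    return [group for _, group in ordered]


def count_runs(sequence):
    """Count maximal blocks of identical consecutive labels."""
    return sum(1 for _ in groupby(sequence))
```

Very few blocks relative to the number of spectra (for example, all
of one group run first and then all of the other) suggests that the
run order was not randomized with respect to group.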

For historical reasons, the scripts in Cromwell were written
to deal with files in two-column .csv format, with the first
column corresponding to m/z and the second to intensity. We
extract these files from the XML files using the kludged
script

xml2txt.pl

(This takes about 5-10 sec on my laptop.)
Note that this script does not simply take the last half of
the XML data file, consisting of the (m/z, intensity) pairs
returned by the Ciphergen software. Rather, it takes these
m/z values but draws the intensity values from the raw
integer counts supplied in tofDataSamples, thus getting the
data before any preprocessing has been applied.
Running the above script will place .txt versions of the
spectra in the folder RawSpectra/.
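
The pairing described above can be sketched in Python as follows.
This is not a port of xml2txt.pl. The element names tofDataSamples
and processedDataSamples come from the XML files, but the layout
inside them is an assumption made for this sketch: whitespace-
separated integer counts in the first, and whitespace-separated
"mz,intensity" pairs in the second. The real files may differ.

```python
import xml.etree.ElementTree as ET


def raw_pairs_from_xml(xml_text):
    """Pair the m/z values with the raw integer counts.

    Assumed layout (hypothetical): <tofDataSamples> holds integer
    counts separated by whitespace, and <processedDataSamples> holds
    "mz,intensity" pairs separated by whitespace.
    """
    root = ET.fromstring(xml_text)
    tof = root.find(".//tofDataSamples")
    processed = root.find(".//processedDataSamples")
    if tof is None or processed is None:
        raise ValueError("expected both tofDataSamples and processedDataSamples")

    counts = [int(token) for token in (tof.text or "").split()]
    # Keep only the m/z half of each processed pair; the processed
    # intensity is deliberately ignored.
    mz = [float(pair.split(",")[0]) for pair in (processed.text or "").split()]

    if len(counts) != len(mz):
        raise ValueError("raw counts and m/z values differ in length")
    return list(zip(mz, counts))


def format_two_column(pairs):
    """Render the pairs as comma-separated lines, one pair per line."""
    return "".join(f"{mz},{count}\n" for mz, count in pairs)
```

The length check is there because the pairing only makes sense if
each raw count lines up with exactly one m/z value.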

Finally, we shift to the processing of the raw spectra to
produce baseline-corrected and smoothed spectra together
with matrices of peak intensities. This procedure is detailed
in
  processSpectra.m (our m-file)
which describes most of the processing in far more detail
than we do here. One or two illustrative pictures will be
stored in Figs/. MATLAB binary .mat files
(CorrectedSpectra.mat and Peaks.mat) will be produced for
later analysis.
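
To give a feel for what "smoothed", "baseline corrected" and "peak
intensities" mean, here is a toy Python sketch of the three steps.
It is deliberately NOT what processSpectra.m or Cromwell does: we
swap in a moving average for the smoothing, a sliding-window
minimum for the baseline, and a plain local-maximum test for the
peaks. The function names and parameters are ours.

```python
def moving_average(values, half_width):
    """Smooth by averaging each point with its neighbours.

    The window is truncated at the ends of the spectrum.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half_width):min(n, i + half_width + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def subtract_baseline(values, half_width):
    """Subtract a sliding-window minimum as a crude baseline estimate."""
    n = len(values)
    corrected = []
    for i in range(n):
        window = values[max(0, i - half_width):min(n, i + half_width + 1)]
        corrected.append(values[i] - min(window))
    return corrected


def find_peaks(values, threshold):
    """Return indices of local maxima whose height exceeds threshold."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1]
        and values[i] >= values[i + 1]
        and values[i] > threshold
    ]
```

Applied in the order smooth, subtract baseline, find peaks, this
gives a list of peak locations per spectrum; reading the corrected
intensities at a common set of locations across all spectra would
then give a peaks-by-spectra matrix.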

Hope that helps!