The spectra here are described in Pusztai et al. (2004), Cancer 100:1814-22. If you make use of the spectra, please reference the paper.

The spectra are grouped into 4 blocks by date of acquisition and laser intensity used:

  Low Mass March03
  High Mass March03
  Low Mass June03
  High Mass June03

and are further subdivided in the sections below, as follows:

  March 03
    Low Mass
      Nor Pre Post
      Flagged
      Plasma QC
      Serum QC
    High Mass
      Nor Pre Post
      Flagged
      Plasma QC
      Serum QC
  June 03
    Low Mass
      Pre Post
      Plasma QC
      Serum QC
    High Mass
      Pre Post
      Plasma QC
      Serum QC

All spectra were exported as Ciphergen XML files, which include most of the machine settings used, timestamps of the runs, the summed counts at the detector, and the (m/z, intensity) pairs as supplied by the Ciphergen software after exporting (the default export behavior is not to export these additional values, but we have elected to include them). The files were exported using version 3.1.1 of the Ciphergen software, and can be imported into Ciphergen software for further analysis if desired.

March 03 Low Mass Nor Pre Post

The data here are low-mass range SELDI spectra derived from patients with breast cancer and from normal controls. All told, there are 164 such spectra, of which 148 were usable (described below).

For each of 15 normal controls, we have measurements made on blood taken at an initial visit, and further matched measurements made on blood drawn a week later. The sample pairing is described below. All samples were run in duplicate, so there were 60 normal spectra, corresponding to 4 per patient. We actually have 57 spectra here, as 3 of the spectra yielded poor signal and were "flagged" in the initial processing. Flagged spectra are included in a separate folder.

Likewise, we have spectra from 26 cancer patients, with the first measurements being made on blood drawn right before the initiation of Taxol chemotherapy, and the second measurements being made on blood drawn 3 days after initiation of chemo.
All of the samples were again run in duplicate, yielding 104 spectra. Of these, 8 were not used in our initial analysis because 2 of the patients were determined post facto to have been ineligible for the study due to prior treatment, and 5 were excluded (flagged) due to poor spectrum quality, leaving 91 cancer spectra. The samples from the "ineligible" patients are discussed below; their spectra are still in this folder.

The spectra were run in randomized order on a series of chips. In addition, several quality control spectra were run, with the QC material derived from both serum and plasma sources. The plasma control was a pool derived from 3 cancer patients from the center that supplied the other cancer samples; the serum control was a pool used at Eastern Virginia Medical School (EVMS), which is where the samples were run. In total, there were 164 patient sample spectra, 8 plasma QC spectra, and 20 serum QC spectra -- 192 spectra, run on 24 chips.

All of the spectra have intensities at 33885 m/z values, and the reported vector of m/z values is the same for all spectra. The machine was calibrated to a set of known peaks shortly before the entire set of spectra was run.

Samples have one of 3 prefixes:

  NO -- normal, healthy control
  PR -- cancer, pre-chemo
  PO -- cancer, post-chemo

Each sample has a corresponding sample bank code as well, so a typical filename is NO216. NO216 and NO216(2) refer to the two replicates of the same sample.

The pairing of sample codes (which early normal matches which later normal, and which pre-chemo sample matches which post-chemo) is supplied in the first columns of two Excel files in ClinicalInfo: Coded_clinical_data_march03.xls and Normal_Samples_march03.xls. The first entry in a pair is the earlier measurement. Thus, the first entry in Normal_Samples_march03.xls, (207, 226), indicates that NO207 and NO226 are from the same patient, with NO207 drawn a week earlier.
Similarly, the first entry in Coded_clinical_data_march03.xls, (7, 10), indicates that PR7 matches PO10. The row entries corresponding to the ineligible patients, (42, 44) and (98, 99), have been marked in red. In this phase of the study, the focus was intended to be on patients receiving chemo before surgery, and these two patients received chemo after surgery. The pre/post surgery issue is revisited in the June 03 data discussed below.

The initial clinical goal was to see if markers could be found that identify, based on the response after just a few days, the patients who exhibited a strong clinical response to chemo (judged at the end of the course of treatment). Patients exhibiting a favorable response are indicated in yellow in the last two columns.

March 03 Low Mass Flagged Spectra

This folder contains the 8 sample spectra judged to be of poorer quality by the machine operator.

March 03 Low Mass Plasma QC

This folder contains the 8 spectra produced from the common pool derived from 3 cancer patient samples. These samples are all of the same material, and should provide a reference control.

March 03 Low Mass Serum QC

This folder contains the 20 spectra produced from the common pool used at EVMS. These samples are all of the same material, and should provide a reference control.

March 03 High Mass Nor Pre Post

This folder contains spectra from the same samples as the low-mass equivalent, except that only 5 spectra here were flagged as being of poor quality, as opposed to 8. The same chips were used, just with a higher laser intensity setting. All of the file names, mappings, and clinical information are the same.

March 03 High Mass Flagged Spectra

This folder contains the 5 sample spectra judged to be of poorer quality by the machine operator.

March 03 High Mass Plasma QC

This folder contains the 8 spectra produced from the common pool derived from 3 cancer patient samples. These samples are all of the same material, and should provide a reference control.
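
The naming scheme described for the March 03 patient spectra (a NO/PR/PO prefix, a sample bank code, and a "(2)" suffix for the second replicate) can be parsed mechanically. The following is a minimal sketch in Python, not part of the original data release. The function and class names are hypothetical, and the optional file extension handled by the pattern (for example ".xml") is an assumption, since only bare names such as NO216 are shown above.

```python
import re
from typing import NamedTuple

# Meaning of the three prefixes, as listed in the March 03 description.
GROUPS = {
    "NO": "normal, healthy control",
    "PR": "cancer, pre-chemo",
    "PO": "cancer, post-chemo",
}

# Assumption: a name is PREFIX + sample bank code, optionally followed by
# "(n)" for a replicate and optionally by a file extension such as ".xml".
_NAME = re.compile(r"^(NO|PR|PO)(\d+)(?:\((\d+)\))?(?:\.\w+)?$")


class SpectrumName(NamedTuple):
    prefix: str       # "NO", "PR", or "PO"
    sample_code: int  # sample bank code, e.g. 216
    replicate: int    # 1 for "NO216", 2 for "NO216(2)"


def parse_spectrum_name(name: str) -> SpectrumName:
    """Split a spectrum name into prefix, sample code, and replicate number."""
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized spectrum name: {name!r}")
    prefix, code, replicate = match.groups()
    # A name without a "(n)" suffix is taken to be the first replicate.
    return SpectrumName(prefix, int(code), int(replicate) if replicate else 1)


def paired_names(early_prefix: str, late_prefix: str, pair: tuple) -> tuple:
    """Turn a (earlier, later) code pair from the Excel files into sample names."""
    early, late = pair
    return (f"{early_prefix}{early}", f"{late_prefix}{late}")
```

For example, `paired_names("NO", "NO", (207, 226))` gives the two normal samples drawn a week apart from the same person, and `paired_names("PR", "PO", (7, 10))` gives a matched pre-chemo and post-chemo sample.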

March 03 High Mass Serum QC

This folder contains the 20 spectra produced from the common pool used at EVMS. These samples are all of the same material, and should provide a reference control.

June 03 Low Mass Pre Post

This data set is a follow-on to the data analyzed in March of 2003. In that experiment, we looked for differences in the protein profiles of breast cancer patients before and after the initiation of chemotherapy. Samples were acquired in a paired fashion, so that a blood draw was performed immediately before administration of chemo, and a second was performed 3 days after the initiation of chemo; the samples are thus paired. To account for the possibility that we might be seeing differences associated with time variability, another set of samples from normal controls was run at the same time. Samples from the controls were drawn once, and then a second time a week after the first.

We were looking for differences in the protein profiles that would separate

  a) cancer patients from normal controls
  b) pre-treatment from post-treatment cancer patients
  c) complete responders from those unlikely to respond to chemo

A small number of peaks were found to achieve good separation for split a), and one prominent peak was found to give good separation for split b). We did not have much success with c).

What's new this time:

1) the patients here are all cancer patients, and most of them are patients who are receiving _post-operative_ chemo (37 patients are post-op here, as opposed to 6 receiving pre-op FAC.) In the earlier experiment, chemo was administered pre-op, save for the two patients included by accident. There are no normal controls this time, and no spectra were initially flagged as being of poor quality.
2) there are two different chemo agents being used. In the earlier experiment, all of the patients received Taxol. In this experiment, many of them received FAC instead (the paper gives a slightly more extensive description of the distinction).
3) we do not have paired samples from all of the patients involved, just most of them.

As before, the sample matchings are given in Coded_clinical_data_june03.xls (note that this file has 2 sheets, corresponding to the pre- and post-op chemo patients).

The questions of interest (statistician translation) are:

Looking at the post-op chemo patients:

  1) Is the baseline "Pre" plasma profile more similar to the Normal or Cancer profile from the earlier experiments?
  2) Do we see the same chemo-induced peaks in the Taxol-treated group as we did when the drug was used pre-op?
  3) Does FAC chemo induce the same changes as Taxol chemo in the "Pre vs Post" samples? (Note -- it would be nice to include here, in some way, the two cases that were excluded from the first analysis because they received post-op chemo, but there will be normalization issues.)

Looking at the pre-op chemo patients:

  1) Does pre-op FAC induce the same changes in the plasma that pre-op Taxol did?
  2) Pre-op samples should look like "cancer cases" from the first set. Do they?

The questions of interest (as initially posed):

This is what we would like to test on the new set of SELDI data. What we have seen so far in the first set of data is that

  1. Patients with intact breast cancer have some extra peaks compared to healthy women.
  2. Pre-operative Taxol chemo induces a few new peaks in the plasma on day 3 compared to pre-treatment.

This new data includes 6 patients who received preoperative FAC chemo. Using these samples we could test 2 questions:

  1. Does preoperative FAC induce the same peaks as preoperative Taxol?
  2. We expect baseline "pretreatment" profiles of these 6 cases (actually only 5; one pretreatment sample is missing) to look like pretreatment cancer plasma from the first set of cases. Are they similar?

This new set also includes 37 cases who had postoperative chemo; the cancer was already removed surgically. 23 had Taxol and 14 had FAC. We could ask the following questions:

  1. Does baseline pretreatment plasma of these postoperative cases look like the cancer plasma or the normal plasma from the first set? In this analysis please include samples 42 and 98 from the previous set if possible.
  2. Does postoperative Taxol induce the same peaks as preoperative Taxol did in the first set? (If yes, the peak must come from non-cancer tissue.) In this analysis please include cases 42/44 and 98/99 from the first set.
  3. Does postoperative Taxol induce the same changes as postoperative FAC?

What the data consists of:

As before, we have pre- and post- spectra, together with replicated serum QC and plasma QC samples. All samples were run using both a high and a low mass scan setting. No spectra were flagged by the operator as "bad" this time. All of the patient spectra were run in duplicate. We have spectra from 37 post-op patients and 6 pre-op patients. We only have post-chemo blood samples from about 2/3 of the patients, so the pairing is imperfect. All in all, there are 72 patient samples and 144 patient spectra.

June 03 Low Mass Plasma QC

This folder contains the spectra from 21 plasma QC samples.

June 03 Low Mass Serum QC

This folder contains the spectra from 3 serum QC samples.

All in all, we have data from 21 chips here, for a total of 168 spectra.

June 03 High Mass Pre Post

These are the high mass scan spectra for the same samples as described above. The same chips were used, just rescanned.

*********** WARNING!!! **************

It should be noted that almost all of the high mass pre, post samples have incorrect m/z values. Specifically:

High-mass pre, post spectra have the wrong calibration; THE FACTORY DEFAULT IS USED. THIS CANNOT BE RIGHT.

However, on looking at the calibration parameter values, we note that one spectrum, PO11, looks different from the others. Checking, we find that the calibration for this single spectrum has been fit using the same equation used for the high-mass serum QC samples.
Our guess is that the change was applied after an accidental click had disabled the "select all spectra" option (selecting just the one), so that everyone thought that all of the spectra had been calibrated. As the correctly calibrated m/z values are the same as those shown in the high mass QC samples, the m/z vector from PO11 should be used. We have included this here as an object lesson of sorts.

*********** END WARNING!!! **************

Aside from the calibration issue, we again have 144 spectra as described above.

June 03 High Mass Plasma QC

This folder contains the spectra from 21 plasma QC samples.

June 03 High Mass Serum QC

This folder contains the spectra from 3 serum QC samples.

All in all, we have data from 21 chips here, for a total of 168 spectra.
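
The workaround suggested in the calibration warning for the June 03 high mass pre, post spectra (use the m/z vector from PO11 for every spectrum) can be sketched as follows. This is an illustration only, not the procedure used in the paper. It assumes the spectra have already been read into memory as pairs of lists, and that every spectrum has the same number of points as PO11; neither the loading step nor the layout of the Ciphergen XML files is shown here. The function name and the sample name in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# A spectrum is represented as (m/z values, intensities); this in-memory
# layout is an assumption made for the sketch.
Spectrum = Tuple[List[float], List[float]]


def apply_reference_mz(
    spectra: Dict[str, Spectrum], reference: str = "PO11"
) -> Dict[str, Spectrum]:
    """Return copies of all spectra that share the reference spectrum's m/z vector.

    Intensities are left untouched; only the m/z axis is replaced.
    """
    reference_mz = spectra[reference][0]
    corrected = {}
    for name, (_, intensities) in spectra.items():
        # Assumption: all spectra were sampled at the same number of points,
        # so the reference axis can be substituted one-for-one.
        if len(intensities) != len(reference_mz):
            raise ValueError(
                f"{name}: {len(intensities)} points, "
                f"but {reference} has {len(reference_mz)}"
            )
        corrected[name] = (list(reference_mz), list(intensities))
    return corrected
```

A call such as `apply_reference_mz({"PO11": (mz_ref, y_ref), "PR3": (mz_bad, y)})` returns a new dictionary in which "PR3" keeps its intensities but carries the PO11 m/z values; the input dictionary is not modified.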