Modifications to Irreproducibility of Chemopredictors


Detailed Rebuttal
We posted a more detailed response to the Potti-Nevins reply (KAB, KRC; 13 November 2007).
Initial Rebuttal
We posted an initial response to the Potti-Nevins reply (KRC; 8 November 2007).
Correspondence Published
Our correspondence appeared, along with a reply from Potti and Nevins (KRC; 7 November 2007).
Checking Revised Gene Lists
In response to previous communications, Potti and Nevins posted revised lists of genes, which were still incorrect (KAB; 25 August 2007).

Duke Web Page Removed

Posted 16 November 2009 (KRC)

As pointed out by a correspondent, the Duke web site supporting their Nature Medicine paper on chemoresponse has disappeared; attempting to connect yields a "403 Forbidden" response. The same correspondent pointed out that old versions can still be obtained at one of the following URLs:

Detailed Rebuttal

Posted 13 November 2007 (KAB, KRC)

As promised in our post of 7 November 2007, we have now examined some of the new data available at the Duke web site for the Nature Medicine paper by Potti et al. We have prepared an overview of our new findings, which serves as an introduction to additional reports prepared in Sweave. Here is a brief summary.

  1. We do not reproduce their results using their data and their cell lines; we only get 17/24 correct on the Chang breast cancer test set, not the 22/24 that they report.
  2. The discrepancy is partially explained by errors made by Potti et al. when labeling the cell lines as sensitive or resistant. Specifically:
    1. The sensitive/resistant labels supplied for the training data are the reverse of those Potti et al. supply in their "Description of the Process of Predictor generation.doc", so their predictions will be reversed.
    2. The sensitive/resistant labels are scrambled for the test data, with 10 of the 24 samples mislabeled before any processing is applied.
    3. One of the Chang test samples, GSM4914, is omitted.
    4. One of the Chang test samples, GSM4910, is included twice, labeled once as resistant and once as sensitive.
  3. The labeling of samples for the Adriamycin test set is also mixed up. Specifically:
    1. The 122 validation samples are not all distinct; only 84 of the data columns are unique.
    2. Some samples are used 2, 3, or 4 times.
    3. Further, some samples used multiple times are labeled both ways -- one of the samples present 4 times is labeled NR 3 times, and Resp once.
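Problems like those in the Adriamycin test set can be detected mechanically. The following is a minimal sketch (in Python with numpy, not the authors' R/Sweave code; the function and variable names are our own, and the data are a toy example) of how one might count distinct data columns and flag samples that recur with conflicting response labels:

```python
import numpy as np

def audit_samples(X, labels):
    """Count distinct data columns and flag columns that recur
    with conflicting response labels.

    X      : genes-by-samples expression matrix
    labels : one response label per column (e.g. 'Resp' / 'NR')
    """
    seen = {}  # column bytes -> list of (column index, label)
    for j in range(X.shape[1]):
        key = X[:, j].tobytes()
        seen.setdefault(key, []).append((j, labels[j]))
    n_unique = len(seen)
    conflicts = [cols for cols in seen.values()
                 if len({lab for _, lab in cols}) > 1]
    return n_unique, conflicts

# Toy example: 4 columns, but column 0 is repeated at index 2
# with the opposite label.
X = np.array([[1.0, 2.0, 1.0, 3.0],
              [4.0, 5.0, 4.0, 6.0]])
labels = ['Resp', 'NR', 'NR', 'Resp']
n_unique, conflicts = audit_samples(X, labels)
print(n_unique)   # 3 distinct columns
print(conflicts)  # [[(0, 'Resp'), (2, 'NR')]]
```

Run against the 122 Adriamycin validation columns, a check of this kind would immediately report only 84 unique columns and the label conflicts described above.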

Initial Rebuttal

Posted 8 November 2007 (KRC)

The volume of Nature Medicine where our correspondence was published contains a response from Potti and Nevins. Not surprisingly, we disagree with several of their assertions. Here we provide a point-by-point rebuttal.

  1. Potti and Nevins say "We have provided details describing [cell line selection] on our web page (…)"

    We appreciate the extra details that they have now made available. This description, however, was not part of the original article (or supplements) published by Nature Medicine. It is also not consistent with the statement in the published methods indicating that they used GI50 values as the primary measure of sensitivity or resistance. As we have stated, the sensitive and resistant cell lines used in the original publication have GI50 values that overlap (Supplementary Report 3), which still calls into question their classification as the most sensitive and resistant cohorts. Further, the taxotere example that they provide on their new web page includes a footnote pointing out that they arbitrarily excluded the cell line COLO 205, which was by far the most sensitive to taxotere. This fact could not have been ascertained from the original paper.

  2. Potti and Nevins say "Because Coombes et al. did not follow these methods precisely ..., they have made assumptions inconsistent with our procedures."

    This statement is misleading. As our supplementary reports show, we tried to reproduce all of the rest of the steps in their analysis using both the cell lines we thought were most sensitive and resistant and the cell lines they used in their reported analysis.

  3. In their response, Potti and Nevins have acknowledged the off-by-one error and stated that "[a]dditional inaccuracies resulted from errors made when we assembled the gene lists."

    We simply note here that their response does not explicitly address point (4) of our published critique: 14 of the genes that were not produced by their own software for the taxotere signature were taken directly from the paper by Chang et al. that described the breast cancer test set used by Potti and Nevins.

  4. Potti and Nevins do go on to say "Additionally, there was no accidental inclusion of genes from the validation data distinguishing responders from non-responders and this is not an explanation for the generation of 'better than chance' predictions."

    We stand by our analysis: using their own software on the combined data consisting of a training set from the cell lines they say they used and a test set from the Chang breast cancer data, the resulting model does not predict outcome any better than chance. Our code is still available for inspection; they have only provided unsupported assertions to the contrary.

  5. Potti and Nevins say "these errors in no way influence the primary results of our study, as the models are defined by the training set, not by gene lists."

    This sentence skirts two important issues. First, as we point out on page 15 of Supplementary Report 9, the other genes that mysteriously appeared on their list were the only genes that were mentioned by name in their publication as part of the biological justification for believing the signatures were meaningful. Second, as we discuss in our letter (and continue below), their model uses data from the test set as well as from the training set.

  6. Potti and Nevins claim that their method of building metagenes using SVD on the combined training and test sets is "entirely appropriate".

    We disagree, at a very fundamental level. Potti and Nevins implicitly acknowledge that the model obtained by combining training and test data is different from the model that would be obtained using the training data alone. In particular, this approach means that the model cannot be fully specified before the test set is available. Further, it suggests that the model morphs into a different model for every new test set. These observations imply, to us, that the independence of the training and test sets is not maintained. Moreover, they imply that the authors do not actually have a single predictive model that can be generalized to new patient samples.

  7. Potti and Nevins claim that Figure 8 in our Supplementary Report 9 shows that we have reproduced their results.

    That figure shows two things. First, if you build the model (with their software) using only their training data, then you cannot make useful predictions on the test data. Second, when you combine the test data and the training data to build the model, then you can at least get predictions of two different classes in the test data. Note, however, that the report does not address whether those predictions are correct. We are preparing an additional report (which will be posted here soon) showing that the predictive accuracy is no better than chance. (Interested readers are encouraged to reserve judgement until that time, or to run our existing code and then test the accuracy of the predictions themselves, since otherwise we have simply made the same sort of unsupported assertions that we have complained about in our letter.)

  8. Potti and Nevins close with "Finally, we also note that we have applied our methods, as well as several of the original signatures, to predict patient response in additional datasets, some blinded to us, yielding accuracies consistent with our initial results".

    We have our own opinions on the reproducibility and correctness of the results they cite in the publication by Hsu et al. in the Journal of Clinical Oncology, which we will be sure to discuss in an appropriate venue....
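The concern raised in point 6 can be demonstrated numerically: singular vectors ("metagenes") computed by SVD change when test columns are appended to the training matrix, so the resulting model is not fixed before the test data arrive. The following is a small illustrative sketch (Python with numpy, rather than the authors' original R code; the data here are random and purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 20))  # genes x training samples
test = rng.normal(size=(50, 8))    # genes x test samples

# Leading metagene (left singular vector) from training data alone...
u_train = np.linalg.svd(train, full_matrices=False)[0][:, 0]
# ...versus from the combined training + test data.
u_both = np.linalg.svd(np.hstack([train, test]),
                       full_matrices=False)[0][:, 0]

# Compare absolute correlation to ignore the arbitrary sign of
# singular vectors; a value below 1 means the metagene shifted
# when the test set was appended.
agreement = abs(np.dot(u_train, u_both))
print(agreement < 1.0)  # True: the metagene depends on the test set
```

Swapping in a different `test` matrix changes `u_both` again, which is the sense in which the model "morphs" for every new test set.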

Correspondence Published

Posted 7 November 2007 (KRC)

Our letter to Nature Medicine was published on 7 November 2007. The full reference (with the new title given to it by the editors) is Coombes KR, Wang J, Baggerly KA. Microarrays: retracing steps. Nature Medicine, 2007; 13(11):1276-1277.

Checking Revised Gene Lists

Posted 25 Aug 2007 (KAB)

After we communicated our original analysis to Drs. Potti and Nevins, they posted revised gene lists to the Nature Medicine web site in an attempt to correct the off-by-one indexing error. They apparently attempted to fix the problem manually, which introduced a number of different errors into the lists. Our review of the new gene lists is posted here: