Filtering Samples from the Tothill Data to Focus on RD

by Keith A. Baggerly

1 Executive Summary

1.1 Introduction

Tothill et al. profiled 285 ovarian tumor samples, but the patients did not all have the same type of disease, and residual disease (RD) information was not recorded for all of them. To focus the question more precisely, we want to identify the high-grade serous ovarian tumors with RD information.

1.2 Methods

Starting with the previously assembled table of clinical information, we examine the various columns and see which clinical features would justify exclusion from the set being examined.

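We consider residual disease status, clinical type, histologic subtype, array site, neoadjuvant chemotherapy, and grade.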

We use these rules to build up a data frame with two columns: sampleUse (Used or Unused), and whyExcluded.

1.3 Results

We exclude 96 of the 285 samples for various reasons. Of the 189 that remain, 139 are RD and 50 are No RD.

We save tothillFilteredSamples to the RData file “tothillFilteredSamples.RData”.

2 Libraries

We first load the libraries we will use in this report.

3 Loading the Data

Here we simply load the previously assembled clinical information.


load(file.path("RDataObjects", "tothillClinical.RData"))
tothillClinical[1:3, ]
##         GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## X49  GSM249839       49           5          MAL               Ser
## X129 GSM250001      129           1          MAL               Ser
## X146 GSM250000      146          NC          MAL               Ser
##      PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse
## X49           OV   III     3  56      D    Y   N   N            7
## X129          OV   III     3  65      D    Y   N   N            7
## X146          OV   III     3  56     PF    Y   N   N          166
##      MosToDeath ResidDisease ArraySite
## X49           8           <1        OV
## X129         15           >1        PE
## X146        166           >1        OV

4 Filtering Samples Used

We now walk through the various criteria and see what they imply for inclusion of the various samples. Our default assumption is that all samples are used.


sampleUse <- rep("Used", nrow(tothillClinical))
names(sampleUse) <- rownames(tothillClinical)

whyExcluded <- rep("", nrow(tothillClinical))
names(whyExcluded) <- rownames(tothillClinical)

4.1 Residual Disease

First, we check residual disease status, and exclude patients with no information.


table(tothillClinical[, "ResidDisease"])
## 
##            <1            >1 macro size NK           nil            NK 
##            76            70            18            84            37

sampleUse[tothillClinical[, "ResidDisease"] == "NK"] <- "Unused"
whyExcluded[tothillClinical[, "ResidDisease"] == "NK"] <- paste(whyExcluded[tothillClinical[, 
    "ResidDisease"] == "NK"], "-No RD Info-", sep = "")

table(sampleUse)
## sampleUse
## Unused   Used 
##     37    248
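
The same mark-and-annotate pattern recurs for every criterion, so it could be wrapped in a small helper. The following is only a sketch on toy data: the function excludeSamples, the list toyState, and the table toyClinical are hypothetical names that do not appear in the original analysis.

```r
# Hypothetical helper (not part of the original report): mark the samples
# selected by `drop` as Unused and append a reason tag to whyExcluded.
excludeSamples <- function(filterState, drop, reason) {
    idx <- which(drop)  # which() ignores NA entries in the logical vector
    filterState$sampleUse[idx] <- "Unused"
    filterState$whyExcluded[idx] <- paste(filterState$whyExcluded[idx], reason, 
        sep = "")
    filterState
}

# Toy clinical table, invented purely for illustration
toyClinical <- data.frame(ResidDisease = c("<1", "NK", "nil", "NK"), 
    ClinicalType = c("MAL", "LMP", "MAL", "MAL"), 
    row.names = c("X1", "X2", "X3", "X4"), stringsAsFactors = FALSE)

# Default assumption: every sample is used and has no exclusion reason
toyState <- list(sampleUse = rep("Used", nrow(toyClinical)), 
    whyExcluded = rep("", nrow(toyClinical)))

toyState <- excludeSamples(toyState, toyClinical[, "ResidDisease"] == "NK", 
    "-No RD Info-")
toyState <- excludeSamples(toyState, toyClinical[, "ClinicalType"] == "LMP", 
    "-LMP-")
table(toyState$sampleUse)
```

A sample that fails several criteria accumulates several tags, which matches the way whyExcluded is built up by repeated paste() calls in this report.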

4.2 Clinical Type

Next, we look at clinical type. Some of the samples are known to be of low malignant potential (LMP), and we don't want to use them.


table(tothillClinical[, "ClinicalType"])
## 
## LMP MAL 
##  18 267

sampleUse[tothillClinical[, "ClinicalType"] == "LMP"] <- "Unused"
whyExcluded[tothillClinical[, "ClinicalType"] == "LMP"] <- paste(whyExcluded[tothillClinical[, 
    "ClinicalType"] == "LMP"], "-LMP-", sep = "")

table(sampleUse)
## sampleUse
## Unused   Used 
##     53    232

4.3 Histologic Subtype

Next, we look at histologic subtype. We only want to keep serous (Ser) tumor samples.


table(tothillClinical[, "HistologicSubtype"])
## 
## Adeno  Endo   Ser 
##     1    20   264

sampleUse[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- "Unused"
sampleUse[tothillClinical[, "HistologicSubtype"] == "Endo"] <- "Unused"

whyExcluded[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- paste(whyExcluded[tothillClinical[, 
    "HistologicSubtype"] == "Adeno"], "-Adeno Subtype-", sep = "")
whyExcluded[tothillClinical[, "HistologicSubtype"] == "Endo"] <- paste(whyExcluded[tothillClinical[, 
    "HistologicSubtype"] == "Endo"], "-Endo Subtype-", sep = "")

table(sampleUse)
## sampleUse
## Unused   Used 
##     68    217

4.4 Array Site

Next, we look at the site the sample was taken from (the “array site”). We want tumors from the ovary or the peritoneum.


table(tothillClinical[, "ArraySite"])
## 
##       BN       CO       FT       OM    Other       OV OV or OM       PE 
##        1        4        2        2        3      200        1       71 
##       UT 
##        1

sampleUse[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- "Unused"
whyExcluded[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- paste(whyExcluded[!is.element(tothillClinical[, 
    "ArraySite"], c("OV", "PE"))], "-Not OV or PE-", sep = "")

table(sampleUse)
## sampleUse
## Unused   Used 
##     77    208
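
The array site filter relies on exact matching, so a combined label such as "OV or OM" is excluded even though it mentions the ovary. A minimal sketch on invented values (toySite is a hypothetical vector) shows this, and also that %in% gives the same answer as is.element():

```r
# Toy array sites, invented for illustration; NA stands for a missing entry
toySite <- c("OV", "PE", "OM", "OV or OM", NA)

# Exact matching: only the literal labels "OV" and "PE" are kept
keep <- toySite %in% c("OV", "PE")
keep  # TRUE TRUE FALSE FALSE FALSE

# is.element() is defined through match(), just like %in%
identical(keep, is.element(toySite, c("OV", "PE")))  # TRUE
```

Both forms return FALSE rather than NA for a missing entry, so a sample with no recorded site would also be marked Unused by this rule.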

4.5 Neoadjuvant Chemo

Next, we look at whether the patients received neoadjuvant chemotherapy. We want to focus on chemo-naive tumors.


table(tothillClinical[, "Neo"])
## 
##       N   Y 
##   3 264  18

sampleUse[tothillClinical[, "Neo"] == ""] <- "Unused"
sampleUse[tothillClinical[, "Neo"] == "Y"] <- "Unused"

whyExcluded[tothillClinical[, "Neo"] == ""] <- paste(whyExcluded[tothillClinical[, 
    "Neo"] == ""], "-NeoAdj Unk-", sep = "")
whyExcluded[tothillClinical[, "Neo"] == "Y"] <- paste(whyExcluded[tothillClinical[, 
    "Neo"] == "Y"], "-NeoAdj Chemo-", sep = "")

table(sampleUse)
## sampleUse
## Unused   Used 
##     91    194

With respect to neoadjuvant chemo, we exclude patients who either received therapy or for whom this information is unavailable. This mostly reduces the number of RD samples.

4.6 Grade

Next, we look at grade. We want only Grade 2 or 3 samples, so we exclude Grade 1 samples and samples with no grade recorded.


table(tothillClinical[, "Grade"])
## 
##   1   2   3 
##  19  97 164

sampleUse[is.na(tothillClinical[, "Grade"])] <- "Unused"
sampleUse[tothillClinical[, "Grade"] == 1] <- "Unused"

whyExcluded[is.na(tothillClinical[, "Grade"])] <- paste(whyExcluded[is.na(tothillClinical[, 
    "Grade"])], "-Grade NA-", sep = "")
whyExcluded[which(tothillClinical[, "Grade"] == 1)] <- paste(whyExcluded[which(tothillClinical[, 
    "Grade"] == 1)], "-Grade 1-", sep = "")

table(sampleUse)
## sampleUse
## Unused   Used 
##     96    189
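
Because some grades are missing, comparisons against Grade return NA for those samples. The sketch below, on an invented vector toyGrade, illustrates why the missing values are handled with is.na() and why which() is a convenient way to index on such a comparison:

```r
# Toy grades, invented for illustration; NA stands for a missing grade
toyGrade <- c(1, 2, NA, 3, 1)

toyGrade == 1
# The comparison propagates NA: TRUE FALSE NA FALSE TRUE

which(toyGrade == 1)
# which() keeps only the positions that are TRUE: 1 5

toyUse <- rep("Used", length(toyGrade))
toyUse[is.na(toyGrade)] <- "Unused"       # missing grade
toyUse[which(toyGrade == 1)] <- "Unused"  # Grade 1
toyUse
# "Unused" "Used" "Unused" "Used" "Unused"
```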

4.7 Final Tally

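Now we see how many RD and No RD samples remain. The 37 samples with no RD information do not appear in the table below, so the Unused row sums to 59 rather than 96.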


table(sampleUse, tothillRD)
##          tothillRD
## sampleUse No RD  RD
##    Unused    34  25
##    Used      50 139
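
The vector tothillRD is not constructed in this report. One mapping that is consistent with the counts shown here treats "nil" as No RD, "NK" as unknown, and every other ResidDisease value as RD. The sketch below works under that assumption; the function deriveRD and the vector toyResid are hypothetical names, and the real tothillRD may have been built differently.

```r
# Assumed mapping (not shown in the report): "nil" -> "No RD",
# "NK" -> NA (unknown), any other value -> "RD".
deriveRD <- function(residDisease) {
    residDisease <- as.character(residDisease)
    rd <- ifelse(residDisease == "nil", "No RD", "RD")
    rd[residDisease == "NK"] <- NA
    rd
}

# Rebuild a vector with the category counts reported in Section 4.1
toyResid <- rep(c("<1", ">1", "macro size NK", "nil", "NK"), 
    times = c(76, 70, 18, 84, 37))
toyRD <- deriveRD(toyResid)
table(toyRD, useNA = "ifany")
```

Under this mapping the totals are 84 No RD, 164 RD, and 37 unknown, which agree with the column sums of the table above (34 + 50 and 25 + 139) and with the 37 samples excluded for lacking RD information.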

5 Building the Data Frame

Now we bundle the assembled information into a data frame for later use.


tothillFilteredSamples <- data.frame(sampleUse = sampleUse, whyExcluded = whyExcluded, 
    row.names = rownames(tothillClinical))
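
As a sketch of how such a data frame might be used later, the example below pulls out the row names of the retained samples and tabulates the exclusion reasons. The object toyFiltered is an invented stand-in for tothillFilteredSamples, with made-up contents.

```r
# Toy stand-in for tothillFilteredSamples, invented for illustration
toyFiltered <- data.frame(sampleUse = c("Used", "Unused", "Unused", "Used"), 
    whyExcluded = c("", "-No RD Info-", "-LMP--Grade 1-", ""), 
    row.names = c("X1", "X2", "X3", "X4"), stringsAsFactors = FALSE)

# Row names of the samples that survive filtering
usedSamples <- rownames(toyFiltered)[toyFiltered$sampleUse == "Used"]
usedSamples

# How often each combination of reasons occurs among the excluded samples
table(toyFiltered$whyExcluded[toyFiltered$sampleUse == "Unused"])
```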

6 Saving RData

Now we save the relevant information to an RData object.


save(tothillFilteredSamples, file = file.path("RDataObjects", "tothillFilteredSamples.RData"))
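
A downstream script can check that a saved object reloads intact. The sketch below uses a toy object and a temporary file, both invented for illustration, and loads into a separate environment so that nothing in the workspace is overwritten.

```r
# Toy object and temporary file, both invented for illustration
toyFiltered <- data.frame(sampleUse = c("Used", "Unused"), 
    whyExcluded = c("", "-LMP-"), row.names = c("X1", "X2"))
toyFile <- file.path(tempdir(), "toyFilteredSamples.RData")
save(toyFiltered, file = toyFile)

# load() returns the names of the objects it restored
loadedEnv <- new.env()
loadedNames <- load(toyFile, envir = loadedEnv)
loadedNames

# The reloaded copy should match the object that was saved
identical(loadedEnv$toyFiltered, toyFiltered)
```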

7 Appendix

7.1 File Location


getwd()
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"

7.2 SessionInfo


sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.5
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.9    stringr_0.6.2  tools_3.0.2

8 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.