by Keith A. Baggerly
Tothill et al. [1] profiled 285 ovarian tumor samples, but not all of the patients had the same type of disease, and not all had residual disease (RD) information recorded. We want to identify the high-grade serous ovarian tumors with RD information to focus the question more precisely.
Starting with the previously assembled table of clinical information, we examine the various columns and see which clinical features would justify exclusion from the set being examined.
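We consider residual disease status, clinical type, histologic subtype, array site, neoadjuvant chemotherapy, and grade.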
We use these rules to build up a data frame with two columns: sampleUse (Used or Unused), and whyExcluded.
We exclude 96 of the 285 samples for various reasons. Of the 189 that remain, 139 are RD and 50 are No RD.
We save tothillFilteredSamples to the RData file “tothillFilteredSamples.RData”.
We first load the libraries we will use in this report.
Here we simply load the previously assembled clinical information.
load(file.path("RDataObjects", "tothillClinical.RData"))
tothillClinical[1:3, ]
##         GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## X49  GSM249839       49           5          MAL               Ser
## X129 GSM250001      129           1          MAL               Ser
## X146 GSM250000      146          NC          MAL               Ser
##      PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse
## X49           OV   III     3  56      D    Y   N   N            7
## X129          OV   III     3  65      D    Y   N   N            7
## X146          OV   III     3  56     PF    Y   N   N          166
##      MosToDeath ResidDisease ArraySite
## X49           8           <1        OV
## X129         15           >1        PE
## X146        166           >1        OV
We now walk through the various criteria and see what each implies for inclusion of the various samples. Our default assumption is that all samples are used.
sampleUse <- rep("Used", nrow(tothillClinical))
names(sampleUse) <- rownames(tothillClinical)
whyExcluded <- rep("", nrow(tothillClinical))
names(whyExcluded) <- rownames(tothillClinical)
First, we check residual disease status, and exclude patients with no information.
table(tothillClinical[, "ResidDisease"])
##
##            <1            >1 macro size NK           nil            NK
##            76            70            18            84            37
sampleUse[tothillClinical[, "ResidDisease"] == "NK"] <- "Unused"
whyExcluded[tothillClinical[, "ResidDisease"] == "NK"] <- paste(whyExcluded[tothillClinical[,
"ResidDisease"] == "NK"], "-No RD Info-", sep = "")
table(sampleUse)
## sampleUse
## Unused Used
## 37 248
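Each criterion in this report follows the same two steps: mark the matching samples as Unused, then append a reason to whatever reasons they already carry. A minimal sketch of that pattern as a reusable function is given here. The helper name flagSamples and the toy data are hypothetical and are not part of the original analysis.

```r
# Toy stand-in for tothillClinical (hypothetical values, not the real data).
toyClinical <- data.frame(
  ResidDisease = c("<1", "NK", "nil", "NK"),
  ClinicalType = c("MAL", "LMP", "MAL", "MAL"),
  row.names = c("X1", "X2", "X3", "X4"),
  stringsAsFactors = FALSE
)

toyUse <- rep("Used", nrow(toyClinical))
names(toyUse) <- rownames(toyClinical)
toyWhy <- rep("", nrow(toyClinical))
names(toyWhy) <- rownames(toyClinical)

# Hypothetical helper: mark the samples selected by `drop` as unused and
# append `reason` to the reasons they have already accumulated.
flagSamples <- function(use, why, drop, reason) {
  idx <- which(drop)  # which() discards comparisons that came out NA
  use[idx] <- "Unused"
  why[idx] <- paste(why[idx], reason, sep = "")
  list(use = use, why = why)
}

res <- flagSamples(toyUse, toyWhy,
                   toyClinical[, "ResidDisease"] == "NK", "-No RD Info-")
res <- flagSamples(res$use, res$why,
                   toyClinical[, "ClinicalType"] == "LMP", "-LMP-")
res$why
```

Because the index is computed once inside the helper, the left and right sides of the assignment cannot drift apart, which is the main risk when the same condition is typed out several times.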
Next, we look at clinical type. Some of the samples are known to be of low malignant potential (LMP), and we don't want to use them.
table(tothillClinical[, "ClinicalType"])
##
## LMP MAL
## 18 267
sampleUse[tothillClinical[, "ClinicalType"] == "LMP"] <- "Unused"
whyExcluded[tothillClinical[, "ClinicalType"] == "LMP"] <- paste(whyExcluded[tothillClinical[,
"ClinicalType"] == "LMP"], "-LMP-", sep = "")
table(sampleUse)
## sampleUse
## Unused Used
## 53 232
Next, we look at histologic subtype. We only want to keep serous (Ser) tumor samples.
table(tothillClinical[, "HistologicSubtype"])
##
## Adeno Endo Ser
## 1 20 264
sampleUse[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- "Unused"
sampleUse[tothillClinical[, "HistologicSubtype"] == "Endo"] <- "Unused"
whyExcluded[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- paste(whyExcluded[tothillClinical[,
"HistologicSubtype"] == "Adeno"], "-Adeno Subtype-", sep = "")
whyExcluded[tothillClinical[, "HistologicSubtype"] == "Endo"] <- paste(whyExcluded[tothillClinical[,
"HistologicSubtype"] == "Endo"], "-Endo Subtype-", sep = "")
table(sampleUse)
## sampleUse
## Unused Used
## 68 217
Next, we look at the site the sample was taken from (the “array site”). We want tumors from the ovary or the peritoneum.
table(tothillClinical[, "ArraySite"])
##
##       BN       CO       FT       OM    Other       OV OV or OM       PE
##        1        4        2        2        3      200        1       71
##       UT
##        1
sampleUse[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- "Unused"
whyExcluded[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- paste(whyExcluded[!is.element(tothillClinical[,
"ArraySite"], c("OV", "PE"))], "-Not OV or PE-", sep = "")
table(sampleUse)
## sampleUse
## Unused Used
## 77 208
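The array-site filter keeps a whitelist of sites rather than listing the ones to drop. A small sketch of the same test on toy data, computing the mask once and reusing it, might look as follows. The site values and variable names are illustrative assumptions only.

```r
# Toy stand-in for the ArraySite column (hypothetical values).
toySite <- c(X1 = "OV", X2 = "PE", X3 = "OM", X4 = "OV or OM", X5 = "UT")

toyUse <- rep("Used", length(toySite))
names(toyUse) <- names(toySite)
toyWhy <- rep("", length(toySite))
names(toyWhy) <- names(toySite)

keepSites <- c("OV", "PE")
# %in% performs the same membership test as is.element().
wrongSite <- !(toySite %in% keepSites)

toyUse[wrongSite] <- "Unused"
toyWhy[wrongSite] <- paste(toyWhy[wrongSite], "-Not OV or PE-", sep = "")
table(toyUse)
```

Note that an ambiguous label such as "OV or OM" is excluded under exact matching, which is the behavior of the filter above.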
Next, we look at whether the patients received neoadjuvant chemotherapy. We want to focus on chemo-naive tumors.
table(tothillClinical[, "Neo"])
##
##       N   Y
##   3 264  18
sampleUse[tothillClinical[, "Neo"] == ""] <- "Unused"
sampleUse[tothillClinical[, "Neo"] == "Y"] <- "Unused"
whyExcluded[tothillClinical[, "Neo"] == ""] <- paste(whyExcluded[tothillClinical[,
"Neo"] == ""], "-NeoAdj Unk-", sep = "")
whyExcluded[tothillClinical[, "Neo"] == "Y"] <- paste(whyExcluded[tothillClinical[,
"Neo"] == "Y"], "-NeoAdj Chemo-", sep = "")
table(sampleUse)
## sampleUse
## Unused Used
## 91 194
With respect to neoadjuvant chemo, we exclude patients who either received therapy or for whom this information is unavailable. This mostly reduces the number of RD samples.
Next, we look at grade. We want only Grade 2 or 3 samples.
table(tothillClinical[, "Grade"])
##
## 1 2 3
## 19 97 164
sampleUse[is.na(tothillClinical[, "Grade"])] <- "Unused"
sampleUse[tothillClinical[, "Grade"] == 1] <- "Unused"
whyExcluded[is.na(tothillClinical[, "Grade"])] <- paste(whyExcluded[is.na(tothillClinical[,
"Grade"])], "-Grade NA-", sep = "")
whyExcluded[which(tothillClinical[, "Grade"] == 1)] <- paste(whyExcluded[which(tothillClinical[,
"Grade"] == 1)], "-Grade 1-", sep = "")
table(sampleUse)
## sampleUse
## Unused Used
## 96 189
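The grade counts above sum to 280 rather than 285 because table() omits missing values by default, and the grade code uses is.na() and which() for the same reason. A minimal sketch on toy data (the values are made up for illustration) shows how missing values behave in each step.

```r
# Toy stand-in for the Grade column (hypothetical values).
toyGrade <- c(1, 2, NA, 3, 1)

table(toyGrade)                    # the NA is silently dropped
table(toyGrade, useNA = "ifany")   # the NA is shown as its own level

toyUse <- rep("Used", length(toyGrade))
toyWhy <- rep("", length(toyGrade))

isGrade1 <- toyGrade == 1          # TRUE FALSE NA FALSE TRUE
isMissing <- is.na(toyGrade)

# A length-one replacement accepts NA in a logical index (NA acts as FALSE).
toyUse[isMissing] <- "Unused"
toyUse[isGrade1] <- "Unused"

# A longer replacement does not accept NA in the index, so convert the
# comparison to integer positions with which() before appending a reason.
idx <- which(isGrade1)
toyWhy[idx] <- paste(toyWhy[idx], "-Grade 1-", sep = "")
toyWhy[isMissing] <- paste(toyWhy[isMissing], "-Grade NA-", sep = "")
toyWhy
```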
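Now we see how many RD and No RD samples remain. The Unused row sums to 59 rather than 96 because the 37 samples without RD information have no RD status to tabulate.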
table(sampleUse, tothillRD)
##          tothillRD
## sampleUse No RD  RD
##    Unused    34  25
##    Used      50 139
Now we bundle the assembled information into a data frame for later use.
tothillFilteredSamples <- data.frame(sampleUse = sampleUse, whyExcluded = whyExcluded,
row.names = rownames(tothillClinical))
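Before saving, it can be worth checking that the two columns agree: every Unused sample should carry at least one reason, and every Used sample should carry none. The sketch below runs that check on a toy data frame. The values are hypothetical, and this check is a suggestion rather than part of the original analysis.

```r
# Toy stand-in for tothillFilteredSamples (hypothetical values).
toyFiltered <- data.frame(
  sampleUse = c("Used", "Unused", "Unused", "Used"),
  whyExcluded = c("", "-No RD Info-", "-LMP--Grade 1-", ""),
  row.names = c("X1", "X2", "X3", "X4"),
  stringsAsFactors = FALSE
)

# A sample has a reason exactly when it is unused.
hasReason <- toyFiltered$whyExcluded != ""
stopifnot(all(hasReason == (toyFiltered$sampleUse == "Unused")))

# Tally the combined reasons among the excluded samples.
reasonCounts <- table(toyFiltered$whyExcluded[hasReason])
reasonCounts
```

Samples failing several criteria show up under a concatenated label such as "-LMP--Grade 1-", so the tally counts combinations of reasons, not individual criteria.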
Now we save the relevant information to an RData object.
save(tothillFilteredSamples, file = file.path("RDataObjects", "tothillFilteredSamples.RData"))
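As a quick sanity check on the saved file, one could reload it into a separate environment and compare it with the object in memory. The sketch below does this with a toy data frame written to a temporary directory. The object and file names are placeholders, not the ones used in this report.

```r
# Toy object and temporary file (hypothetical names, not the real data).
toyFilteredSamples <- data.frame(
  sampleUse = c("Used", "Unused"),
  whyExcluded = c("", "-No RD Info-"),
  row.names = c("X1", "X2"),
  stringsAsFactors = FALSE
)
toyFile <- file.path(tempdir(), "toyFilteredSamples.RData")
save(toyFilteredSamples, file = toyFile)

# Load into a fresh environment so the in-memory copy is not overwritten.
checkEnv <- new.env()
loadedNames <- load(toyFile, envir = checkEnv)

loadedNames
identical(checkEnv$toyFilteredSamples, toyFilteredSamples)
```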
getwd()
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.9 stringr_0.6.2 tools_3.0.2
[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.