Filtering Samples from the Tothill Data to Focus on RD
======================================================

by Keith A. Baggerly

## 1 Executive Summary

### 1.1 Introduction

[Tothill et al.](#tothill08) profiled 285 ovarian tumor samples, but not all of the patients had the same type of disease, and not all had residual disease (RD) information recorded. We want to identify the high-grade serous ovarian tumors with RD information to focus the question more precisely.

### 1.2 Methods

Starting with the previously assembled table of clinical information, we examine the various columns and see which clinical features would justify exclusion from the set being examined. We consider

- RD status, excluding samples with no RD information.
- Clinical Type, excluding low malignant potential (LMP) samples.
- Histologic Subtype, excluding non-serous (Adeno and Endo) samples.
- Array Site, excluding samples not coming from the ovary (OV) or peritoneum (PE).
- Neoadjuvant Treatment, excluding samples from patients who received chemotherapy before sample acquisition or whose neoadjuvant status is not recorded.
- Grade, excluding Grade 1 samples and samples with no recorded grade.

We use these rules to build up a data frame with two columns: sampleUse (Used or Unused), and whyExcluded.

### 1.3 Results

We exclude 96 of the 285 samples for various reasons. Of the 189 that remain, 139 are RD and 50 are No RD.

We save tothillFilteredSamples to the RData file "tothillFilteredSamples.RData".

## 2 Libraries

We first load the libraries we will use in this report.

## 3 Loading the Data

Here we simply load the previously assembled clinical information.

```r
load(file.path("RDataObjects", "tothillClinical.RData"))
tothillClinical[1:3, ]
```

```
##         GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## X49  GSM249839       49           5          MAL               Ser
## X129 GSM250001      129           1          MAL               Ser
## X146 GSM250000      146          NC          MAL               Ser
##      PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse
## X49           OV   III     3  56      D    Y   N   N            7
## X129          OV   III     3  65      D    Y   N   N            7
## X146          OV   III     3  56     PF    Y   N   N          166
##      MosToDeath ResidDisease ArraySite
## X49           8           <1        OV
## X129         15           >1        PE
## X146        166           >1        OV
```

## 4 Filtering Samples Used

We now walk through the various criteria and see what each implies for inclusion of the samples. Our default assumption is that all samples are used.

```r
sampleUse <- rep("Used", nrow(tothillClinical))
names(sampleUse) <- rownames(tothillClinical)
whyExcluded <- rep("", nrow(tothillClinical))
names(whyExcluded) <- rownames(tothillClinical)
```

### 4.1 Residual Disease

First, we check residual disease status, and exclude patients with no information.

```r
table(tothillClinical[, "ResidDisease"])
```

```
## 
##            <1            >1 macro size NK           nil            NK 
##            76            70            18            84            37
```

```r
sampleUse[tothillClinical[, "ResidDisease"] == "NK"] <- "Unused"
whyExcluded[tothillClinical[, "ResidDisease"] == "NK"] <- 
    paste(whyExcluded[tothillClinical[, "ResidDisease"] == "NK"], 
          "-No RD Info-", sep = "")
table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     37    248
```

### 4.2 Clinical Type

Next, we look at clinical type. Some of the samples are known to be of low malignant potential (LMP), and we don't want to use them.

```r
table(tothillClinical[, "ClinicalType"])
```

```
## 
## LMP MAL 
##  18 267
```

```r
sampleUse[tothillClinical[, "ClinicalType"] == "LMP"] <- "Unused"
whyExcluded[tothillClinical[, "ClinicalType"] == "LMP"] <- 
    paste(whyExcluded[tothillClinical[, "ClinicalType"] == "LMP"], 
          "-LMP-", sep = "")
table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     53    232
```

### 4.3 Histologic Subtype

Next, we look at histologic subtype.
We only want to keep serous (Ser) tumor samples.

```r
table(tothillClinical[, "HistologicSubtype"])
```

```
## 
## Adeno  Endo   Ser 
##     1    20   264
```

```r
sampleUse[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- "Unused"
sampleUse[tothillClinical[, "HistologicSubtype"] == "Endo"] <- "Unused"
whyExcluded[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- 
    paste(whyExcluded[tothillClinical[, "HistologicSubtype"] == "Adeno"], 
          "-Adeno Subtype-", sep = "")
whyExcluded[tothillClinical[, "HistologicSubtype"] == "Endo"] <- 
    paste(whyExcluded[tothillClinical[, "HistologicSubtype"] == "Endo"], 
          "-Endo Subtype-", sep = "")
table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     68    217
```

### 4.4 Array Site

Next, we look at the site the sample was taken from (the "array site"). We want tumors from the ovary or the peritoneum.

```r
table(tothillClinical[, "ArraySite"])
```

```
## 
##       BN       CO       FT       OM    Other       OV OV or OM       PE 
##        1        4        2        2        3      200        1       71 
##       UT 
##        1
```

```r
sampleUse[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- "Unused"
whyExcluded[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- 
    paste(whyExcluded[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))], 
          "-Not OV or PE-", sep = "")
table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     77    208
```

### 4.5 Neoadjuvant Chemo

Next, we look at whether the patients received neoadjuvant chemotherapy. We want to focus on chemo-naive tumors.

```r
table(tothillClinical[, "Neo"])
```

```
## 
##       N   Y 
##   3 264  18
```

```r
sampleUse[tothillClinical[, "Neo"] == ""] <- "Unused"
sampleUse[tothillClinical[, "Neo"] == "Y"] <- "Unused"
whyExcluded[tothillClinical[, "Neo"] == ""] <- 
    paste(whyExcluded[tothillClinical[, "Neo"] == ""], 
          "-NeoAdj Unk-", sep = "")
# Index the right-hand side by the same "Y" rows being assigned, so each
# sample keeps its own earlier exclusion reasons.
whyExcluded[tothillClinical[, "Neo"] == "Y"] <- 
    paste(whyExcluded[tothillClinical[, "Neo"] == "Y"], 
          "-NeoAdj Chemo-", sep = "")
table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     91    194
```

With respect to neoadjuvant chemo, we exclude patients who either received therapy or for whom this information is unavailable. This mostly reduces the number of RD samples.

### 4.6 Grade

Next, we look at grade. We want only Grade 2 or 3 samples.

```r
table(tothillClinical[, "Grade"])
```

```
## 
##   1   2   3 
##  19  97 164
```

```r
sampleUse[is.na(tothillClinical[, "Grade"])] <- "Unused"
sampleUse[tothillClinical[, "Grade"] == 1] <- "Unused"
whyExcluded[is.na(tothillClinical[, "Grade"])] <- 
    paste(whyExcluded[is.na(tothillClinical[, "Grade"])], 
          "-Grade NA-", sep = "")
whyExcluded[which(tothillClinical[, "Grade"] == 1)] <- 
    paste(whyExcluded[which(tothillClinical[, "Grade"] == 1)], 
          "-Grade 1-", sep = "")
table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     96    189
```

### 4.7 Final Tally

Now we see how many RD and No RD samples remain.

```r
# tothillRD (RD vs. No RD per sample) is not created in this report;
# it is assumed to already be available in the workspace.
table(sampleUse, tothillRD)
```

```
##          tothillRD
## sampleUse No RD  RD
##    Unused    34  25
##    Used      50 139
```

Note that the Unused row sums to 59 rather than 96; this is consistent with the 37 samples lacking RD information having no value in tothillRD and so being omitted by table().

## 5 Building the Data Frame

Now we bundle the assembled information into a data frame for later use.

```r
tothillFilteredSamples <- data.frame(sampleUse = sampleUse, 
    whyExcluded = whyExcluded, row.names = rownames(tothillClinical))
```

## 6 Saving RData

Now we save the relevant information to an RData object.

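Before saving, it can be worth confirming that the two columns agree: every Used sample should have an empty whyExcluded, and every Unused sample should carry at least one reason. The following is a minimal sketch on a toy data frame; the helper name `checkFilteredSamples` and the toy rows are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not from the original report): returns the row names
# of samples whose sampleUse and whyExcluded entries disagree.
checkFilteredSamples <- function(filtered) {
    used <- filtered[, "sampleUse"] == "Used"
    hasReason <- nchar(as.character(filtered[, "whyExcluded"])) > 0
    # A sample is consistent when exactly one of (used, hasReason) holds.
    consistent <- xor(used, hasReason)
    rownames(filtered)[!consistent]
}

# Toy stand-in for tothillFilteredSamples; X3 is deliberately inconsistent
# (marked Unused but with no recorded reason).
toyFiltered <- data.frame(
    sampleUse = c("Used", "Unused", "Unused"),
    whyExcluded = c("", "-LMP--Grade 1-", ""),
    row.names = c("X1", "X2", "X3"),
    stringsAsFactors = FALSE)

badSamples <- checkFilteredSamples(toyFiltered)
print(badSamples)
```

Applied to the real object, `checkFilteredSamples(tothillFilteredSamples)` would be expected to return an empty character vector if the filtering steps in Section 4 were applied consistently.
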
```r
save(tothillFilteredSamples, file = file.path("RDataObjects", "tothillFilteredSamples.RData"))
```

## 7 Appendix

### 7.1 File Location

```r
getwd()
```

```
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"
```

### 7.2 SessionInfo

```r
sessionInfo()
```

```
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.5
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.9    stringr_0.6.2  tools_3.0.2
```

## 8 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.