Filtering Samples from the Tothill Data to Focus on RD
======================================================

by Keith A. Baggerly

## 1 Executive Summary

### 1.1 Introduction

[Tothill et al.](#tothill08) profiled 285 ovarian tumor samples, but not all of the patients had the same type of disease, and not all had residual disease (RD) information recorded. We want to identify the high-grade serous ovarian tumors with RD information so that we can focus the question more precisely.

### 1.2 Methods

Starting with the previously assembled table of clinical information, we examine the various columns and see which clinical features would justify excluding a sample from the set being examined. We consider

- RD status, excluding samples with no RD information.
- Clinical Type, excluding low malignant potential (LMP) samples.
- Histologic Subtype, excluding non-serous (Adeno and Endo) samples.
- Array Site, excluding samples not coming from the ovary (OV) or peritoneum (PE).
- Neoadjuvant Treatment, excluding samples from patients who received chemotherapy before sample acquisition, or whose neoadjuvant status is unknown.
- Grade, excluding Grade 1 samples and samples with no grade recorded.

We use these rules to build up a data frame with two columns: sampleUse (Used or Unused), and whyExcluded.

### 1.3 Results

We exclude 96 of the 285 samples for various reasons. Of the 189 that remain, 139 are RD and 50 are No RD.

We save tothillFilteredSamples to the RData file "tothillFilteredSamples.RData".

## 2 Libraries

We first load the libraries we will use in this report.

```{r libraries}
```

## 3 Loading the Data

Here we simply load the previously assembled clinical information.

```{r loadTothillClinical}
load(file.path("RDataObjects","tothillClinical.RData"))
tothillClinical[1:3,]
```

## 4 Filtering Samples Used

We now walk through the various criteria and see what each implies for the inclusion of the various samples. Our default assumption is that all samples are used.
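Each criterion below follows the same pattern: find the matching samples, set sampleUse to "Unused", and append a tag to whyExcluded. The sketch below wraps that pattern in a helper, assuming a hypothetical function name `markExcluded` (it is not used elsewhere in this report) and a small made-up table in place of tothillClinical. Routing the logical test through `which()` treats NA as FALSE, which sidesteps the "NAs are not allowed in subscripted assignments" error that R raises when a logical index containing NA is paired with a replacement longer than one element.

```r
# Hypothetical helper (not part of the original report): mark the samples
# flagged by `drop` as "Unused" and append `reason` to their exclusion notes.
# which() keeps only TRUE positions, so NA entries in `drop` are ignored.
markExcluded <- function(sampleUse, whyExcluded, drop, reason) {
  idx <- which(drop)
  sampleUse[idx] <- "Unused"
  whyExcluded[idx] <- paste(whyExcluded[idx], reason, sep = "")
  list(sampleUse = sampleUse, whyExcluded = whyExcluded)
}

# Made-up stand-in for tothillClinical; only "NK" and "LMP" come from the report,
# the other values are placeholders.
toyClinical <- data.frame(ResidDisease = c("NK", "Nil", "NK", ">1cm"),
                          ClinicalType = c("Malignant", "LMP", "LMP", "Malignant"),
                          row.names = paste0("S", 1:4),
                          stringsAsFactors = FALSE)

# Same defaults as the setDefaults chunk: every sample starts as "Used".
toyUse <- rep("Used", nrow(toyClinical))
names(toyUse) <- rownames(toyClinical)
toyWhy <- rep("", nrow(toyClinical))
names(toyWhy) <- rownames(toyClinical)

# Apply two criteria in turn; notes accumulate for samples failing both.
res <- markExcluded(toyUse, toyWhy,
                    toyClinical[, "ResidDisease"] == "NK", "-No RD Info-")
res <- markExcluded(res$sampleUse, res$whyExcluded,
                    toyClinical[, "ClinicalType"] == "LMP", "-LMP-")
table(res$sampleUse)
# S3 fails both checks, so its note reads "-No RD Info--LMP-"
res$whyExcluded
```

Returning both vectors in a list keeps the helper free of side effects; the report itself instead updates sampleUse and whyExcluded in place, which is equally valid for a linear script.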
```{r setDefaults}
# Start with every sample marked "Used" and an empty exclusion note;
# both vectors are named by sample so they line up with tothillClinical.
sampleUse <- rep("Used", nrow(tothillClinical))
names(sampleUse) <- rownames(tothillClinical)
whyExcluded <- rep("", nrow(tothillClinical))
names(whyExcluded) <- rownames(tothillClinical)
```

### 4.1 Residual Disease

First, we check residual disease status, and exclude patients with no information.

```{r checkRD}
table(tothillClinical[,"ResidDisease"])
# "NK" marks samples with no RD information
sampleUse[tothillClinical[,"ResidDisease"]=="NK"] <- "Unused"
whyExcluded[tothillClinical[,"ResidDisease"]=="NK"] <-
  paste(whyExcluded[tothillClinical[,"ResidDisease"]=="NK"],
        "-No RD Info-",sep="")
table(sampleUse)
```

### 4.2 Clinical Type

Next, we look at clinical type. Some of the samples are known to be of low malignant potential (LMP), and we don't want to use them.

```{r checkClinicalType}
table(tothillClinical[,"ClinicalType"])
sampleUse[tothillClinical[,"ClinicalType"]=="LMP"] <- "Unused"
whyExcluded[tothillClinical[,"ClinicalType"]=="LMP"] <-
  paste(whyExcluded[tothillClinical[,"ClinicalType"]=="LMP"],
        "-LMP-",sep="")
table(sampleUse)
```

### 4.3 Histologic Subtype

Next, we look at histologic subtype. We only want to keep serous (Ser) tumor samples.

```{r checkHistologicSubtype}
table(tothillClinical[,"HistologicSubtype"])
# exclude the non-serous Adeno and Endo subtypes
sampleUse[tothillClinical[,"HistologicSubtype"]=="Adeno"] <- "Unused"
sampleUse[tothillClinical[,"HistologicSubtype"]=="Endo"] <- "Unused"
whyExcluded[tothillClinical[,"HistologicSubtype"]=="Adeno"] <-
  paste(whyExcluded[tothillClinical[,"HistologicSubtype"]=="Adeno"],
        "-Adeno Subtype-",sep="")
whyExcluded[tothillClinical[,"HistologicSubtype"]=="Endo"] <-
  paste(whyExcluded[tothillClinical[,"HistologicSubtype"]=="Endo"],
        "-Endo Subtype-",sep="")
table(sampleUse)
```

### 4.4 Array Site

Next, we look at the site the sample was taken from (the "array site"). We want tumors from the ovary or the peritoneum.
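The array-site check differs from the earlier ones: it tests membership in a set with `is.element()` rather than equality with `==`, and the two treat missing values differently. The sketch below uses made-up site codes ("OM" and the NA are placeholders) to show the difference; whether tothillClinical actually contains missing array sites is not checked here.

```r
# Made-up array-site values; "OM" and NA are illustrative placeholders.
site <- c("OV", "PE", "OM", NA)

# == propagates the missing value: TRUE FALSE FALSE NA
eqTest <- site == "OV"

# is.element() (equivalent to %in%) returns FALSE for NA unless NA is in the
# set, so a missing site fails the membership test: TRUE TRUE FALSE FALSE
keep <- is.element(site, c("OV", "PE"))

# The exclusion index is the negation, so the NA site is excluded as well.
exclude <- !keep
exclude
```

Because `!is.element(...)` never yields NA, it can be used directly as a subscript on both sides of an assignment, and samples with a missing site fall into the excluded group rather than being silently skipped.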
```{r checkArraySite}
table(tothillClinical[,"ArraySite"])
# exclude anything not from the ovary (OV) or peritoneum (PE)
sampleUse[!is.element(tothillClinical[,"ArraySite"],c("OV","PE"))] <- "Unused"
whyExcluded[!is.element(tothillClinical[,"ArraySite"],c("OV","PE"))] <-
  paste(whyExcluded[!is.element(tothillClinical[,"ArraySite"],c("OV","PE"))],
        "-Not OV or PE-",sep="")
table(sampleUse)
```

### 4.5 Neoadjuvant Chemo

Next, we look at whether the patients received neoadjuvant chemotherapy. We want to focus on chemo-naive tumors.

```{r checkNeoadjuvant}
table(tothillClinical[,"Neo"])
# "" = neoadjuvant status unknown; "Y" = received neoadjuvant chemo
sampleUse[tothillClinical[,"Neo"]==""] <- "Unused"
sampleUse[tothillClinical[,"Neo"]=="Y"] <- "Unused"
whyExcluded[tothillClinical[,"Neo"]==""] <-
  paste(whyExcluded[tothillClinical[,"Neo"]==""],
        "-NeoAdj Unk-",sep="")
# index with =="Y" on both sides so each note is appended to its own sample
whyExcluded[tothillClinical[,"Neo"]=="Y"] <-
  paste(whyExcluded[tothillClinical[,"Neo"]=="Y"],
        "-NeoAdj Chemo-",sep="")
table(sampleUse)
```

With respect to neoadjuvant chemo, we exclude patients who either received therapy or for whom this information is unavailable. This mostly reduces the number of RD samples.

### 4.6 Grade

Next, we look at grade. We want only Grade 2 or 3 samples.

```{r checkGrade}
table(tothillClinical[,"Grade"])
# samples with no recorded grade are excluded as well as Grade 1
sampleUse[is.na(tothillClinical[,"Grade"])] <- "Unused"
sampleUse[tothillClinical[,"Grade"]==1] <- "Unused"
whyExcluded[is.na(tothillClinical[,"Grade"])] <-
  paste(whyExcluded[is.na(tothillClinical[,"Grade"])],
        "-Grade NA-",sep="")
# which() drops the NA grades, which would otherwise break this assignment
whyExcluded[which(tothillClinical[,"Grade"]==1)] <-
  paste(whyExcluded[which(tothillClinical[,"Grade"]==1)],
        "-Grade 1-",sep="")
table(sampleUse)
```

### 4.7 Final Tally

Now we see how many RD and No RD samples remain.

```{r checkTally}
# tothillRD is not created in this report; it is assumed to be loaded
# along with tothillClinical from tothillClinical.RData
table(sampleUse,tothillRD)
```

## 5 Building the Data Frame

Now we bundle the assembled information into a data frame for later use.

```{r buildDataFrame}
tothillFilteredSamples <- data.frame(sampleUse=sampleUse,
                                     whyExcluded=whyExcluded,
                                     row.names=rownames(tothillClinical))
```

## 6 Saving RData

Now we save the relevant information to an RData object.
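Because `save()` stores objects by name, a later report that calls `load()` on "tothillFilteredSamples.RData" gets an object called tothillFilteredSamples back in its workspace. The round-trip sketch below uses a made-up two-row stand-in named `toyFiltered` and writes to `tempdir()` rather than RDataObjects, so it does not touch the real file.

```r
# Made-up stand-in for tothillFilteredSamples (values are placeholders).
toyFiltered <- data.frame(sampleUse = c("Used", "Unused"),
                          whyExcluded = c("", "-LMP-"),
                          row.names = c("S1", "S2"),
                          stringsAsFactors = FALSE)

# Save to a temporary file instead of the report's RDataObjects directory.
f <- file.path(tempdir(), "toyFiltered.RData")
save(toyFiltered, file = f)

# load() restores objects under their saved names into the given environment
# and invisibly returns the character vector of those names.
restored <- new.env()
loadedNames <- load(f, envir = restored)
loadedNames
identical(restored$toyFiltered, toyFiltered)
```

Loading into a fresh environment, as here, is one way for a downstream script to confirm which objects a file contains before they land in the global workspace.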
```{r saveTothillFilteredSamples}
save(tothillFilteredSamples,
     file=file.path("RDataObjects","tothillFilteredSamples.RData"))
```

## 7 Appendix

### 7.1 File Location

```{r getLocation}
getwd()
```

### 7.2 SessionInfo

```{r sessionInfo}
sessionInfo()
```

## 8 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.