Assembling Clinical Information for the Tothill Ovarian Data
============================================================

by Keith A. Baggerly

## 1 Executive Summary

### 1.1 Introduction

We want to produce an RData file with the clinical information for the ovarian cancer samples profiled by [Tothill et al.](#tothill08).

### 1.2 Methods

We acquired clinical annotation from two sources on Sep 13, 2012: "clinical\_anns.csv" from the Gene Expression Omnibus (GEO) page for GSE9891, [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891), and pp. 27-30 of the supplementary data pdf for [Tothill et al.](#tothill08), [http://clincancerres.aacrjournals.org/content/14/16/5198/suppl/DC1](http://clincancerres.aacrjournals.org/content/14/16/5198/suppl/DC1). A csv file of this annotation, together with an extra column specifying the GEO GSM ID for each sample, is stored under RawData as Tothill/Clinical/tothillClinical.csv.

We load the clinical information into a data frame, and construct R "Surv" objects for overall survival and progression-free (relapse-free) survival. We also construct a binary indicator vector for the presence or absence of residual disease (RD).

### 1.3 Results

We save tothillClinical, tothillOSMos, tothillPFSMos, and tothillRD to the RData file "tothillClinical.RData".

## 2 Options and Libraries

We first load the options and libraries we will use in this report.

```r
library(survival)
```

## 3 Loading the Data

Here we simply load the table of clinical information.
```r
tothillClinical <- read.table(file.path("RawData", "Tothill", "Clinical", "tothillClinical.csv"),
    header = TRUE, sep = ",")
dim(tothillClinical)
```

```
## [1] 285  17
```

```r
tothillClinical[1:3, ]
```

```
##      GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## 1 GSM249839       49           5          MAL               Ser
## 2 GSM250001      129           1          MAL               Ser
## 3 GSM250000      146          NC          MAL               Ser
##   PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse MosToDeath
## 1          OV   III     3  56      D    Y   N   N            7          8
## 2          OV   III     3  65      D    Y   N   N            7         15
## 3          OV   III     3  56     PF    Y   N   N          166        166
##   ResidDisease ArraySite
## 1           <1        OV
## 2           >1        PE
## 3           >1        OV
```

```r
rownames(tothillClinical) <- paste("X", tothillClinical[, "SampleID"], sep = "")
```

## 4 Defining Overall and Progression-Free Survival

Next, we define R "Surv" objects for overall survival (OS) and progression-free survival (PFS). We begin by looking at the recorded values for patient status.

```r
table(tothillClinical[, "Status"])
```

```
##
##       D  D*  PF   R
##   3 111   2  92  77
```

According to the supplementary information table, D = Dead, D* = Dead of Other Causes, PF = Alive Progression-Free, and R = Alive and Relapsed. Three samples have no recorded status (the unlabeled first column of the table); they match none of the assignments below and remain NA.

Next, we define indicator vectors for OS and PFS. We begin with OS.

```r
tothillOSStatus <- rep(NA, nrow(tothillClinical))
tothillOSStatus[tothillClinical$Status == "D"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status == "D*"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status == "PF"] <- "Censored"
tothillOSStatus[tothillClinical$Status == "R"] <- "Censored"
table(tothillOSStatus)
```

```
## tothillOSStatus
##   Censored Uncensored
##        169        113
```

Next we deal with PFS.
```r
tothillPFStatus <- rep(NA, nrow(tothillClinical))
tothillPFStatus[tothillClinical$Status == "D"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status == "D*"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status == "PF"] <- "Censored"
tothillPFStatus[tothillClinical$Status == "R"] <- "Uncensored"
table(tothillPFStatus)
```

```
## tothillPFStatus
##   Censored Uncensored
##         92        190
```

Now we create the Surv objects.

```r
tothillOSMos <- Surv(tothillClinical[, "MosToDeath"], tothillOSStatus == "Uncensored")
rownames(tothillOSMos) <- rownames(tothillClinical)
tothillPFSMos <- Surv(tothillClinical[, "MosToRelapse"], tothillPFStatus == "Uncensored")
rownames(tothillPFSMos) <- rownames(tothillClinical)
```

## 5 Defining a Residual Disease Indicator

Now we summarize the Residual Disease (RD) information into a single indicator vector specifying if there is any RD ("RD") or no RD ("No RD"). We begin by tabulating the information we have.

```r
table(tothillClinical[, "ResidDisease"])
```

```
##
##            <1            >1 macro size NK           nil            NK
##            76            70            18            84            37
```

According to the supplementary information from [Tothill et al.](#tothill08), "macro size NK" = macroscopic disease size unknown (but there is some), and "NK" = residual disease unknown.

We now define the indicator.

```r
tothillRD <- rep(NA, nrow(tothillClinical))
tothillRD[tothillClinical[, "ResidDisease"] == "<1"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == ">1"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == "macro size NK"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == "nil"] <- "No RD"
table(tothillRD)
```

```
## tothillRD
## No RD    RD
##    84   164
```

```r
names(tothillRD) <- rownames(tothillClinical)
```

## 6 Saving RData

Now we save the relevant information to an RData object.
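
Before writing the file, it may be worth confirming that the derived objects line up with the clinical table and that missing values appear only where the source entry was blank or unknown. The following is only a sketch of such a check and is not part of the original analysis: `toyClinical`, `toyOSStatus`, `toyOS`, and `toyRD` are hypothetical names, and the three rows of values are made up for illustration.

```r
library(survival)

# Hypothetical stand-in for tothillClinical; all values are made up.
toyClinical <- data.frame(
  Status       = c("D", "PF", ""),
  MosToDeath   = c(8, 166, 20),
  ResidDisease = c("<1", "nil", "NK"),
  stringsAsFactors = FALSE
)

# Same recoding pattern as in the report: values matching no rule stay NA.
toyOSStatus <- rep(NA, nrow(toyClinical))
toyOSStatus[toyClinical$Status %in% c("D", "D*")] <- "Uncensored"
toyOSStatus[toyClinical$Status %in% c("PF", "R")] <- "Censored"
toyOS <- Surv(toyClinical[, "MosToDeath"], toyOSStatus == "Uncensored")

toyRD <- rep(NA, nrow(toyClinical))
toyRD[toyClinical$ResidDisease %in% c("<1", ">1", "macro size NK")] <- "RD"
toyRD[toyClinical$ResidDisease == "nil"] <- "No RD"

# One survival entry per sample.
stopifnot(nrow(unclass(toyOS)) == nrow(toyClinical))
# Missing values occur only where the source entry was blank or "NK".
stopifnot(identical(which(is.na(toyOS)), which(toyClinical$Status == "")))
stopifnot(identical(which(is.na(toyRD)), which(toyClinical$ResidDisease == "NK")))
```

With the real objects, the same assertions would be run against tothillClinical, tothillOSMos, and tothillRD ahead of the save call.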
```r
save(tothillClinical, tothillOSMos, tothillPFSMos, tothillRD,
    file = file.path("RDataObjects", "tothillClinical.RData"))
```

## 7 Appendix

### 7.1 File Location

```r
getwd()
```

```
## [1] "\\\\mdadqsfs02/workspace/kabagg/RDPaper/Webpage/ResidualDisease"
```

### 7.2 SessionInfo

```r
sessionInfo()
```

```
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] splines   stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
## [1] survival_2.37-4 knitr_1.2
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    stringr_0.6.2
## [5] tools_2.15.3
```

## 8 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.