Assembling Clinical Information for the Tothill Ovarian Data
============================================================

by Keith A. Baggerly

## 1 Executive Summary

### 1.1 Introduction

We want to produce an RData file containing the clinical information for the ovarian cancer samples profiled by [Tothill et al.](#tothill08)

### 1.2 Methods

We acquired clinical annotation from two sources on Sep 13, 2012: "clinical\_anns.csv" from the Gene Expression Omnibus (GEO) page for GSE9891, [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891), and pp. 27-30 of the supplementary data PDF for [Tothill et al.](#tothill08), [http://clincancerres.aacrjournals.org/content/14/16/5198/suppl/DC1](http://clincancerres.aacrjournals.org/content/14/16/5198/suppl/DC1). A csv file of this annotation, with an extra column giving the GEO GSM ID for each sample, is stored in RawData as tothillClinical.csv.

We load the clinical information into a data frame and construct R "Surv" objects for overall survival and progression-free (relapse-free) survival. We also construct an indicator vector for the presence or absence of residual disease (RD).

### 1.3 Results

We save tothillClinical, tothillOSMos, tothillPFSMos, and tothillRD to the RData file "tothillClinical.RData".

## 2 Options and Libraries

We first load the options and libraries we will use in this report.

```{r libraries, message=FALSE}
library(survival)
```

## 3 Loading the Data

Here we simply load the table of clinical information.

```{r loadTothillClinical}
# Read the clinical annotation (one row per sample); keep text codes as character strings
tothillClinical <- read.table(file.path("RawData","Tothill","Clinical","tothillClinical.csv"),
                              header=TRUE, sep=",", stringsAsFactors=FALSE)
dim(tothillClinical)
tothillClinical[1:3,]
# Row names are the sample IDs prefixed with "X"
rownames(tothillClinical) <- paste("X",tothillClinical[,"SampleID"],sep="")
```

## 4 Defining Overall and Progression-Free Survival

Next, we define R "Surv" objects for overall survival (OS) and progression-free survival (PFS).
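A Surv object pairs each follow-up time with an event indicator: TRUE (or 1) when the event was observed, FALSE (or 0) when the time is censored. As a minimal sketch, assuming a hypothetical four-patient data frame with one patient per status code used in this dataset (all times are invented for illustration and are not from Tothill et al.), the two Surv objects can be built with `%in%` instead of one assignment per code:

```r
library(survival)

# Hypothetical toy data (NOT from Tothill et al.): one patient per status code
toy <- data.frame(
  Status       = c("D", "D*", "PF", "R"),
  MosToDeath   = c(12, 30, 60, 45),
  MosToRelapse = c(8, 30, 60, 20),
  stringsAsFactors = FALSE
)

# Event indicators: TRUE = event observed (uncensored), FALSE = censored
osEvent  <- toy$Status %in% c("D", "D*")       # death from any cause ends OS
pfsEvent <- toy$Status %in% c("D", "D*", "R")  # death or relapse ends PFS

toyOS  <- Surv(toy$MosToDeath, osEvent)
toyPFS <- Surv(toy$MosToRelapse, pfsEvent)

print(toyOS)
print(toyPFS)
```

When printed, censored times in a right-censored Surv object carry a trailing "+".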
We begin by looking at the recorded values for patient status.

```{r examineStatus}
table(tothillClinical[,"Status"])
```

According to the supplementary information table, D = Dead, D* = Dead of Other Causes, PF = Alive Progression-Free, and R = Alive and Relapsed.

Next, we define censoring-status vectors for OS and PFS, recording each patient as "Uncensored" (event observed) or "Censored". We begin with OS, where death from any cause (D or D*) counts as an event.

```{r defineOS}
# OS: any death is an event; patients still alive (PF or R) are censored
tothillOSStatus <- rep(NA,nrow(tothillClinical))
tothillOSStatus[tothillClinical$Status=="D"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status=="D*"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status=="PF"] <- "Censored"
tothillOSStatus[tothillClinical$Status=="R"] <- "Censored"
table(tothillOSStatus)
```

Next we deal with PFS, where relapse (R) or death (D or D*) counts as an event and only patients alive without progression (PF) are censored.

```{r definePFS}
# PFS: relapse or death is an event; only progression-free patients are censored
tothillPFStatus <- rep(NA,nrow(tothillClinical))
tothillPFStatus[tothillClinical$Status=="D"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status=="D*"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status=="PF"] <- "Censored"
tothillPFStatus[tothillClinical$Status=="R"] <- "Uncensored"
table(tothillPFStatus)
```

Now we create the Surv objects, using months to death for OS and months to relapse for PFS.

```{r createSurvs}
# Event indicator is TRUE for "Uncensored" patients; unknown status stays NA
tothillOSMos <- Surv(tothillClinical[,"MosToDeath"], tothillOSStatus=="Uncensored")
rownames(tothillOSMos) <- rownames(tothillClinical)
tothillPFSMos <- Surv(tothillClinical[,"MosToRelapse"], tothillPFStatus=="Uncensored")
rownames(tothillPFSMos) <- rownames(tothillClinical)
```

## 5 Defining a Residual Disease Indicator

Now we summarize the residual disease (RD) information in a single indicator vector specifying whether there is any RD ("RD") or none ("No RD"). We begin by tabulating the recorded values.

```{r tableRDStatus}
table(tothillClinical[,"ResidDisease"])
```

According to the supplementary information from [Tothill et al.](#tothill08), "macro size NK" = macroscopic disease of unknown size (but some disease is present), and "NK" = residual disease status unknown; samples coded "NK" are left as NA. We now define the indicator.
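The recoding can also be expressed as a named lookup vector, indexed by the recorded code. The sketch below is a minimal illustration, assuming a hypothetical vector of ResidDisease codes (invented, not taken from the data); codes absent from the lookup, such as "NK", come back as NA, which matches how unknown status is handled here.

```r
# Hypothetical ResidDisease codes (illustrative only, not taken from the data)
residCodes <- c("nil", "<1", ">1", "macro size NK", "NK", "nil")

# Lookup table: recorded code -> RD indicator; "NK" (unknown) is deliberately absent
rdLookup <- c("nil" = "No RD", "<1" = "RD", ">1" = "RD", "macro size NK" = "RD")

# Indexing by a name not in the lookup returns NA, so "NK" maps to NA
rdIndicator <- unname(rdLookup[residCodes])
table(rdIndicator, useNA = "ifany")
```

If the column had been read as a factor, it would need `as.character()` first, because indexing by a factor uses its integer codes rather than its labels.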
```{r specifyRDIndicator}
# Any residual disease, whatever its size, is coded "RD"; "NK" (unknown) stays NA
tothillRD <- rep(NA,nrow(tothillClinical))
tothillRD[tothillClinical[,"ResidDisease"]=="<1"] <- "RD"
tothillRD[tothillClinical[,"ResidDisease"]==">1"] <- "RD"
tothillRD[tothillClinical[,"ResidDisease"]=="macro size NK"] <- "RD"
tothillRD[tothillClinical[,"ResidDisease"]=="nil"] <- "No RD"
table(tothillRD, useNA="ifany")
names(tothillRD) <- rownames(tothillClinical)
```

## 6 Saving RData

Finally, we save the relevant objects to an RData file.

```{r saveTothillClinical}
save(tothillClinical, tothillOSMos, tothillPFSMos, tothillRD,
     file=file.path("RDataObjects","tothillClinical.RData"))
```

## 7 Appendix

### 7.1 File Location

```{r getLocation}
getwd()
```

### 7.2 SessionInfo

```{r sessionInfo}
sessionInfo()
```

## 8 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.