Assembling Clinical Information for the Tothill Ovarian Data

by Keith A. Baggerly

1 Executive Summary

1.1 Introduction

We want to produce an RData file with the clinical information for the ovarian cancer samples profiled by Tothill et al. [1].

1.2 Methods

We acquired clinical annotation from two sources on Sep 13, 2012: “clinical_anns.csv” from the Gene Expression Omnibus (GEO) page for GSE9891, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891, and pp. 27-30 of the supplementary data PDF for Tothill et al., http://clincancerres.aacrjournals.org/content/14/16/5198/suppl/DC1.

A CSV file of this annotation, together with an extra column specifying the GEO GSM ID for each sample, is stored under RawData as Tothill/Clinical/tothillClinical.csv.

We load the clinical information into a data frame and construct R “Surv” objects for overall survival and progression-free (relapse-free) survival.

We also construct a binary indicator vector for the presence or absence of residual disease (RD).

1.3 Results

We save tothillClinical, tothillOSMos, tothillPFSMos, and tothillRD to the RData file “tothillClinical.RData”.

2 Options and Libraries

We first load the options and libraries we will use in this report.


library(survival)

3 Loading the Data

Here we simply load the table of clinical information.


tothillClinical <- read.table(file.path("RawData", "Tothill", "Clinical", "tothillClinical.csv"), 
    header = TRUE, sep = ",")
dim(tothillClinical)
## [1] 285  17
tothillClinical[1:3, ]
##      GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## 1 GSM249839       49           5          MAL               Ser
## 2 GSM250001      129           1          MAL               Ser
## 3 GSM250000      146          NC          MAL               Ser
##   PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse MosToDeath
## 1          OV   III     3  56      D    Y   N   N            7          8
## 2          OV   III     3  65      D    Y   N   N            7         15
## 3          OV   III     3  56     PF    Y   N   N          166        166
##   ResidDisease ArraySite
## 1           <1        OV
## 2           >1        PE
## 3           >1        OV

# Use the sample IDs, prefixed with "X", as row names.
rownames(tothillClinical) <- paste("X", tothillClinical[, "SampleID"], sep = "")

4 Defining Overall and Progression-Free Survival

Next, we define R “Surv” objects for overall survival (OS) and progression-free survival (PFS). We begin by looking at the recorded values for patient status.


table(tothillClinical[, "Status"])
## 
##       D  D*  PF   R 
##   3 111   2  92  77

According to the supplementary information table, D = Dead, D* = Dead of Other Causes, PF = Alive Progression-Free, and R = Alive and Relapsed.

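Next, we define status vectors for OS and PFS. The three samples with a blank status entry are left as NA in both. We begin with OS, where deaths from any cause (D and D*) count as events and patients still alive (PF and R) are censored.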


tothillOSStatus <- rep(NA, nrow(tothillClinical))
tothillOSStatus[tothillClinical$Status == "D"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status == "D*"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status == "PF"] <- "Censored"
tothillOSStatus[tothillClinical$Status == "R"] <- "Censored"
table(tothillOSStatus)
## tothillOSStatus
##   Censored Uncensored 
##        169        113

Next we deal with PFS, where relapse (R) also counts as an event, so only PF patients are censored.


tothillPFStatus <- rep(NA, nrow(tothillClinical))
tothillPFStatus[tothillClinical$Status == "D"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status == "D*"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status == "PF"] <- "Censored"
tothillPFStatus[tothillClinical$Status == "R"] <- "Uncensored"
table(tothillPFStatus)
## tothillPFStatus
##   Censored Uncensored 
##         92        190
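The two status recodings can also be written as lookup tables, which keeps each mapping in one place. The following is a minimal sketch on toy status codes, not the real data; the names status, osMap, pfsMap, osStatus, and pfsStatus are hypothetical and are not used elsewhere in this report.

```r
# Toy status codes (hypothetical values); "" mimics a blank entry.
status <- c("D", "D*", "PF", "R", "", "D")

# Named lookup vectors: status code -> censoring label.
osMap  <- c("D" = "Uncensored", "D*" = "Uncensored",
            "PF" = "Censored",  "R"  = "Censored")
pfsMap <- c("D" = "Uncensored", "D*" = "Uncensored",
            "PF" = "Censored",  "R"  = "Uncensored")

# match() returns NA for codes absent from the map, so blank entries stay NA.
osStatus  <- unname(osMap[match(status, names(osMap))])
pfsStatus <- unname(pfsMap[match(status, names(pfsMap))])

# Tabulate, showing the NA count explicitly.
print(table(osStatus, useNA = "ifany"))
print(table(pfsStatus, useNA = "ifany"))
```

Using match() rather than indexing by name directly means the same code should work whether the status column was read in as a character vector or as a factor.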

Now we create the Surv objects.


tothillOSMos <- Surv(tothillClinical[, "MosToDeath"], tothillOSStatus == "Uncensored")
rownames(tothillOSMos) <- rownames(tothillClinical)

tothillPFSMos <- Surv(tothillClinical[, "MosToRelapse"], tothillPFStatus == 
    "Uncensored")
rownames(tothillPFSMos) <- rownames(tothillClinical)
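To illustrate how the second argument of Surv is interpreted, here is a small sketch on toy values, not the real data; mos, status, and toySurv are hypothetical names. A TRUE event indicator marks an observed event, FALSE marks a censored time, and an NA status yields a missing entry.

```r
library(survival)

# Toy follow-up times in months and status labels (hypothetical values).
mos    <- c(8, 15, 166, 40)
status <- c("Uncensored", "Uncensored", "Censored", NA)

# The comparison yields TRUE (event observed), FALSE (censored), or NA.
toySurv <- Surv(mos, status == "Uncensored")

# Censored times are printed with a trailing "+".
print(toySurv)

# Count entries whose time or status is missing.
print(sum(is.na(toySurv)))
```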

5 Defining a Residual Disease Indicator

Now we summarize the Residual Disease (RD) information into a single indicator vector specifying whether there is any RD (“RD”) or none (“No RD”). We begin by tabulating the information we have.


table(tothillClinical[, "ResidDisease"])
## 
##            <1            >1 macro size NK           nil            NK 
##            76            70            18            84            37

According to the supplementary information from Tothill et al., “macro size NK” = macroscopic disease size unknown (but there is some), and “NK” = residual disease unknown.

We now define the indicator. Samples whose residual disease status is unknown (“NK”) are left as NA.


tothillRD <- rep(NA, nrow(tothillClinical))
tothillRD[tothillClinical[, "ResidDisease"] == "<1"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == ">1"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == "macro size NK"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == "nil"] <- "No RD"
table(tothillRD)
## tothillRD
## No RD    RD 
##    84   164
names(tothillRD) <- rownames(tothillClinical)
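One way to check a recoding like this is to cross-tabulate the raw categories against the derived indicator, so that each raw category visibly lands in exactly one column. The sketch below uses toy values that mirror the categories above; resid, rd, and check are hypothetical names.

```r
# Toy residual-disease entries (hypothetical values, not the real data).
resid <- c("<1", ">1", "macro size NK", "nil", "NK", "nil")

# Same recoding rule as above: any measurable or macroscopic disease is "RD".
rd <- rep(NA_character_, length(resid))
rd[resid %in% c("<1", ">1", "macro size NK")] <- "RD"
rd[resid == "nil"] <- "No RD"

# Rows are raw categories, columns are derived labels (NA shown explicitly).
check <- table(resid, rd, useNA = "ifany")
print(check)
```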

6 Saving RData

Now we save the relevant objects to an RData file.


save(tothillClinical, tothillOSMos, tothillPFSMos, tothillRD, file = file.path("RDataObjects", 
    "tothillClinical.RData"))
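A saved RData file can be checked by loading it into a fresh environment, which avoids overwriting objects in the current workspace. This is a minimal sketch using toy objects and a temporary file; toyClinical, toyRD, rdataFile, and checkEnv are hypothetical names, and the real file is not touched.

```r
# Toy objects standing in for the real ones (hypothetical values).
toyClinical <- data.frame(SampleID = c(49, 129), Age = c(56, 65))
toyRD <- c(X49 = "RD", X129 = "No RD")

# Save to a temporary location rather than RDataObjects.
rdataFile <- file.path(tempdir(), "toyClinical.RData")
save(toyClinical, toyRD, file = rdataFile)

# load() returns the names of the restored objects; envir keeps them isolated.
checkEnv <- new.env()
loaded <- load(rdataFile, envir = checkEnv)
print(loaded)

# Confirm the round trip preserved the contents.
print(identical(checkEnv$toyRD, toyRD))
```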

7 Appendix

7.1 File Location


getwd()
## [1] "\\\\mdadqsfs02/workspace/kabagg/RDPaper/Webpage/ResidualDisease"

7.2 SessionInfo


sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] splines   stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] survival_2.37-4 knitr_1.2      
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    stringr_0.6.2 
## [5] tools_2.15.3

8 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.