by Keith A. Baggerly
We want to produce an RData file with the clinical information for the ovarian cancer samples profiled by Tothill et al.
We acquired clinical annotation from two sources on Sep 13, 2012: “clinical_anns.csv” from the Gene Expression Omnibus (GEO) page for GSE9891, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891, and pp. 27-30 of the supplementary data PDF for Tothill et al., http://clincancerres.aacrjournals.org/content/14/16/5198/suppl/DC1.
A CSV file of this annotation, with an extra column giving the GEO GSM ID for each sample, is stored as RawData/Tothill/Clinical/tothillClinical.csv.
We load the clinical information into a data frame and construct R “Surv” objects for overall survival and progression-free (relapse-free) survival.
We also construct a binary indicator vector for the presence or absence of residual disease (RD).
We save tothillClinical, tothillOSMos, tothillPFSMos, and tothillRD to the RData file “tothillClinical.RData”.
We first load the options and libraries we will use in this report.
library(survival)
Here we simply load the table of clinical information.
tothillClinical <- read.table(file.path("RawData", "Tothill", "Clinical", "tothillClinical.csv"),
header = TRUE, sep = ",")
dim(tothillClinical)
## [1] 285 17
tothillClinical[1:3, ]
## GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## 1 GSM249839 49 5 MAL Ser
## 2 GSM250001 129 1 MAL Ser
## 3 GSM250000 146 NC MAL Ser
## PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse MosToDeath
## 1 OV III 3 56 D Y N N 7 8
## 2 OV III 3 65 D Y N N 7 15
## 3 OV III 3 56 PF Y N N 166 166
## ResidDisease ArraySite
## 1 <1 OV
## 2 >1 PE
## 3 >1 OV
rownames(tothillClinical) <- paste("X", tothillClinical[, "SampleID"], sep = "")
Next, we define R “Surv” objects for overall survival (OS) and progression-free survival (PFS). We begin by looking at the recorded values for patient status.
table(tothillClinical[, "Status"])
##
##       D  D*  PF   R
##  3  111   2  92  77
According to the supplementary information table, D = Dead, D* = Dead of Other Causes, PF = Alive Progression-Free, and R = Alive and Relapsed.
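The Status table above has an unlabeled first column holding three samples; these appear to be blank entries in the CSV. As a minimal sketch (using hypothetical toy text, not tothillClinical.csv), the `na.strings` argument of `read.csv` can turn such blanks into explicit NAs at read time, so that `table(..., useNA = "ifany")` reports them as `<NA>` rather than as an unnamed category:

```r
# Hypothetical toy CSV text; the third record has an empty Status field.
toyCsv <- "SampleID,Status
49,D
129,R
146,
150,PF"

# Treat empty fields (and the literal string "NA") as missing values.
toy <- read.csv(text = toyCsv, na.strings = c("", "NA"))

# useNA = "ifany" adds a <NA> column when missing values are present.
table(toy$Status, useNA = "ifany")
```

The analysis here does not depend on this option: the blank samples are never assigned a status below, so they remain NA either way.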
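Next, we define event-status vectors for OS and PFS, beginning with OS. For OS, any death (D or D*) counts as an event (uncensored), while patients still alive (PF or R) are censored. The three samples with a blank status are left as NA.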
tothillOSStatus <- rep(NA, nrow(tothillClinical))
tothillOSStatus[tothillClinical$Status == "D"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status == "D*"] <- "Uncensored"
tothillOSStatus[tothillClinical$Status == "PF"] <- "Censored"
tothillOSStatus[tothillClinical$Status == "R"] <- "Censored"
table(tothillOSStatus)
## tothillOSStatus
## Censored Uncensored
## 169 113
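Next, we handle PFS. Here relapse also counts as an event, so only patients who are alive and progression-free (PF) are censored.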
tothillPFStatus <- rep(NA, nrow(tothillClinical))
tothillPFStatus[tothillClinical$Status == "D"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status == "D*"] <- "Uncensored"
tothillPFStatus[tothillClinical$Status == "PF"] <- "Censored"
tothillPFStatus[tothillClinical$Status == "R"] <- "Uncensored"
table(tothillPFStatus)
## tothillPFStatus
## Censored Uncensored
## 92 190
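The repeated assignments above can also be written as a lookup from Status code to event indicator. The sketch below is an alternative, not the code used to build the objects saved here; `osEvent`, `pfsEvent`, and the toy `status` vector are hypothetical, and TRUE marks an event (uncensored). Codes missing from the lookup, such as a blank status, map to NA. The `as.character()` call matters because, in R versions where `read.table` returns factors, indexing a named vector by a factor uses the integer codes rather than the labels.

```r
# Hypothetical lookups, one entry per Status code in the supplementary table;
# TRUE = event observed (uncensored), FALSE = censored.
osEvent <- c("D" = TRUE, "D*" = TRUE, "PF" = FALSE, "R" = FALSE)
pfsEvent <- c("D" = TRUE, "D*" = TRUE, "PF" = FALSE, "R" = TRUE)

# Toy status codes (not the Tothill data); "" stands in for a blank entry.
status <- c("D", "R", "PF", "D*", "")

# Indexing by a name that is not in the lookup ("") returns NA,
# so unrecognized codes stay missing instead of being silently censored.
osStatus <- unname(osEvent[as.character(status)])
pfsStatus <- unname(pfsEvent[as.character(status)])

osStatus
pfsStatus
```

`Surv()` accepts such a logical vector directly as its event argument, so no intermediate "Censored"/"Uncensored" labels are needed in this form.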
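Now we create the Surv objects, using MosToDeath as the follow-up time for OS and MosToRelapse for PFS, and label their rows with the sample names.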
tothillOSMos <- Surv(tothillClinical[, "MosToDeath"], tothillOSStatus == "Uncensored")
rownames(tothillOSMos) <- rownames(tothillClinical)
tothillPFSMos <- Surv(tothillClinical[, "MosToRelapse"],
    tothillPFStatus == "Uncensored")
rownames(tothillPFSMos) <- rownames(tothillClinical)
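For readers less familiar with the survival package: `Surv(time, event)` builds a right-censored survival object, stored as a two-column matrix with columns `time` and `status`, where status 1 marks an observed event and 0 a censored time. A minimal sketch on hypothetical toy values (not the Tothill data), ending with a Kaplan-Meier fit of the kind these objects are typically passed to:

```r
library(survival)

# Hypothetical follow-up times (months) and event indicators.
months <- c(7, 15, 166, 40)
event <- c(TRUE, TRUE, FALSE, NA)  # NA: status unknown

toySurv <- Surv(months, event)
print(toySurv)  # censored times print with a trailing "+"

# The underlying matrix has "time" and "status" columns;
# the logical event vector is stored as 1 (event) / 0 (censored).
unclass(toySurv)[, "status"]

# Kaplan-Meier estimate; under the default na.action,
# the observation with an NA status is dropped.
toyFit <- survfit(toySurv ~ 1)
toyFit
```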
Now we summarize the residual disease (RD) information as a single indicator vector specifying whether there is any RD (“RD”) or no RD (“No RD”). We begin by tabulating the recorded values.
table(tothillClinical[, "ResidDisease"])
##
##  <1  >1  macro size NK  nil  NK
##  76  70             18   84  37
According to the supplementary information from Tothill et al., “macro size NK” = macroscopic residual disease of unknown size (so some RD is present), and “NK” = residual disease status not known.
We now define the indicator. Samples recorded as “NK” are left as NA, so the indicator covers 248 of the 285 samples.
tothillRD <- rep(NA, nrow(tothillClinical))
tothillRD[tothillClinical[, "ResidDisease"] == "<1"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == ">1"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == "macro size NK"] <- "RD"
tothillRD[tothillClinical[, "ResidDisease"] == "nil"] <- "No RD"
table(tothillRD)
## tothillRD
## No RD RD
## 84 164
names(tothillRD) <- rownames(tothillClinical)
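As with survival status, the RD indicator can be expressed as a lookup table. In this sketch (hypothetical names and toy codes, not the Tothill data), “NK” is deliberately left out of the lookup so that unknown residual disease maps to NA, matching the treatment above.

```r
# Hypothetical lookup from ResidDisease codes to the binary indicator;
# "NK" (unknown) is intentionally absent, so it maps to NA.
rdLookup <- c("<1" = "RD", ">1" = "RD", "macro size NK" = "RD", "nil" = "No RD")

# Toy codes (not the Tothill data).
residDisease <- c("nil", "<1", "NK", "macro size NK", ">1")

# as.character() guards against factor input, whose integer codes
# would otherwise be used for indexing.
toyRD <- unname(rdLookup[as.character(residDisease)])
table(toyRD, useNA = "ifany")
```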
Now we save the relevant information to an RData object.
save(tothillClinical, tothillOSMos, tothillPFSMos, tothillRD,
    file = file.path("RDataObjects", "tothillClinical.RData"))
getwd()
## [1] "\\\\mdadqsfs02/workspace/kabagg/RDPaper/Webpage/ResidualDisease"
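A quick way to confirm what an RData file contains is to load it into a fresh environment; `load()` returns the names of the objects it restored. The sketch below demonstrates the round trip on hypothetical toy objects written to a temporary file, rather than on RDataObjects/tothillClinical.RData itself:

```r
# Hypothetical toy objects standing in for the saved clinical data.
demoClinical <- data.frame(SampleID = c(49, 129), Status = c("D", "R"))
demoRD <- c(X49 = "RD", X129 = "No RD")

rdataFile <- tempfile(fileext = ".RData")
save(demoClinical, demoRD, file = rdataFile)

# Loading into a new environment avoids overwriting workspace objects.
restored <- new.env()
loadedNames <- load(rdataFile, envir = restored)
loadedNames
```

Applied to the real file, the same check should list tothillClinical, tothillOSMos, tothillPFSMos, and tothillRD.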
sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] splines stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] survival_2.37-4 knitr_1.2
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3 evaluate_0.4.3 formatR_0.7 stringr_0.6.2
## [5] tools_2.15.3
[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.