Overall Survival Curves for TCGA and Tothill by RD Status 
========================================================

by Susan L. Tucker


```r
opts_chunk$set(tidy = TRUE, message = TRUE)
```


## 1 Executive Summary

### 1.1 Introduction

The goal of this analysis is to produce Kaplan-Meier curves of overall survival (OS) by residual disease (RD) status for patients included in TCGA and Tothill et al.

### 1.2 Data \& Methods

We use the RData objects containing clinical information created in previous reports (assembleTCGAClinical, assembleTothillClinical). Patients are filtered as described previously (filterTCGASamples, filterTothillSamples). Additional patients are excluded for whom survival information is missing. 

Survival times are converted from months to years for the data of Tothill et al.

Kaplan-Meier plots are produced to illustrate OS in patient cohorts. OS is compared between groups using the log-rank test.

Comparisons considered are:

i) TCGA versus Tothill et al.

ii) Within each dataset by the RD categories provided in the original data sources.

iii) Within each dataset, any RD compared to no RD.

iv) Within each dataset, by FABP4 expression.

### 1.3 Results

Three patients are excluded from the filtered cohort of Tothill et al. because of missing survival information.

OS is essentially identical in TCGA and Tothill et al.

Within each data set, OS differs significantly by RD status, using both the RD categories provided or comparing any RD to no RD.

In each data set, OS is worse among the 25% of patients with the highest expression levels of FABP4. The difference reaches statistical significance in Tothill et al.

## 2 Loading \& Filtration of Data

The data objects are loaded. 


```r

load(file.path("RDataObjects", "tcgaClinical.RData"))
load(file.path("RDataObjects", "tcgaFilteredSamples.RData"))
load(file.path("RDataObjects", "tcgaExpression.RData"))

load(file.path("RDataObjects", "tothillClinical.RData"))
load(file.path("RDataObjects", "tothillFilteredSamples.RData"))
load(file.path("RDataObjects", "tothillExpression.RData"))
```


Filtrations are applied to the TCGA data. 


```r

rownames(tcgaFilteredSamples)[1:2]
```

```
## [1] "TCGA-13-0758-01A-01R-0362-01" "TCGA-09-0364-01A-02R-0362-01"
```

```r
rownames(tcgaClinical)[1:2]
```

```
## [1] "TCGA-04-1331" "TCGA-04-1332"
```

```r
rownames(tcgaOSYrs)[1:2]
```

```
## [1] "TCGA-04-1331" "TCGA-04-1332"
```

```r
colnames(tcgaExpression[, 1:2])
```

```
## [1] "TCGA-13-0758-01A-01R-0362-01" "TCGA-09-0364-01A-02R-0362-01"
```

```r

tcgaSampleUseLong <- rownames(tcgaFilteredSamples[which(tcgaFilteredSamples[, 
    "sampleUse"] == "Used"), ])
tcgaSampleUse <- substr(tcgaSampleUseLong, 1, 12)
length(tcgaSampleUse)
```

```
## [1] 491
```

```r
length(unique(tcgaSampleUse))
```

```
## [1] 491
```

```r

tcgaOSYrsUse <- tcgaOSYrs[tcgaSampleUse, ]
summary(tcgaOSYrsUse)
```

```
##       time            status     
##  Min.   : 0.025   Min.   :0.000  
##  1st Qu.: 0.936   1st Qu.:0.000  
##  Median : 2.323   Median :1.000  
##  Mean   : 2.652   Mean   :0.532  
##  3rd Qu.: 3.760   3rd Qu.:1.000  
##  Max.   :12.666   Max.   :1.000
```

```r
tcgaClinUse <- tcgaClinical[tcgaSampleUse, ]
tcgaRDUse <- tcgaRD[tcgaSampleUse]
table(tcgaRDUse)
```

```
## tcgaRDUse
## No RD    RD 
##   113   378
```

```r

tcgaExpressionUse <- tcgaExpression[, tcgaSampleUseLong]
colnames(tcgaExpressionUse) <- tcgaSampleUse
```


Filtrations are applied to the data of Tothill et al. and survival times are converted from months to years.


```r

rownames(tothillFilteredSamples)[1:2]
```

```
## [1] "X49"  "X129"
```

```r
rownames(tothillClinical)[1:2]
```

```
## [1] "X49"  "X129"
```

```r
rownames(tothillOSMos)[1:2]
```

```
## [1] "X49"  "X129"
```

```r
colnames(tothillExpression[, 1:2])
```

```
## [1] "X60120" "X32117"
```

```r

tothillSampleUseTmp <- rownames(tothillFilteredSamples[which(tothillFilteredSamples[, 
    "sampleUse"] == "Used"), ])
length(tothillSampleUseTmp)
```

```
## [1] 189
```

```r
summary(tothillOSMos[tothillSampleUseTmp, ])
```

```
##       time           status     
##  Min.   :  0.0   Min.   :0.000  
##  1st Qu.: 18.0   1st Qu.:0.000  
##  Median : 27.0   Median :0.000  
##  Mean   : 30.7   Mean   :0.455  
##  3rd Qu.: 41.0   3rd Qu.:1.000  
##  Max.   :166.0   Max.   :1.000  
##  NA's   :3
```

```r

tothillSampleUse <- intersect(tothillSampleUseTmp, rownames(tothillOSMos[!is.na(tothillOSMos[, 
    1]), ]))
length(tothillSampleUse)
```

```
## [1] 186
```

```r

tothillOSYrsUse <- tothillOSMos[tothillSampleUse, ]
tothillOSYrsUse[, 1] <- tothillOSYrsUse[, 1]/12
tothillClinUse <- tothillClinical[tothillSampleUse, ]
tothillRDUse <- tothillRD[tothillSampleUse]
table(tothillRDUse)
```

```
## tothillRDUse
## No RD    RD 
##    50   136
```

```r

tothillExpressionUse <- tothillExpression[, tothillSampleUse]
```


## 3 Analyses

Overall survival is compared in TCGA versus Tothill et al.


```r

tmp <- rbind(tcgaOSYrsUse, tothillOSYrsUse)

library(survival)
```

```
## Loading required package: splines
```

```r
osAll <- Surv(tmp[, 1], tmp[, 2] == 1)

cohort <- rep(2, dim(osAll)[1])
cohort[1:dim(tcgaOSYrsUse)[1]] <- 1
table(cohort)
```

```
## cohort
##   1   2 
## 491 186
```

```r

fit <- survfit(osAll ~ cohort)
survdiff(osAll ~ cohort)
```

```
## Call:
## survdiff(formula = osAll ~ cohort)
## 
##            N Observed Expected (O-E)^2/E (O-E)^2/V
## cohort=1 491      261    260.7  0.000242  0.000993
## cohort=2 186       86     86.3  0.000731  0.000993
## 
##  Chisq= 0  on 1 degrees of freedom, p= 0.975
```

```r

plot(fit, lty = c(1, 2), xlab = "Years", ylab = "Overall Survival", lwd = 2, 
    main = "Overall Survival in TCGA versus Tothill")
legend(x = 8, y = 0.95, legend = c("TCGA (N=491)", "Tothill (N=186)"), lty = c(1, 
    2), lwd = 2)
text(11, 0.7, "P = 0.975")
```

![plot of chunk compareOS](figure/compareOS.png) 


Overall survival by residual disease status is plotted for the TCGA data.


```r

table(tcgaClinUse$tumor_residual_disease)
```

```
## 
##        [Not Available]                 >20 mm                1-10 mm 
##                      0                    102                    242 
##               11-20 mm No Macroscopic disease 
##                     34                    113
```

```r
tcgaGp <- rep(1, dim(tcgaOSYrsUse)[1])
tcgaGp[which(tcgaClinUse[, "tumor_residual_disease"] == "1-10 mm")] <- 2
tcgaGp[which(tcgaClinUse[, "tumor_residual_disease"] == "11-20 mm")] <- 3
tcgaGp[which(tcgaClinUse[, "tumor_residual_disease"] == ">20 mm")] <- 4

survTCGA <- Surv(tcgaOSYrsUse[, 1], tcgaOSYrsUse[, 2] == 1)
tcgaSurvFit <- survfit(survTCGA ~ tcgaGp)
survdiff(survTCGA ~ tcgaGp)
```

```
## Call:
## survdiff(formula = survTCGA ~ tcgaGp)
## 
##            N Observed Expected (O-E)^2/E (O-E)^2/V
## tcgaGp=1 113       30     59.9    14.951    19.507
## tcgaGp=2 242      147    131.5     1.830     3.726
## tcgaGp=3  34       23     20.6     0.281     0.306
## tcgaGp=4 102       61     49.0     2.949     3.647
## 
##  Chisq= 20.1  on 3 degrees of freedom, p= 0.000162
```

```r

plot(tcgaSurvFit, lty = 1:4, xlab = "Years after Surgery", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 5, y = 0.98, legend = c("No macroscopic disease (N=113)", "1-10 mm (N=242)", 
    "11-20 mm (N=34)", ">20 mm (N=102)"), lty = c(1:4), lwd = 2, cex = 0.8)
text(0.05, 0.05, "(A) TCGA", pos = 4)
text(7, 0.6, "P = 0.0002", pos = 4)
```

![plot of chunk kmTCGA](figure/kmTCGA.png) 


Overall survival by residual disease status is plotted for the data of Tothill et al.


```r

table(tothillClinUse$ResidDisease)
```

```
## 
##            <1            >1 macro size NK           nil            NK 
##            66            57            13            50             0
```

```r
tothillGp <- rep(1, dim(tothillOSYrsUse)[1])
tothillGp[which(tothillClinUse[, "ResidDisease"] == "<1")] <- 2
tothillGp[which(tothillClinUse[, "ResidDisease"] == ">1")] <- 3
tothillGp[which(tothillClinUse[, "ResidDisease"] == "macro size NK")] <- 4

survTothill <- Surv(tothillOSYrsUse[, 1], tothillOSYrsUse[, 2] == 1)
tothillSurvFit <- survfit(survTothill ~ tothillGp)
survdiff(survTothill ~ tothillGp)
```

```
## Call:
## survdiff(formula = survTothill ~ tothillGp)
## 
##              N Observed Expected (O-E)^2/E (O-E)^2/V
## tothillGp=1 50       14    26.97   6.23891   9.36475
## tothillGp=2 66       34    27.17   1.71681   2.64353
## tothillGp=3 57       31    25.10   1.38667   2.00101
## tothillGp=4 13        7     6.76   0.00872   0.00971
## 
##  Chisq= 9.7  on 3 degrees of freedom, p= 0.0217
```

```r

plot(tothillSurvFit, lty = 1:4, xlab = "Years after Surgery", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 7, y = 0.98, legend = c("nil (N=50)", "<1 (N=66)", ">1 (N=57)", "macro size NK (N=13)"), 
    lty = c(1:4), lwd = 2, cex = 0.8)
text(0.05, 0.05, "(B) Tothill et al.", pos = 4)
text(8, 0.6, "P = 0.0217", pos = 4)
```

![plot of chunk kmTothill](figure/kmTothill.png) 


For each data set, patients with any RD are compared to patients without RD. We do this first for the TCGA data.


```r

table(tcgaRDUse)
```

```
## tcgaRDUse
## No RD    RD 
##   113   378
```

```r
tcgaSurvFit <- survfit(survTCGA ~ tcgaRDUse)
survdiff(survTCGA ~ tcgaRDUse)
```

```
## Call:
## survdiff(formula = survTCGA ~ tcgaRDUse)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## tcgaRDUse=No RD 113       30     59.9     14.95      19.5
## tcgaRDUse=RD    378      231    201.1      4.46      19.5
## 
##  Chisq= 19.5  on 1 degrees of freedom, p= 1e-05
```

```r

plot(tcgaSurvFit, lty = 1:4, xlab = "Years after Surgery", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 6, y = 0.98, legend = c("No RD (N=113)", "Any RD (N=378)"), lty = c(1:4), 
    lwd = 2, cex = 0.8)
text(0.05, 0.05, "(A) TCGA", pos = 4)
text(7, 0.6, "P < 0.0001", pos = 4)
```

![plot of chunk kmTCGArdVSnoRD](figure/kmTCGArdVSnoRD.png) 


We next do this for the data of Tothill et al.


```r

table(tothillRDUse)
```

```
## tothillRDUse
## No RD    RD 
##    50   136
```

```r
tothillSurvFit <- survfit(survTothill ~ tothillRDUse)
survdiff(survTothill ~ tothillRDUse)
```

```
## Call:
## survdiff(formula = survTothill ~ tothillRDUse)
## 
##                      N Observed Expected (O-E)^2/E (O-E)^2/V
## tothillRDUse=No RD  50       14       27      6.24      9.36
## tothillRDUse=RD    136       72       59      2.85      9.36
## 
##  Chisq= 9.4  on 1 degrees of freedom, p= 0.00221
```

```r

plot(tothillSurvFit, lty = 1:4, xlab = "Years", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 6, y = 0.98, legend = c("No RD (N=50)", "Any RD (N=136)"), lty = c(1:4), 
    lwd = 2, cex = 0.8)
text(0.05, 0.05, "(B) Tothill et al.", pos = 4)
text(7, 0.6, "P  0.0022", pos = 4)
```

![plot of chunk kmTothilRdVSnoRD](figure/kmTothilRdVSnoRD.png) 


We produce the TCGA plot for the manuscript.


```r

table(tcgaRDUse)
```

```
## tcgaRDUse
## No RD    RD 
##   113   378
```

```r
tcgaSurvFit <- survfit(survTCGA ~ tcgaRDUse)
survdiff(survTCGA ~ tcgaRDUse)
```

```
## Call:
## survdiff(formula = survTCGA ~ tcgaRDUse)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## tcgaRDUse=No RD 113       30     59.9     14.95      19.5
## tcgaRDUse=RD    378      231    201.1      4.46      19.5
## 
##  Chisq= 19.5  on 1 degrees of freedom, p= 1e-05
```

```r

plot(tcgaSurvFit, lty = 1:4, xlab = "Years after Surgery", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 6, y = 0.98, legend = c("No RD (N=113)", "Any RD (N=378)"), lty = c(1:4), 
    lwd = 2, cex = 0.8)
text(0.05, 0.05, "(A) TCGA", pos = 4)
text(7, 0.6, "P < 0.001", pos = 4)
```

![plot of chunk kmTCGArdVSnoRDms](figure/kmTCGArdVSnoRDms.png) 


We do the same thing for Tothill et al.


```r

table(tothillRDUse)
```

```
## tothillRDUse
## No RD    RD 
##    50   136
```

```r
tothillSurvFit <- survfit(survTothill ~ tothillRDUse)
survdiff(survTothill ~ tothillRDUse)
```

```
## Call:
## survdiff(formula = survTothill ~ tothillRDUse)
## 
##                      N Observed Expected (O-E)^2/E (O-E)^2/V
## tothillRDUse=No RD  50       14       27      6.24      9.36
## tothillRDUse=RD    136       72       59      2.85      9.36
## 
##  Chisq= 9.4  on 1 degrees of freedom, p= 0.00221
```

```r

plot(tothillSurvFit, lty = 1:4, xlab = "Years", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 6, y = 0.98, legend = c("No RD (N=50)", "Any RD (N=136)"), lty = c(1:4), 
    lwd = 2, cex = 0.8)
text(0.05, 0.05, "(B) Tothill et al.", pos = 4)
text(7, 0.6, "P  0.002", pos = 4)
```

![plot of chunk kmTothilRdVSnoRDms](figure/kmTothilRdVSnoRDms.png) 


We also look at OS in each data set for patients with FABP4 in the top 25% compared to the lower 75%. We begin with TCGA.


```r

probeNames <- rownames(tcgaExpressionUse)

library(hthgu133a.db)
```

```
## Loading required package: AnnotationDbi
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## 
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## 
## The following object is masked from 'package:stats':
## 
##     xtabs
## 
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, as.data.frame, cbind, colnames, duplicated,
##     eval, Filter, Find, get, intersect, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rep.int, rownames, sapply,
##     setdiff, sort, table, tapply, union, unique, unlist
## 
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Loading required package: org.Hs.eg.db
## Loading required package: DBI
```

```r
geneNames <- unlist(mget(probeNames, hthgu133aSYMBOL))
probesFABP4 <- probeNames[which(geneNames == "FABP4")]
probesFABP4
```

```
## [1] "203980_at"
```

```r

tcgaFABP4 <- tcgaExpressionUse[probesFABP4, ]

tcgaFABP4Gp <- rep(0, length(tcgaFABP4))
tcgaFABP4Gp[tcgaFABP4 > quantile(tcgaFABP4, probs = c(0.75))] <- 1
table(tcgaFABP4Gp)
```

```
## tcgaFABP4Gp
##   0   1 
## 368 123
```

```r

tcgaSurvFit <- survfit(survTCGA ~ tcgaFABP4Gp)
survdiff(survTCGA ~ tcgaFABP4Gp)
```

```
## Call:
## survdiff(formula = survTCGA ~ tcgaFABP4Gp)
## 
##                 N Observed Expected (O-E)^2/E (O-E)^2/V
## tcgaFABP4Gp=0 368      190    200.1     0.505      2.19
## tcgaFABP4Gp=1 123       71     60.9     1.659      2.19
## 
##  Chisq= 2.2  on 1 degrees of freedom, p= 0.139
```

```r

plot(tcgaSurvFit, lty = 1:4, xlab = "Years", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 6, y = 0.98, legend = c("Low FABP4 (N=368)", "High FABP4 (N=123)"), 
    lty = c(1, 2), lwd = 2, cex = 0.8)
text(0.05, 0.05, "(A) TCGA", pos = 4)
text(7, 0.6, "P  0.139", pos = 4)
```

![plot of chunk kmTCGAfabp4](figure/kmTCGAfabp4.png) 


We repeat, using the data of Tothill et al.


```r

tothillFABP4 <- tothillExpressionUse[probesFABP4, ]

tothillFABP4Gp <- rep(0, length(tothillFABP4))
tothillFABP4Gp[tothillFABP4 > quantile(tothillFABP4, probs = c(0.75))] <- 1
table(tothillFABP4Gp)
```

```
## tothillFABP4Gp
##   0   1 
## 139  47
```

```r

tothillSurvFit <- survfit(survTothill ~ tothillFABP4Gp)
survdiff(survTothill ~ tothillFABP4Gp)
```

```
## Call:
## survdiff(formula = survTothill ~ tothillFABP4Gp)
## 
##                    N Observed Expected (O-E)^2/E (O-E)^2/V
## tothillFABP4Gp=0 139       56     67.4      1.92      9.17
## tothillFABP4Gp=1  47       30     18.6      6.97      9.17
## 
##  Chisq= 9.2  on 1 degrees of freedom, p= 0.00246
```

```r

plot(tothillSurvFit, lty = 1:2, xlab = "Years", ylab = "Proportion Surviving", 
    lwd = 2)
legend(x = 6, y = 0.98, legend = c("Low FABP4 (N=139)", "High FABP4 (N=47)"), 
    lty = c(1:2), lwd = 2, cex = 0.8)
text(0.05, 0.05, "(B) Tothill et al.", pos = 4)
text(7, 0.6, "P  0.0025", pos = 4)
```

![plot of chunk kmTothillFabp4](figure/kmTothillFabp4.png) 


## 4 Appendix

### 4.1 File Location


```r
getwd()
```

```
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"
```


### 4.2 SessionInfo


```r
sessionInfo()
```

```
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  splines   stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] hthgu133a.db_2.9.0   org.Hs.eg.db_2.9.0   RSQLite_0.11.4      
## [4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.1      
## [7] BiocGenerics_0.6.0   survival_2.37-4      knitr_1.5           
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.9    IRanges_1.18.4 stats4_3.0.2  
## [5] stringr_0.6.2  tools_3.0.2
```