Filtering Samples from the Tothill Data to Focus on RD
======================================================

by Keith A. Baggerly

## 1 Executive Summary

### 1.1 Introduction

[Tothill et al.](#tothill08) profiled 285 ovarian
tumor samples, but not all of the patients had the
same type of disease, or had residual disease (RD) information
recorded. We want to identify the high-grade serous
ovarian tumors with RD information to focus the question
more precisely.

### 1.2 Methods

Starting with the previously assembled table of clinical
information, we examine the various columns and see which
clinical features would justify exclusion from the set
being examined.

We consider:

- RD status, excluding samples with no RD information.
- Clinical Type, excluding low malignant potential (LMP) samples.
- Histologic Subtype, excluding non-serous (Adeno and Endo) samples.
- Array Site, excluding samples not coming from the
  ovary (OV) or peritoneum (PE).
- Neoadjuvant Treatment, excluding samples from patients
  who received chemotherapy before sample acquisition,
  or whose neoadjuvant status is unknown.
- Grade, excluding Grade 1 samples and samples with no grade recorded.


We use these rules to build up a data frame with
two columns: sampleUse (Used or Unused), and whyExcluded.

### 1.3 Results

We exclude 96 of the 285 samples for various reasons.
Of the 189 that remain, 139 are RD and 50 are No RD.

We save tothillFilteredSamples
to the RData file "tothillFilteredSamples.RData".

## 2 Libraries

This report uses only base R functions, so there are
no additional libraries to load.


## 3 Loading the Data

Here we simply load the previously assembled clinical information.


```r

load(file.path("RDataObjects", "tothillClinical.RData"))
tothillClinical[1:3, ]
```

```
##         GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## X49  GSM249839       49           5          MAL               Ser
## X129 GSM250001      129           1          MAL               Ser
## X146 GSM250000      146          NC          MAL               Ser
##      PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse
## X49           OV   III     3  56      D    Y   N   N            7
## X129          OV   III     3  65      D    Y   N   N            7
## X146          OV   III     3  56     PF    Y   N   N          166
##      MosToDeath ResidDisease ArraySite
## X49           8           <1        OV
## X129         15           >1        PE
## X146        166           >1        OV
```


## 4 Filtering Samples Used

We now walk through the various criteria and see
what these imply for inclusion of the various samples.
Our default assumption is that all samples are used.


```r

sampleUse <- rep("Used", nrow(tothillClinical))
names(sampleUse) <- rownames(tothillClinical)

whyExcluded <- rep("", nrow(tothillClinical))
names(whyExcluded) <- rownames(tothillClinical)
```
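
The per-criterion updates in the following subsections all follow one pattern: build a logical mask, mark the masked samples as unused, and append a reason tag. A minimal sketch of that pattern on a toy data frame follows. The helper `applyExclusion` and the toy values are hypothetical and are not part of the original analysis.

```r
# Toy clinical table (hypothetical values, not the Tothill data)
toyClinical <- data.frame(
    ResidDisease = c("<1", "NK", "nil", "NK"),
    ClinicalType = c("MAL", "LMP", "MAL", "MAL"),
    row.names = c("X1", "X2", "X3", "X4"),
    stringsAsFactors = FALSE)

# Hypothetical helper: flag the samples selected by 'mask' and append 'reason'
applyExclusion <- function(state, mask, reason) {
    mask <- mask & !is.na(mask)  # treat missing comparisons as "not selected"
    state$sampleUse[mask] <- "Unused"
    state$whyExcluded[mask] <- paste(state$whyExcluded[mask], reason, sep = "")
    state
}

# Default assumption: every sample is used and has no exclusion reason
toyState <- list(
    sampleUse = setNames(rep("Used", nrow(toyClinical)), rownames(toyClinical)),
    whyExcluded = setNames(rep("", nrow(toyClinical)), rownames(toyClinical)))

toyState <- applyExclusion(toyState, toyClinical$ResidDisease == "NK", "-No RD Info-")
toyState <- applyExclusion(toyState, toyClinical$ClinicalType == "LMP", "-LMP-")

toyState$whyExcluded
```

Computing each mask once keeps the sample selection and the reason tag tied to the same condition, which avoids the two drifting apart when a line is copied and edited.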


### 4.1 Residual Disease

First, we check residual disease status, and
exclude patients with no information. Only the samples
labeled "NK" are excluded; the "macro size NK" samples are retained.


```r

table(tothillClinical[, "ResidDisease"])
```

```
## 
##            <1            >1 macro size NK           nil            NK 
##            76            70            18            84            37
```

```r

sampleUse[tothillClinical[, "ResidDisease"] == "NK"] <- "Unused"
whyExcluded[tothillClinical[, "ResidDisease"] == "NK"] <- paste(whyExcluded[tothillClinical[, 
    "ResidDisease"] == "NK"], "-No RD Info-", sep = "")

table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     37    248
```


### 4.2 Clinical Type

Next, we look at clinical type.
Some of the samples are known to be of low malignant
potential (LMP), and we don't want to use them.


```r

table(tothillClinical[, "ClinicalType"])
```

```
## 
## LMP MAL 
##  18 267
```

```r

sampleUse[tothillClinical[, "ClinicalType"] == "LMP"] <- "Unused"
whyExcluded[tothillClinical[, "ClinicalType"] == "LMP"] <- paste(whyExcluded[tothillClinical[, 
    "ClinicalType"] == "LMP"], "-LMP-", sep = "")

table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     53    232
```
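
The Unused count rises by 16 rather than 18 at this step because a sample can meet more than one exclusion criterion. A short check of the overlap, using only the counts printed above (the variable names are illustrative):

```r
# Counts taken from the tables printed above
nNoRDInfo <- 37          # samples with ResidDisease == "NK"
nLMP <- 18               # samples with ClinicalType == "LMP"
nUnusedAfterLMP <- 53    # Unused samples after both filters

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
nBoth <- nNoRDInfo + nLMP - nUnusedAfterLMP
nBoth
```

The same reasoning applies to the later steps, so the per-criterion counts should not be expected to add up to the final number of excluded samples.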


### 4.3 Histologic Subtype

Next, we look at histologic subtype.
We only want to keep serous (Ser) tumor samples.


```r

table(tothillClinical[, "HistologicSubtype"])
```

```
## 
## Adeno  Endo   Ser 
##     1    20   264
```

```r

sampleUse[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- "Unused"
sampleUse[tothillClinical[, "HistologicSubtype"] == "Endo"] <- "Unused"

whyExcluded[tothillClinical[, "HistologicSubtype"] == "Adeno"] <- paste(whyExcluded[tothillClinical[, 
    "HistologicSubtype"] == "Adeno"], "-Adeno Subtype-", sep = "")
whyExcluded[tothillClinical[, "HistologicSubtype"] == "Endo"] <- paste(whyExcluded[tothillClinical[, 
    "HistologicSubtype"] == "Endo"], "-Endo Subtype-", sep = "")

table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     68    217
```


### 4.4 Array Site

Next, we look at the site the sample was taken from
(the "array site"). We want tumors from the ovary
or the peritoneum.


```r

table(tothillClinical[, "ArraySite"])
```

```
## 
##       BN       CO       FT       OM    Other       OV OV or OM       PE 
##        1        4        2        2        3      200        1       71 
##       UT 
##        1
```

```r

sampleUse[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- "Unused"
whyExcluded[!is.element(tothillClinical[, "ArraySite"], c("OV", "PE"))] <- paste(whyExcluded[!is.element(tothillClinical[, 
    "ArraySite"], c("OV", "PE"))], "-Not OV or PE-", sep = "")

table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     77    208
```


### 4.5 Neoadjuvant Chemo

Next, we look at whether the patients received
neoadjuvant chemotherapy. We want to focus on
chemo-naive tumors.


```r

table(tothillClinical[, "Neo"])
```

```
## 
##       N   Y 
##   3 264  18
```

```r

sampleUse[tothillClinical[, "Neo"] == ""] <- "Unused"
sampleUse[tothillClinical[, "Neo"] == "Y"] <- "Unused"

whyExcluded[tothillClinical[, "Neo"] == ""] <- paste(whyExcluded[tothillClinical[, 
    "Neo"] == ""], "-NeoAdj Unk-", sep = "")
whyExcluded[tothillClinical[, "Neo"] == "Y"] <- paste(whyExcluded[tothillClinical[, 
    "Neo"] == "Y"], "-NeoAdj Chemo-", sep = "")

table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     91    194
```


With respect to neoadjuvant chemo, we exclude patients who
either received therapy or have no information available.
This mostly reduces the number of RD samples.

### 4.6 Grade

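Next, we look at grade. We want only Grade 2 or 3
samples, so we exclude Grade 1 samples and samples with
no grade recorded. The table below omits missing values,
which is why its counts sum to 280 rather than 285.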


```r

table(tothillClinical[, "Grade"])
```

```
## 
##   1   2   3 
##  19  97 164
```

```r

sampleUse[is.na(tothillClinical[, "Grade"])] <- "Unused"
sampleUse[tothillClinical[, "Grade"] == 1] <- "Unused"

whyExcluded[is.na(tothillClinical[, "Grade"])] <- paste(whyExcluded[is.na(tothillClinical[, 
    "Grade"])], "-Grade NA-", sep = "")
whyExcluded[which(tothillClinical[, "Grade"] == 1)] <- paste(whyExcluded[which(tothillClinical[, 
    "Grade"] == 1)], "-Grade 1-", sep = "")

table(sampleUse)
```

```
## sampleUse
## Unused   Used 
##     96    189
```
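
Missing grades need separate handling because a comparison against `NA` yields `NA` rather than `FALSE`. The sketch below uses a toy vector (hypothetical values, not the Tothill data) to show how `useNA = "ifany"` exposes missing entries in `table()` and what `which()` does to such a comparison.

```r
# Toy grade vector with one missing value (hypothetical, not the Tothill data)
toyGrade <- c(1, 2, 3, NA, 3)

# By default table() silently drops NA; useNA = "ifany" reports it
table(toyGrade)
table(toyGrade, useNA = "ifany")

# Comparing against NA yields NA rather than FALSE
toyGrade == 1

# which() keeps only the positions where the comparison is TRUE
which(toyGrade == 1)

# An explicit alternative that returns a clean logical mask
isGrade1 <- !is.na(toyGrade) & toyGrade == 1
isGrade1
```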


### 4.7 Final Tally

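Now we see how many RD and No RD samples remain.
The Unused row sums to 59 rather than 96 because the 37 samples
with no RD information do not appear in either column.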


```r

table(sampleUse, tothillRD)
```

```
##          tothillRD
## sampleUse No RD  RD
##    Unused    34  25
##    Used      50 139
```
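
The vector `tothillRD` is not created in this report, so it is presumably supplied by an earlier step. The sketch below shows one grouping of the `ResidDisease` levels that reproduces the column totals in the table above (84 No RD and 164 RD, with the 37 "NK" samples left missing). The mapping is an assumption inferred from those counts, not the original definition, and `toyRD` is a hypothetical name.

```r
# Rebuild a ResidDisease vector from the counts printed in Section 4.1
residCounts <- c("<1" = 76, ">1" = 70, "macro size NK" = 18, "nil" = 84, "NK" = 37)
residDisease <- rep(names(residCounts), times = residCounts)

# Assumed mapping: "nil" -> No RD, "NK" -> missing, everything else -> RD
toyRD <- ifelse(residDisease == "nil", "No RD",
    ifelse(residDisease == "NK", NA_character_, "RD"))

table(toyRD, useNA = "ifany")
```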


## 5 Building the Data Frame

Now we bundle the assembled information into a data
frame for later use.


```r

tothillFilteredSamples <- data.frame(sampleUse = sampleUse, whyExcluded = whyExcluded, 
    row.names = rownames(tothillClinical))
```
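
Under the R version listed in the appendix (3.0.2), `data.frame()` converts character columns to factors by default, whereas R 4.0.0 and later keep them as character. If downstream code relies on one behaviour, it may be safer to state the choice explicitly. A minimal sketch with toy vectors (hypothetical values):

```r
# Toy named vectors standing in for sampleUse and whyExcluded
toyUse <- c(X1 = "Used", X2 = "Unused")
toyWhy <- c(X1 = "", X2 = "-LMP-")

toyFiltered <- data.frame(sampleUse = toyUse, whyExcluded = toyWhy,
    row.names = names(toyUse), stringsAsFactors = FALSE)

# Both columns stay character regardless of the R version
sapply(toyFiltered, class)
```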


## 6 Saving RData

Now we save the relevant information to an RData object.


```r

save(tothillFilteredSamples, file = file.path("RDataObjects", "tothillFilteredSamples.RData"))
```
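
A later report would typically reload this object and keep only the retained samples. A minimal sketch of that round trip follows, using a temporary file and a toy data frame as hypothetical stand-ins for the real file and object.

```r
# Toy stand-in for tothillFilteredSamples (hypothetical values)
toyFiltered <- data.frame(
    sampleUse = c("Used", "Unused", "Used"),
    whyExcluded = c("", "-LMP-", ""),
    row.names = c("X1", "X2", "X3"),
    stringsAsFactors = FALSE)

# Save to a temporary file, remove the object, then load it back
toyFile <- tempfile(fileext = ".RData")
save(toyFiltered, file = toyFile)
rm(toyFiltered)
loadedNames <- load(toyFile)   # load() returns the names of the restored objects

# Row names of the samples that passed every filter
usedSamples <- rownames(toyFiltered)[toyFiltered$sampleUse == "Used"]
usedSamples
```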


## 7 Appendix

### 7.1 File Location


```r

getwd()
```

```
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"
```


### 7.2 SessionInfo


```r

sessionInfo()
```

```
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.5
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.9    stringr_0.6.2  tools_3.0.2
```


## 8 References

> <p id="tothill08">
  [1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S,
  Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N,
  Fereday S, Hung JA, Chiew YE, Haviv I;
  Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD.
  Novel molecular subtypes of serous and endometrioid ovarian cancer
  linked to clinical outcome.
  <em>Clin Cancer Res</em>, <b>14(16)</b>:5198-208, 2008. </p>