Supplementary Appendix

This web page contains supplementary material, including supplementary methods and supplementary reports describing all data analyses, for the manuscript entitled "Molecular Biomarkers of Residual Disease after Surgical Debulking of High-Grade Serous Ovarian Cancer" by the ovarian cancer working group at MD Anderson Cancer Center.

All analyses were performed by Keith A. Baggerly, Shelley M. Herbrich, Susan L. Tucker, or Anna Unruh.

This page was last updated on Tuesday, April 1, 2014. The files posted here will not be changed after publication, so that the web site serves as permanent documentation of our analysis. Any changes will be posted on a separate page for addenda, errata, corrigenda, and other adjustments.

Our analyses use raw data (e.g., Affymetrix CEL files) from a variety of sources. These files are not reproduced here; instead, we provide links to where the data can be obtained.

TCGA Ovarian Cancer Affymetrix CEL Files. We used version 1007 throughout; only the mage-tab folder appears to have been updated (to version 1008) as of May 7, 2013.
TCGA Ovarian Cancer Clinical Data. These files are updated quite regularly, and we do not know where earlier versions can be found. The values we derived from the "clinical_patient_ov.txt" file for RD status and related variables are built into the RData file we produced (available below).
Tothill et al. CEL Files and Clinical Information.
Bonome et al. CEL Files. Clinical information (e.g., optimal/suboptimal surgical outcome) is available from the individual sample pages, which are easily parsed using the GEOquery R package (see the "assemblingCCLEClinical" report below).
CCLE CEL Files from the initial publication. Primary site and histology information are given in the individual sample pages and were assembled using the GEOquery R package (see the "assemblingCCLEClinical" report below).
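The reports use GEOquery in R to pull clinical fields from the GEO sample pages. As a rough illustration of what that parsing involves, the Python sketch below collects the `key: value` pairs stored in the `!Sample_characteristics_ch1` lines of a GEO sample record in SOFT text format. The record, its accession, and its field names are hypothetical; real field names differ between series, and this is not the code used in the reports.

```python
def parse_characteristics(soft_text):
    """Collect '!Sample_characteristics_ch1 = key: value' lines into a dict."""
    prefix = "!Sample_characteristics_ch1"
    fields = {}
    for line in soft_text.splitlines():
        if not line.startswith(prefix):
            continue
        # Everything after the first '=' is the characteristic itself.
        _, _, payload = line.partition("=")
        # Split on the first ':' into field name and value.
        key, sep, value = payload.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical sample record (accession and field names are placeholders).
example = """^SAMPLE = GSM000000
!Sample_title = hypothetical sample
!Sample_characteristics_ch1 = primary site: ovary
!Sample_characteristics_ch1 = histology: carcinoma
"""
print(parse_characteristics(example))
```

Looping this over every sample in a series yields one row per sample, which is the kind of table the clinical-assembly reports build.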

Supplementary Methods:

Validation datasets. We used two additional datasets for validation. The first was from the study of Bonome et al. [11]. We downloaded CEL files (Affymetrix U133A arrays, N=195: 185 tumor samples and 10 normal ovary samples) from the Gene Expression Omnibus (GEO; GSE26712) on September 10, 2012. The samples in this study were laser-capture microdissected, and surgical outcome was recorded as optimal or suboptimal. We used these data to assess whether the qualitative differences in gene expression observed in the first two datasets (TCGA and Tothill et al.) were also present here. The second dataset was from the Cancer Cell Line Encyclopedia (CCLE) [12]. We downloaded the CEL files (Affymetrix U133 Plus 2.0 arrays, N=917) described in the initial CCLE publication from GEO (GSE36133) on September 14, 2012. We used these data to determine whether differences in gene expression seen in tumor samples are also present in ovarian cancer cell lines.
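One simple way to ask whether a qualitative difference "is present" in a validation set is to check whether its direction agrees. The Python sketch below, offered only as an illustration under stated assumptions, compares the sign of the mean difference in the validation data with the sign seen in the discovery data and reports the fraction of probesets that agree. It assumes suboptimal surgery is treated as the RD group and optimal as No-RD; the probeset names and expression values are hypothetical, and the reports may have used a different comparison.

```python
from statistics import mean


def direction(group_a, group_b):
    """Return +1 if group_a has the higher mean, -1 if lower, 0 if tied."""
    diff = mean(group_a) - mean(group_b)
    return (diff > 0) - (diff < 0)


def concordance(discovery_signs, validation_groups):
    """Fraction of probesets whose validation direction matches the discovery direction.

    discovery_signs maps probeset -> +1/-1 (RD higher/lower than No-RD).
    validation_groups maps probeset -> (suboptimal values, optimal values).
    """
    matches = 0
    for probeset, expected in discovery_signs.items():
        suboptimal, optimal = validation_groups[probeset]
        if direction(suboptimal, optimal) == expected:
            matches += 1
    return matches / len(discovery_signs)


# Hypothetical inputs.
discovery_signs = {"probeset_1": 1, "probeset_2": -1}
validation_groups = {
    "probeset_1": ([5.0, 6.0], [4.0, 4.5]),
    "probeset_2": ([3.0, 3.2], [2.0, 2.5]),
}
print(concordance(discovery_signs, validation_groups))  # 0.5
```

Only direction is compared here, which matches the qualitative framing; it says nothing about the size or significance of each difference.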

Quantitative RT-PCR analysis. Total RNA was extracted from the tumor tissues using the TRIzol® method. RNA was quantified with a NanoDrop spectrophotometer, and the 260/280 absorbance ratio was checked to assess purity. RNA (1 µg per sample) was reverse transcribed into cDNA using the Verso cDNA kit (Thermo Scientific, West Palm Beach, FL) according to the manufacturer's protocol.
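Loading a fixed 1 µg of RNA per sample involves a small calculation from the spectrophotometer readings. The Python sketch below shows it using the standard conversion of 40 µg/mL RNA per A260 unit (10 mm path). The 1.8 purity cutoff and the example readings are assumptions chosen for illustration; the appendix does not state the acceptance threshold that was used.

```python
# Standard conversion for single-stranded RNA: A260 of 1.0 ~ 40 ug/mL (10 mm path).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ng_per_ul(a260, dilution=1.0):
    """RNA concentration from A260; 1 ug/mL equals 1 ng/uL, so no unit change is needed."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution


def volume_for_input(conc_ng_per_ul, input_ng=1000.0):
    """Microlitres of RNA needed to load input_ng (1 ug by default) into the RT reaction."""
    return input_ng / conc_ng_per_ul


def purity_ok(a260, a280, lower=1.8):
    """True if the 260/280 ratio meets an assumed cutoff (1.8 here, not from the source)."""
    return a260 / a280 >= lower


# Hypothetical readings for one sample.
conc = rna_concentration_ng_per_ul(5.0)  # 200 ng/uL
print(conc, volume_for_input(conc), purity_ok(5.0, 2.5))
```

In practice a NanoDrop instrument reports ng/µL directly, so only the last two functions would normally be needed.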

qRT-PCR was performed on a 7500 PCR system (Applied Biosystems, Warrington, UK) using 1 µL of cDNA for each sample. SYBR Green (Applied Biosystems) was used to detect the products, and 20 pmol of primer was used per reaction. All reactions were carried out in a 20 µL reaction volume and were performed in triplicate. We used the following primers: for FABP4, 5'-TGATGATCATGTTAGGTTTGGC-3' (forward) and 5'-TGGAAACTTGTCTCCAGTGAA-3' (reverse); for ADH1B, 5'-AGGGTAGAGGAGGCTGAAGA-3' (forward) and 5'-ACCTGCTTCACTCTGGGAAA-3' (reverse). The PCR reactions were run under the following conditions: 50°C for 2 minutes, 95°C for 15 minutes, followed by 40 cycles at 95°C for 1 minute each. All reactions were analyzed with the Applied Biosystems 7500 PCR software (v2.0.5). The cycle threshold (Ct) values of the target genes were initially normalized to those of 18S rRNA, and melt curves were checked to confirm the specificity of the reactions.
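The initial normalization to 18S rRNA can be sketched as follows. This Python example averages the triplicate Ct values, takes ΔCt = Ct(target) − Ct(18S), and converts it to a relative amount as E^−ΔCt. The conversion step and the default E = 2.0 (perfect doubling each cycle) are assumptions: the appendix says only that Ct values were normalized to 18S, and that this default quantification was later replaced. The Ct values shown are hypothetical.

```python
import math


def mean_ct(replicates):
    """Average Ct across technical replicates (triplicates in this protocol)."""
    return math.fsum(replicates) / len(replicates)


def delta_ct(target_replicates, reference_replicates):
    """Target Ct minus reference (18S rRNA) Ct for the same sample."""
    return mean_ct(target_replicates) - mean_ct(reference_replicates)


def relative_amount(target_replicates, reference_replicates, efficiency=2.0):
    """Relative target amount E ** -dCt; efficiency=2.0 assumes perfect doubling."""
    return efficiency ** -delta_ct(target_replicates, reference_replicates)


# Hypothetical Ct values for one tumor sample.
fabp4 = [24.0, 24.2, 23.8]
rrna_18s = [14.0, 14.1, 13.9]
print(delta_ct(fabp4, rrna_18s))
print(relative_amount(fabp4, rrna_18s))
```

Because every quantity flows from Ct, any bias in the software-chosen fluorescence threshold propagates directly into ΔCt and the relative amount.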

Because initial examination of the qRT-PCR results showed that some gene-specific fluorescence thresholds automatically selected by the commercial PCR software were artificially low, resulting in overestimation of Ct values and underestimation of the amount of target RNA (see Supplementary Report: Problems with default PCR quantifications), we instead estimated the initial target concentration in each PCR sample using the "window of linearity" method [S1]. This approach gives a simple, per-well estimate of the initial amount of target that does not depend on assuming a fixed amplification efficiency.
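The core of the window-of-linearity idea can be illustrated with a short Python sketch. For one well, it fits a straight line to log10(fluorescence) against cycle number, using only cycles whose baseline-corrected fluorescence lies inside a chosen window in the exponential phase. The slope gives that well's efficiency (E = 10^slope), and the intercept extrapolates back to the fluorescence at cycle 0, a proxy for the initial amount. The sketch assumes baseline-corrected input and user-supplied window limits; the published method [S1] also addresses baseline estimation and window selection, which are omitted here. The synthetic curve is hypothetical and noise-free.

```python
import math


def window_of_linearity(fluorescence, lower, upper):
    """Fit log10(F) = intercept + slope * cycle over cycles whose baseline-corrected
    fluorescence lies in [lower, upper]; return (efficiency, initial_amount)."""
    points = [(cycle, math.log10(f))
              for cycle, f in enumerate(fluorescence, start=1)
              if lower <= f <= upper]
    if len(points) < 3:
        raise ValueError("need at least 3 cycles inside the window")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** slope          # fold increase per cycle (2.0 = perfect doubling)
    initial_amount = 10 ** intercept  # extrapolated fluorescence at cycle 0
    return efficiency, initial_amount


# Hypothetical noise-free curve: N0 = 1e-6, E = 1.9, plateau capped at 10.
curve = [min(1e-6 * 1.9 ** cycle, 10.0) for cycle in range(1, 41)]
efficiency, n0 = window_of_linearity(curve, lower=0.01, upper=1.0)
print(efficiency, n0)
```

Dividing a target gene's initial amount by that of 18S rRNA from the same sample then gives a normalized value without relying on a software-chosen threshold.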

Supplementary References:

S1. Ruijter JM, Ramakers C, Hoogaars WM, Karlen Y, Bakker O, van den Hoff MJ, Moorman AF. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res 2009;37:e45.

Supplementary Table 1: Probesets (N=47) and associated genes (N=38) with consistent differences in expression between residual disease (RD) and No-RD patients in the TCGA and Tothill data sets, at a 10% false discovery rate (FDR) in each data set.

Gene	Probeset
ADAM12	213790_at
ADH1B	209612_s_at
ADH1B	209613_s_at
ADIPOQ	207175_at
ALDH1A3	203180_at
ALDH5A1	203609_s_at
AQP1	209047_at
BCHE	205433_at
COL11A1	37892_at
COL16A1	204345_at
COL3A1	201852_x_at
COL5A1	203325_s_at
COL6A2	213290_at
COL8A1	214587_at
CRISPLD2	221541_at
CXCL12	203666_at
CXCL12	209687_at
CYR61	201289_at
DCN	201893_x_at
DCN	209335_at
DCN	211813_x_at
DCN	211896_s_at
ETV1	221911_at
FABP4	203980_at
FAP	209955_s_at
GADD45B	207574_s_at
GADD45B	209304_x_at
GADD45B	209305_s_at
GFPT2	205100_at
GREM1	218468_s_at
GREM1	218469_at
KCNE4	222379_at
LUM	201744_s_at
NBL1	201621_at
NBL1	37005_at
NFYA	204107_at
OMD	205907_s_at
PDGFD	219304_s_at
PDLIM3	209621_s_at
PDPN	221898_at
POLR1C	207515_s_at
PTGIS	208131_s_at
SVEP1	213247_at
TIMP3	201150_s_at
VGLL3	220327_at
VSIG4	204787_at
XYLT1	213725_x_at
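The selection rule in the Supplementary Table 1 caption (significant at a 10% FDR in each data set, with consistent direction) can be sketched in Python. This example assumes the Benjamini-Hochberg procedure for FDR control, which the appendix does not name, and treats "consistent" as the same sign of the RD minus No-RD difference in both data sets. The probeset names, p-values, and differences are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Return the keys rejected by the Benjamini-Hochberg procedure at FDR level q."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff_rank = 0
    # Largest rank k with p_(k) <= q * k / m; all hypotheses up to k are rejected.
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= q * rank / m:
            cutoff_rank = rank
    return {key for key, _ in ranked[:cutoff_rank]}


def consistent_hits(stats_a, stats_b, q=0.10):
    """Probesets significant at FDR q in both data sets with the same direction.

    Each stats dict maps probeset -> (p_value, RD minus No-RD difference).
    """
    sig_a = benjamini_hochberg({k: v[0] for k, v in stats_a.items()}, q)
    sig_b = benjamini_hochberg({k: v[0] for k, v in stats_b.items()}, q)
    return {k for k in sig_a & sig_b
            if (stats_a[k][1] > 0) == (stats_b[k][1] > 0)}


# Hypothetical per-probeset results for two data sets.
tcga_stats = {"p1": (0.001, 1.2), "p2": (0.02, -0.8), "p3": (0.5, 0.1), "p4": (0.04, 0.3)}
tothill_stats = {"p1": (0.003, 0.9), "p2": (0.01, 0.5), "p3": (0.2, 0.2), "p4": (0.6, 0.4)}
print(consistent_hits(tcga_stats, tothill_stats))  # {'p1'}
```

Here p2 passes the FDR cutoff in both data sets but changes direction, so it is excluded; requiring significance separately in each data set is stricter than pooling them.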

Supplementary Reports:

The supplementary reports are provided in HTML format. They were produced using knitr, Markdown, and RStudio.

Zip files:
- Source files for all reports
- RDataObjects
Our analysis source code relies on a number of software programs and auxiliary packages; we provide scripts, not stand-alone executables. Detailed descriptions of the packages (with version numbers) are given in the individual reports. The software required to execute the source code can be obtained from the following sources:
- The R environment for statistical computing (the main program and several package libraries).
- Bioconductor (several package libraries).