Assembling an RMA Quantification Matrix for the Bonome Ovarian Data

by Keith A. Baggerly

1 Executive Summary

1.1 Introduction

We want to produce an RData file containing a matrix of RMA expression values for the ovarian cancer samples profiled by Bonome et al. [1] on Affymetrix U133A arrays.

1.2 Methods

We acquired a tarball of the 195 gzipped CEL files (185 tumor samples and 10 normal ovarian samples) from the Gene Expression Omnibus (GEO) page for GSE26712, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26712, on Sep 10, 2012.

We used justRMA to compute RMA fits, and used our previously assembled clinical information to replace the GEO GSM ids in the column (sample) names with the corresponding sample ids.

1.3 Results

We save the expression matrix bonomeExpression (22283 probesets by 195 samples) to the RData file “bonomeExpression.RData”.

2 Libraries

We first load the libraries we will use in this report.


library(affy)
library(hgu133acdf)

3 Loading Clinical Information

Next, we load our previously assembled clinical information.


load(file.path("RDataObjects", "bonomeClinical.RData"))

4 Specifying the Raw Data Location

Here, we specify the location of the data we acquired from GEO on our local system. You will need to acquire these files and adjust this path before running this report yourself.


pathToBonomeData <- file.path("RawData", "Bonome", "CEL_Files")

5 Quantifying the CEL Files

First, we specify the CEL file paths in a character vector for passing to justRMA.


celFileNames <- dir(pathToBonomeData, pattern = "^GSM")
celFilePaths <- file.path(pathToBonomeData, celFileNames)
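
Before passing the paths to justRMA, it can help to confirm that the directory listing is what we expect. The following is a minimal sketch and not part of the original analysis: it uses a temporary directory with empty placeholder files in place of the real CEL files, and the helper name `listCelFiles` and its `expectedCount` argument are hypothetical. It also uses a stricter file pattern than the `"^GSM"` used above, which is an assumption about how the files are named. For the real data, the expected count would be 195.

```r
# Hypothetical helper: list GSM-prefixed gzipped CEL files and check the count.
listCelFiles <- function(celDir, expectedCount) {
  # file.exists() returns TRUE for directories as well as files
  if (!file.exists(celDir)) {
    stop("CEL directory not found: ", celDir)
  }
  celNames <- dir(celDir, pattern = "^GSM.*\\.CEL\\.gz$")
  if (length(celNames) != expectedCount) {
    stop("Expected ", expectedCount, " CEL files but found ", length(celNames))
  }
  file.path(celDir, celNames)
}

# Toy stand-in for the raw data directory (placeholder files, not real CEL data).
toyDir <- tempfile("toyCEL")
dir.create(toyDir)
toyNames <- c("GSM657519_HOSE2237.CEL.gz",
              "GSM657520_HOSE2008.CEL.gz",
              "GSM657521_HOSE2061.CEL.gz")
# The extra README.txt shows that non-CEL files are ignored by the pattern.
file.create(file.path(toyDir, c(toyNames, "README.txt")))

toyPaths <- listCelFiles(toyDir, expectedCount = 3)
basename(toyPaths)
```

Failing early here is cheaper than discovering a missing or extra file after the quantification has run.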

Now we use justRMA to summarize expression at the probeset level.


d1 <- date()
bonomeExpression <- justRMA(filenames = celFilePaths, compress = TRUE)
bonomeExpression <- exprs(bonomeExpression)
d2 <- date()
c(d1, d2)
## [1] "Wed Jun 12 14:35:55 2013" "Wed Jun 12 14:38:55 2013"

dim(bonomeExpression)
## [1] 22283   195
bonomeExpression[1:3, 1:3]
##           GSM657519_HOSE2237.CEL.gz GSM657520_HOSE2008.CEL.gz
## 1007_s_at                     8.693                     8.425
## 1053_at                       5.100                     5.071
## 117_at                        5.084                     5.868
##           GSM657521_HOSE2061.CEL.gz
## 1007_s_at                     8.570
## 1053_at                       5.099
## 117_at                        5.238

In the run shown above, the justRMA computation took 3 minutes (14:35:55 to 14:38:55).
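
The elapsed time for the run shown above can be computed directly from the two printed timestamps. This is a minimal sketch in base R: the timestamps are copied from the output above and rewritten in ISO format so that parsing does not depend on the locale, and the time zone is an arbitrary choice that does not affect the difference.

```r
# Timestamps copied from the c(d1, d2) output above, rewritten in ISO format.
startTime <- as.POSIXct("2013-06-12 14:35:55", tz = "UTC")
endTime <- as.POSIXct("2013-06-12 14:38:55", tz = "UTC")

# Elapsed wall-clock time in minutes.
elapsed <- difftime(endTime, startTime, units = "mins")
elapsed
```

For a new run, wrapping the justRMA call in `system.time()` would report the elapsed time without any timestamp arithmetic.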

6 Mapping CEL Names to Sample IDs

We now use the clinical information to replace the GEO GSM ids with the sample ids in the column names.


# The first 9 characters of each column name are the GEO GSM id
tempClinRows <- match(substr(colnames(bonomeExpression), 1, 9),
    as.character(bonomeClinical[, "GEO.ID"]))
tempNames <- rownames(bonomeClinical)[tempClinRows]
bonomeClinical[tempNames[1:3], ]
##             GEO.ID SampleID SurgeryOutcome Status SurvivalYears
## HOSE2237 GSM657519 HOSE2237                                  NA
## HOSE2008 GSM657520 HOSE2008                                  NA
## HOSE2061 GSM657521 HOSE2061                                  NA
colnames(bonomeExpression)[1:3]
## [1] "GSM657519_HOSE2237.CEL.gz" "GSM657520_HOSE2008.CEL.gz"
## [3] "GSM657521_HOSE2061.CEL.gz"
colnames(bonomeExpression) <- tempNames
bonomeExpression[1:3, 1:3]
##           HOSE2237 HOSE2008 HOSE2061
## 1007_s_at    8.693    8.425    8.570
## 1053_at      5.100    5.071    5.099
## 117_at       5.084    5.868    5.238
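
The renaming above relies on every GSM id being found exactly once in the clinical table; an unmatched id would silently produce an NA column name. The following is a minimal sketch of the same mapping with explicit checks, using toy objects (`toyClinical`, `toyExpression`) built from the three samples shown above. The clinical rows are deliberately shuffled to show that `match` handles the ordering. Extracting the id with `sub` rather than `substr` is a substitution on our part that avoids assuming a 9-character id.

```r
# Toy clinical table; the GEO.ID / SampleID pairs are those shown above,
# in shuffled order.
toyClinical <- data.frame(
  GEO.ID = c("GSM657521", "GSM657519", "GSM657520"),
  SampleID = c("HOSE2061", "HOSE2237", "HOSE2008"),
  row.names = c("HOSE2061", "HOSE2237", "HOSE2008"),
  stringsAsFactors = FALSE
)

# Toy expression matrix with GEO-style column names (values from the output above).
toyExpression <- matrix(
  c(8.693, 5.100, 5.084, 8.425, 5.071, 5.868, 8.570, 5.099, 5.238),
  nrow = 3,
  dimnames = list(
    c("1007_s_at", "1053_at", "117_at"),
    c("GSM657519_HOSE2237.CEL.gz",
      "GSM657520_HOSE2008.CEL.gz",
      "GSM657521_HOSE2061.CEL.gz")
  )
)

# Everything before the first underscore is taken to be the GSM id.
gsmIds <- sub("_.*$", "", colnames(toyExpression))
clinRows <- match(gsmIds, toyClinical[, "GEO.ID"])

# Stop if any id is missing from the clinical table or matched more than once.
stopifnot(!any(is.na(clinRows)), anyDuplicated(clinRows) == 0)

colnames(toyExpression) <- rownames(toyClinical)[clinRows]
toyExpression
```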

7 Saving RData

Now we save the expression matrix to an RData file.


save(bonomeExpression, file = file.path("RDataObjects", "bonomeExpression.RData"))
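
A quick way to confirm that a saved file contains what we intended is to load it back into a separate environment and compare. This is a minimal sketch on a toy matrix written to a temporary file; the names `toyExpression`, `toyFile`, and `checkEnv` are illustrative and not part of the original analysis.

```r
# Small toy matrix standing in for the full expression matrix.
toyExpression <- matrix(
  c(8.693, 5.100, 8.425, 5.071),
  nrow = 2,
  dimnames = list(c("1007_s_at", "1053_at"), c("HOSE2237", "HOSE2008"))
)

# Save to a temporary RData file.
toyFile <- tempfile(fileext = ".RData")
save(toyExpression, file = toyFile)

# Load into a fresh environment so nothing in the workspace is overwritten;
# load() returns the names of the objects it restored.
checkEnv <- new.env()
loadedNames <- load(toyFile, envir = checkEnv)
loadedNames

# The restored matrix should be identical to the one we saved.
stopifnot(identical(checkEnv$toyExpression, toyExpression))
```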

8 Appendix

8.1 File Location


getwd()
## [1] "\\\\mdadqsfs02/workspace/kabagg/RDPaper/Webpage/ResidualDisease"

8.2 SessionInfo


sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] hgu133acdf_2.11.0    AnnotationDbi_1.20.7 affy_1.36.1         
## [4] Biobase_2.18.0       BiocGenerics_0.4.0   knitr_1.2           
## 
## loaded via a namespace (and not attached):
##  [1] affyio_1.26.0         BiocInstaller_1.8.3   DBI_0.2-7            
##  [4] digest_0.6.3          evaluate_0.4.3        formatR_0.7          
##  [7] IRanges_1.16.6        parallel_2.15.3       preprocessCore_1.20.0
## [10] RSQLite_0.11.4        stats4_2.15.3         stringr_0.6.2        
## [13] tools_2.15.3          zlibbioc_1.4.0

9 References

[1] Bonome T, Levine DA, Shih J, Randonovich M, Pise-Masison CA, Bogomolniy F, Ozbun L, Brady J, Barrett JC, Boyd J, Birrer MJ. A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer. Cancer Res, 68(13):5478-86, 2008.