by Keith A. Baggerly
We want to produce an RData file with a matrix of RMA expression values for the ovarian cancer samples profiled by Tothill et al. [1] with Affymetrix U133+2 arrays.
We acquired a tarball of the 285 gzipped CEL files used from the Gene Expression Omnibus (GEO) page for GSE9891, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891, on September 13, 2012.
We used justRMA to compute RMA fits, and used our previously assembled clinical information to map the GEO GSM ids to sample ids, which replace the GSM ids as the column (sample) names.
We save the resulting matrix, tothillExpression, to the RData file “tothillExpression.RData”.
We first load the libraries we will use in this report.
library(affy)
library(hgu133plus2cdf)
Next, we load our previously assembled clinical information.
load(file.path("RDataObjects", "tothillClinical.RData"))
Here, we specify the location of the data we acquired from GEO on our local system. You will need to acquire these files and adjust this path before running this report yourself.
pathToTothillData <- file.path("RawData", "Tothill", "CEL_Files")
We then collect the CEL file paths in a character vector to pass to justRMA.
celFileNames <- dir(pathToTothillData, pattern = "^GSM")
celFilePaths <- file.path(pathToTothillData, celFileNames)
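Before launching the RMA fit, it can be worth confirming that the listing picked up the expected number of files. The following is a minimal sketch, not part of the original analysis: it builds a throwaway directory of empty placeholder files (the directory and file names are hypothetical) and applies the same dir pattern. With the real data, the expected count would be 285.

```r
# Illustrative only: a temporary directory with placeholder files stands in
# for the real CEL directory; the file names here are hypothetical.
demoDir <- file.path(tempdir(), "celDemo")
unlink(demoDir, recursive = TRUE)
dir.create(demoDir)
invisible(file.create(file.path(demoDir,
    c("GSM000001.CEL.gz", "GSM000002.CEL.gz", "README.txt"))))

# Same pattern as in the report: keep only files whose names start with "GSM".
demoFileNames <- dir(demoDir, pattern = "^GSM")
demoFilePaths <- file.path(demoDir, demoFileNames)

# Fail early if the count is not what we expect (285 for the full data set).
expectedCount <- 2
stopifnot(length(demoFilePaths) == expectedCount,
    all(file.exists(demoFilePaths)))
```

Stopping early on a wrong count is cheaper than discovering a missing array after the fit has finished.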
Now we use justRMA to summarize expression at the probeset level.
d1 <- date()
tothillExpression <- justRMA(filenames = celFilePaths, compress = TRUE)
tothillExpression <- exprs(tothillExpression)
d2 <- date()
c(d1, d2)
## [1] "Wed Nov 20 11:18:36 2013" "Wed Nov 20 11:22:03 2013"
dim(tothillExpression)
## [1] 54675 285
tothillExpression[1:3, 1:3]
## GSM249714.CEL.gz GSM249715.CEL.gz GSM249716.CEL.gz
## 1007_s_at 10.037 10.591 10.291
## 1053_at 6.808 7.710 6.657
## 117_at 5.804 5.791 5.905
The justRMA computation takes about 3.5 minutes on my MacBook Pro, judging from the two timestamps above.
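The elapsed time can also be computed directly rather than read off the two date strings. A minimal sketch, using the two timestamps printed above and assuming a time zone (UTC) chosen only to make the illustration reproducible:

```r
# Timestamps copied from the date() output above; the time zone is an
# assumption made only for this example.
startTime <- as.POSIXct("2013-11-20 11:18:36", tz = "UTC")
endTime <- as.POSIXct("2013-11-20 11:22:03", tz = "UTC")

# difftime() returns the difference in the requested units.
elapsedMins <- as.numeric(difftime(endTime, startTime, units = "mins"))
round(elapsedMins, 2)
## [1] 3.45
```

In a fresh run, recording Sys.time() before and after the fit, or wrapping the call in system.time(), would give the same information without parsing strings.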
We now use the clinical information to replace the GEO GSM ids with the sample ids in the column names.
tempClinRows <- match(substr(colnames(tothillExpression), 1, 9),
    as.character(tothillClinical[, "GEO.ID"]))
tempNames <- rownames(tothillClinical)[tempClinRows]
tothillClinical[tempNames[1:3], ]
## GEO.ID SampleID KMeansGroup ClinicalType HistologicSubtype
## X60120 GSM249714 60120 3 LMP Ser
## X32117 GSM249715 32117 3 LMP Ser
## X23066 GSM249716 23066 3 LMP Ser
## PrimarySite Stage Grade Age Status Pltx Tax Neo MosToRelapse
## X60120 OV II 1 59 PF N N N 37
## X32117 OV II 1 26 PF N N N 8
## X23066 OV III 1 64 PF N N N 18
## MosToDeath ResidDisease ArraySite
## X60120 37 nil OV
## X32117 8 nil OV
## X23066 18 nil OV
colnames(tothillExpression)[1:3]
## [1] "GSM249714.CEL.gz" "GSM249715.CEL.gz" "GSM249716.CEL.gz"
colnames(tothillExpression) <- tempNames
tothillExpression[1:3, 1:3]
## X60120 X32117 X23066
## 1007_s_at 10.037 10.591 10.291
## 1053_at 6.808 7.710 6.657
## 117_at 5.804 5.791 5.905
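The renaming step relies on every array matching exactly one clinical record; an unmatched GSM id would silently become an NA column name. A minimal sketch of a guard, using made-up toy objects (toyExpression, toyClinical, and the ids in them are hypothetical stand-ins, not the real data), and stripping the file extension rather than assuming a fixed id width:

```r
# Toy stand-ins for tothillExpression and tothillClinical; values are made up.
toyExpression <- matrix(1:6, nrow = 2,
    dimnames = list(c("probeA", "probeB"),
        c("GSM000001.CEL.gz", "GSM000002.CEL.gz", "GSM000003.CEL.gz")))
toyClinical <- data.frame(GEO.ID = c("GSM000003", "GSM000001", "GSM000002"),
    row.names = c("X3", "X1", "X2"), stringsAsFactors = FALSE)

# Strip the ".CEL" / ".CEL.gz" extension to recover the GSM id.
toyGsmIds <- sub("\\.CEL(\\.gz)?$", "", colnames(toyExpression))
toyRows <- match(toyGsmIds, as.character(toyClinical[, "GEO.ID"]))

# Every array must map to exactly one clinical row before we rename.
stopifnot(!any(is.na(toyRows)), !anyDuplicated(toyRows))
colnames(toyExpression) <- rownames(toyClinical)[toyRows]
colnames(toyExpression)
## [1] "X1" "X2" "X3"
```

The clinical rows are deliberately in a different order from the columns, so the example also shows that match handles the reordering.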
Now we save the expression matrix to an RData file.
save(tothillExpression, file = file.path("RDataObjects", "tothillExpression.RData"))
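After saving, one may want to confirm that the file reloads to an identical object. The sketch below is an optional check, not part of the original report: it uses a small toy matrix (toyMatrix, with a few values borrowed from the output above) and a temporary file, so nothing in the RDataObjects directory is touched.

```r
# Toy matrix standing in for tothillExpression; written to a temporary file.
toyMatrix <- matrix(c(10.037, 6.808, 10.591, 7.710), nrow = 2,
    dimnames = list(c("1007_s_at", "1053_at"), c("X60120", "X32117")))
toyFile <- tempfile(fileext = ".RData")
save(toyMatrix, file = toyFile)

# Load into a fresh environment so the reloaded copy cannot mask the original.
checkEnv <- new.env()
loadedNames <- load(toyFile, envir = checkEnv)

# load() returns the names of the restored objects; compare contents exactly.
roundTripOk <- identical(checkEnv$toyMatrix, toyMatrix)
stopifnot(identical(loadedNames, "toyMatrix"), roundTripOk)
```

Loading into a separate environment keeps the comparison honest, since load would otherwise overwrite the object being checked.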
getwd()
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] hgu133plus2cdf_2.12.0 AnnotationDbi_1.22.6 affy_1.38.1
## [4] Biobase_2.20.1 BiocGenerics_0.6.0 knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] affyio_1.28.0 BiocInstaller_1.10.4 DBI_0.2-7
## [4] evaluate_0.5.1 formatR_0.9 IRanges_1.18.4
## [7] preprocessCore_1.22.0 RSQLite_0.11.4 stats4_3.0.2
## [10] stringr_0.6.2 tools_3.0.2 zlibbioc_1.6.0
[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.