Assembling an RMA Quantification Matrix for the Tothill Ovarian Data
====================================================================

by Keith A. Baggerly

## 1 Executive Summary

### 1.1 Introduction

We want to produce an RData file with a matrix of RMA expression values for the ovarian cancer samples profiled by [Tothill et al.](#tothill08) with Affymetrix U133+2 arrays.

### 1.2 Methods

We acquired a tarball of the 285 gzipped CEL files from the Gene Expression Omnibus (GEO) page for GSE9891, [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9891), on September 13, 2012.

We used justRMA to compute RMA fits, and used our previously assembled clinical information to map the GEO GSM ids to our sample ids, which we then used as the column (sample) names.

### 1.3 Results

We save tothillExpression to the RData file "tothillExpression.RData".

## 2 Libraries

We first load the libraries we will use in this report.

```{r libraries, message=FALSE}
library(affy)
library(hgu133plus2cdf)
```

## 3 Loading Clinical Information

Next, we load our previously assembled clinical information.

```{r loadTothillClinical}
load(file.path("RDataObjects","tothillClinical.RData"))
```

## 4 Specifying the Raw Data Location

Here, we specify the location of the data we acquired from GEO on our local system. You will need to acquire these files and adjust this path before running this report yourself.

```{r pathToTothillData}
pathToTothillData <- file.path("RawData","Tothill","CEL_Files")
```

## 5 Quantifying The CEL Files

First, we specify the CEL file paths in a character vector for passing to justRMA.

```{r celFilePaths}
celFileNames <- dir(pathToTothillData,pattern="^GSM")
celFilePaths <- file.path(pathToTothillData,celFileNames)
```

Now we use justRMA to summarize expression at the probeset level.

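Before running the fit, it may be worth confirming that the file list matches the 285 CEL files we expect, since a partial download would otherwise go unnoticed until later. The following is a minimal sketch and not part of the original analysis: `checkCelFiles` is a hypothetical helper, and the GSM ids in the demonstration are made up so that the example runs without the real data. It is shown as a plain code block so that it is not evaluated as part of the report.

```r
# Hypothetical helper (an assumption, not part of the original report):
# list GSM-prefixed files in a directory and stop if the count differs
# from the expected number or if any 9-character GSM id is repeated.
checkCelFiles <- function(celDir, expectedCount, pattern = "^GSM") {
  celFileNames <- dir(celDir, pattern = pattern)
  if (length(celFileNames) != expectedCount) {
    stop("Expected ", expectedCount, " CEL files in ", celDir,
         " but found ", length(celFileNames))
  }
  if (anyDuplicated(substr(celFileNames, 1, 9)) > 0) {
    stop("Duplicated GSM ids among the CEL file names")
  }
  file.path(celDir, celFileNames)
}

# Demonstration on a temporary directory holding empty placeholder files.
# The GSM ids below are invented for illustration only.
demoDir <- file.path(tempdir(), "celDemo")
dir.create(demoDir, showWarnings = FALSE)
demoNames <- paste0("GSM", 100001:100003, ".CEL.gz")
invisible(file.create(file.path(demoDir, demoNames)))
demoPaths <- checkCelFiles(demoDir, expectedCount = 3)
basename(demoPaths)
```

In this report, the analogous call would be `checkCelFiles(pathToTothillData, expectedCount=285)`, whose return value could be used in place of celFilePaths. The fit itself is unchanged.
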
```{r fitRMA, message=FALSE}
d1 <- date()
tothillExpression <- justRMA(filenames=celFilePaths,compress=TRUE)
tothillExpression <- exprs(tothillExpression)
d2 <- date()
c(d1,d2)
dim(tothillExpression)
tothillExpression[1:3,1:3]
```

The justRMA computation takes about 4 minutes on my MacBook Pro.

## 6 Mapping CEL Names to Sample IDs

We now use the clinical information to replace the GEO GSM ids with the sample ids in the column names.

```{r remapColumnNames}
tempClinRows <- match(substr(colnames(tothillExpression),1,9),
                      as.character(tothillClinical[,"GEO.ID"]))
tempNames <- rownames(tothillClinical)[tempClinRows]
tothillClinical[tempNames[1:3],]
colnames(tothillExpression)[1:3]
colnames(tothillExpression) <- tempNames
tothillExpression[1:3,1:3]
```

## 7 Saving RData

Now we save the relevant information to an RData object.

```{r saveTothillExpression}
save(tothillExpression,
     file=file.path("RDataObjects","tothillExpression.RData"))
```

## 8 Appendix

### 8.1 File Location

```{r getLocation}
getwd()
```

### 8.2 SessionInfo

```{r sessionInfo}
sessionInfo()
```

## 9 References

[1] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res, 14(16):5198-208, 2008.