% % Lectures for Microarray Course % % This is the template file for all lectures to ensure consistent % layout, fonts, and colors. % % Remember to update the hypersetup to include individual lecture % titles, change the leftheader to include the lecture title, and % then start at \begin{document} .... % Also, produce a printable (black and white version) according % to the cues immediately following \begin{document}. % % Author: Keith Baggerly and Kevin R. Coombes % Date : 28 September 2006 % % Compile this document with: % pdflatex .tex % and then: % ppower4pb .pdf .pdf % \documentclass[Screen4to3,25pt,headrule,footrule]{foils} \usepackage[pdftex]{color} %for colors \definecolor{Teal}{rgb}{0.0,0.47,0.46} \definecolor{PaleTeal}{rgb}{0.0,0.8,0.6} \definecolor{DarkTeal}{rgb}{0.0,0.2157,0.2118} \definecolor{Lemon}{rgb}{1.0,1.0,0.4} \definecolor{Orange}{rgb}{1.0,0.6,0.0} \definecolor{PaleBlue}{rgb}{0.0,0.8,1.0} \definecolor{Gray7}{rgb}{0.7,0.7,0.7} \definecolor{DarkBurgundy}{rgb}{0.2745,0.0,0.0} \definecolor{Burgundy}{rgb}{0.62,0.06,0.41} \definecolor{DarkGreen}{rgb}{0.5,0.8,0.6} \definecolor{RGBblack}{rgb}{0.0,0.0,0.0} %black doesn't work (?) for % {h,v}pagecolor so use this \usepackage[pdftex]{graphicx} %for importing graphics \usepackage{times} %for nice fonts \usepackage[pdftex]{hyperref} %for hyperlinking and other effects \hypersetup{ pdftitle={Lecture 10: Exploring BioConductor}, pdfsubject={Analysis of Microarray Data}, pdfauthor={Keith A. Baggerly and Kevin R. Coombes, Department of Biostatistics and Applied Mathematics, UT M.D. Anderson Cancer Center, , }, pdfkeywords={microarray,R}, pdfpagemode={None}, linkcolor=PaleBlue, citecolor=Gray7, pagecolor=PaleBlue, urlcolor=PaleBlue } \usepackage{pause} %required for ppower4 \usepackage{background} %required for ppower4 \usepackage{pp4slide} %required for ppower4 \usepackage{pp4link} \usepackage[pdftex]{geometry} %for resizing the header areas \geometry{hscale=0.85, %make the slides 85% of horizontal vscale=0.8 % and 80% of vertical } \setcounter{page}{0} %don't count the title page \setlength{\foilheadskip}{-0.5in} %skip less space below the title %\setlength{\itemsep}{1pt} %set distance between list items % doesn't seem to do anything % use different colored bullets in nested lists % color choices come from MDACC graphics standards \renewcommand{\labelitemi}{\textcolor{PaleTeal}{$\bullet$}} \renewcommand{\labelitemii}{\textcolor{Orange}{$\bullet$}} \renewcommand{\labelitemiii}{\textcolor{PaleBlue}{$\bullet$}} \renewcommand{\labelitemiv}{\textcolor{Lemon}{$\bullet$}} % change color of slide titles %\renewcommand\normalcolor{\color{black}} %\def\Black#1{\color{black}#1} % change the default background, again an MDACC standard \vpagecolor[RGBblack]{Teal} \raggedright %no right-justify % put lecture specific items in the header and consistent course % items in the footer \leftheader{\hyperlink{start}{\textsc{Differential Expression}}} \rightheader{\textsf{\thepage}} \MyLogo{{\copyright\ Copyright 2004--2007, Kevin R.~Coombes and Keith A.~Baggerly}} % left footer \rightfooter{\textsc{GS01 0163: Analysis of Microarray Data}} \begin{document} \hypersetup{pdfpagetransition=Replace} %\color{black} % for black-and-white version % Also, run M-x replace-string \textcolor{white} \textcolor{black}. % Then, using pdflatex only (no ppower4 step) should produce the % black and white version with pauses shown as colored boxes. % Remember to run M-x replace-string \textcolor{black} \textcolor{white} % afterwards to reset! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Title Page \title{\textcolor{Lemon}{GS01 0163\\ Analysis of Microarray Data}} \author{\textcolor{white}{Keith Baggerly and Kevin Coombes}\\ \textcolor{white}{Department of Bioinformatics and Computational Biology}\\ \textcolor{white}{UT M. D. Anderson Cancer Center}\\ \textcolor{Orange}{\tt kabagg@mdanderson.org}\\ \textcolor{Orange}{\tt kcoombes@mdanderson.org}} \date{\textcolor{Lemon}{2 October 2007}} \LogoOff \maketitle \def\rcode#1{\texttt{#1}} <>= options(width=60) options(SweaveHooks=list(fig=function() par(bg="white"))) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Individual Lecture Title Page \foilhead{\hypertarget{start}{Lecture 10}: Exploring BioConductor} \LogoOn % In most cases, this slide should be an outline of topics to be % covered in this lecture, presented as an itemized list. Each item in % the list should provide a hyperlink to the slide where discussion of % that topic begins. %\begin{itemize} % \item \hyperlink{Student}{Student's t-test} % \item \hyperlink{sims}{Simulating nothing} % \item \hyperlink{FWER}{Family-wise error rate} % \item \hyperlink{perm}{Permutation tests} % \item \hyperlink{conser}{Is FWER too conservative?} % \item \hyperlink{BUM}{Beta-uniform mixture model} %\end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\foilhead{Let's Explore Some Data} \begin{itemize} \item How do we load CEL files into an AffyBatch? how can we merge batches? how can we partition batches? \item How do we check that it worked? \item How do we supply the associated phenoData? \item Given an AffyBatch, how do we look at it? boxplot, hist, ma-plots, ratio plots, PLM \item Given an AffyBatch, how do we fit it? expresso, justRMA \item Given an eset, what can we say about its contents? \item How can we get the probe level values for a probeset? \item How can we figure out what probeset corresponds to a given gene? \item How can we get the probe sequences for a probeset? \end{itemize} % how can we access PubMed to find out more about that % gene? %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Loading CEL files into an AffyBatch} One theme for today is TMTOWTDI. We're going to try to extend this from Perl over to BioConductor, and to the analysis of Affy data. To begin with, let's say that we've got a set of CEL files. How do we pull them in? There are several options. How do I survey them? \pause <>= library(affy) vignette("affy") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Reading the Fine Manual: Vignettes} \begin{center} \includegraphics[height=6in]{Figs/vignette_shot} \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Listing Vignettes} \begin{center} \includegraphics[height=5in]{Figs/vignette_list_shot} \end{center} \begin{verbatim} > vignette(package = "affy"); \end{verbatim} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{ReadAffy: Help from Top} \begin{center} \includegraphics[height=6in]{Figs/readaffy_top_shot} \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{ReadAffy: ... to Bottom} \begin{center} \includegraphics[height=6in]{Figs/readaffy_bottom_shot} \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{The Affy Index} \begin{center} \includegraphics[height=6in]{Figs/affy_index_shot} \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Reading a list of files} The file \rcode{subcels.txt} contains $20$ lines like this: \begin{verbatim} G:/Public/Singh-Prostate-Affymetrix/CelFiles/N01__normal.cel G:/Public/Singh-Prostate-Affymetrix/CelFiles/N58__normal.cel G:/Public/Singh-Prostate-Affymetrix/CelFiles/N59__normal.cel G:/Public/Singh-Prostate-Affymetrix/CelFiles/N61__normal.cel G:/Public/Singh-Prostate-Affymetrix/CelFiles/N62__normal.cel ... \end{verbatim} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Reading a list of files} <>= basedir <- file.path("G:", "Public", "Singh-Prostate-Affymetrix") celList <- read.table(file.path(basedir, "subcels.txt")) celList[1:6,] @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Reading the CEL files} <>= ABatch <- ReadAffy(celList) @ <>= a <- try(ABatch <- ReadAffy(celList)) if (inherits(a, "try-error")) writeLines(strwrap(a)) @ \pause oops... %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{The Evolution...} <>= celList <- as.character(celList$V1) #$ @ <>= <> @ <>= <> @ \pause oops... <>= ABatch <- ReadAffy(filenames = celList); @ <>= if (!exists("ABatch")) { if (file.exists("abatch.Rda")) { load("abatch.Rda") } else { ABatch <- ReadAffy(filenames=celList) save(ABatch, file="abatch.Rda") } } @ Ta Da! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Checking the Contents} <>= slotNames(ABatch) phenoData(ABatch) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Looking at phenoData} <>= class(phenoData(ABatch)) slotNames(phenoData(ABatch)) pd <- phenoData(ABatch) @ \foilhead{} <<>>= pd@data pd@varMetadata pd@dimLabels @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Assigning phenoData} subsamples.txt: \begin{verbatim} Array name Sample name Status Batch Cluster N01__normal N01A Normal B2 A N58__normal N58A Normal B4 A N59__normal N59A Normal B3 A N61__normal N61A Normal B3 A N62__normal N62A Normal B3 A N02__normal N02B Normal B2 B N11__normal N11B Normal B2 B N18__normal N18B Normal B2 B ... \end{verbatim} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Assigning phenoData} <>= p1 <- read.phenoData(file.path(basedir, "subsamples.txt")) # error @ <>= a <- try(read.phenoData(file.path(basedir, "subsamples.txt"))) if (inherits(a, "try-error")) writeLines(strwrap(a)) @ % I cannot convince Sweave to print the warning.... \begin{verbatim} In addition: Warning message: read.phenoData is deprecated, use read.AnnotatedDataFrame instead \end{verbatim} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Assigning phenoData, pt 2} <>= p1 <- read.AnnotatedDataFrame(file.path(basedir, "subsamples.txt"), sep="\t") p1 @ Not quite what we want. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Assigning phenoData, pt 3} <>= p1 <- read.AnnotatedDataFrame(file.path(basedir, "subsamples.txt"), sep="\t", header=TRUE) p1 @ <>= phenoData(ABatch) <- p1 @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Other ways of Reading Data} Are they all in one directory? What is the list of filenames? read.affybatch vs ReadAffy GUI? %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Other ways of Reading Data 1} \begin{verbatim} kabagg$ ls ../../DataSets/SinghSmall N60__normal.CEL N61__normal.CEL N62__normal.CEL \end{verbatim} \begin{verbatim} > ABSmall <- ReadAffy(celfile.path= "../../DataSets/SinghSmall"); # works \end{verbatim} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Other ways of Reading Data 2} \begin{verbatim} kabagg$ ls ../../DataSets/SinghSmall2 N60__normal.CEL.gz N61__normal.CEL.gz N62__normal.CEL.gz \end{verbatim} \begin{verbatim} > ABSmall <- ReadAffy(celfile.path= "../../DataSets/SinghSmall2", compress=TRUE); # works \end{verbatim} This takes only about 1/3 the space... %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Other ways of Reading Data 3} \begin{verbatim} kabagg$ ls ../../DataSets/SinghSmall3 N60.gz N61.gz N62.gz \end{verbatim} \begin{verbatim} > ABSmall <- ReadAffy(celfile.path= "../../DataSets/SinghSmall3", compress=TRUE); # fails \end{verbatim} \begin{verbatim} > ABSmall <- ReadAffy(filenames= "../../DataSets/SinghSmall3/N60.gz", compress=TRUE); # works \end{verbatim} %$ This still takes only about 1/3 the space... %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Quantification} <>= t0 <- date() eset0 <- expresso(ABatch, bgcorrect.method="rma", normalize.method="quantiles", pmcorrect.method="pmonly", summary.method="medianpolish") t1 <- date() # 151s t0 t1 @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Quantification} <>= eset1 <- justRMA(filenames = celList) @ <>= a <- try(eset1 <- justRMA(filenames = celList)) writeLines(strwrap(a)) @ <>= eset1 <- justRMA(filenames = celList, celfile.path="") @ <>= a <- try(eset1 <- justRMA(filenames = celList, celfile.path="")) writeLines(strwrap(a)) @ <>= t2 = date() eset1 <- justRMA(filenames = celList, celfile.path=NULL) t3 = date() t2 t3 @ The following method will also work: <>= celpath <- file.path(basedir, "CelFiles") cels <- sub(celpath, "", celList) eset1 <- justRMA(filenames = cels, celfile.path=celpath) t2 <- date() # 10s @ The customized routines are better if they do what you want to do... (also note that justRMA didn't build an AffyBatch.) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Just Because I'm Curious} <>= exprs(eset1)[1,1:3] @ Can we reconstruct this? <>= ABatch.BG <- bg.correct.rma(ABatch) ABatch.BG.norm <- normalize.AffyBatch.quantiles(ABatch.BG) @ These steps produce AffyBatch objects, with altered exprs. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{What is the First Gene?} (well, ok, probeset) <>= gn1 <- geneNames(ABatch.BG.norm)[1] gn1 @ Ok, now what are the values? <>= pr1 <- pm(ABatch.BG.norm, gn1); @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Looking at it, Take 1} \begin{center} <>= image(1:nrow(pr1), 1:ncol(pr1), pr1, xlab="Probes", ylab="Samples", main=paste("PM Intensities for", gn1)) @ \end{center} Some parallelism, but we may be missing something... %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Looking at it, Take 2} \begin{center} <>= image(1:nrow(pr1), 1:ncol(pr1), log2(pr1), xlab="Probes", ylab="Samples", main=paste("PM Intensities for", gn1)) @ \end{center} Logs! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Fitting the Probes} <>= pr1Fit <- medpolish(log2(pr1)) names(pr1Fit) (pr1Fit$overall + pr1Fit$col)[1:3] @ This is what we found before! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{We can Check the Code} <>= medpolish @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{and Check the Residuals} \begin{center} <>= image(1:nrow(pr1), 1:ncol(pr1), pr1Fit$residuals, #$ xlab="Probes", ylab="Samples", main=paste("PM Intensities for", gn1)) @ \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{One other Fitting Approach: PLM} PLM = ``Probe Level Model'' <>= library(affyPLM); plm1 <- fitPLM(ABatch); # takes a few minutes @ <>= opar <- par(mfrow=c(2,2)); image(ABatch[, 1],main="N01 Raw"); image(plm1, type="weights", which=1, main="N01 Weights"); image(plm1, type="resids", which=1, main="N01 Resids"); image(plm1, type="sign.resids", which=1, main="N01 sign(Resids)"); par(opar) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Looking at N01} \begin{center} \includegraphics[height=6in]{Figs/plm1_good} \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Looking at N05} \begin{center} \includegraphics[height=6in]{Figs/plm5_bad} \end{center} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Whence the Gene Name Info?} \begin{center} \includegraphics[height=5in]{Figs/hgu95av2_envs} \end{center} <>= library("hgu95av2"); @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{What Does This Package Contain?} <<>>= hgu95av2() @ (we can also see this using ls("package:hgu95av2").) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{What Does This Package Contain?} <>= hgu95av2GENENAME @ Almost everything in this package is an ``environment'', which is the fancy name R uses for a hash table. We can access things by name. <<1000>>= hgu95av2GENENAME$"1000_at" #$ @ We can access a lot of annotation! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{What was Needed for Quantification?} <>= library("hgu95av2cdf"); hgu95av2cdf$"1000_at"#$ @ These give the indices of the probes within the 409600-long vector of expression intensities. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{What if We Want to Go in Reverse?} Given a probeset, I can find a gene name. What if I have a gene name, and I want something else? Can we find ``BAD''? This is a gene symbol, so we probably want to work with the hgu95av2SYMBOL environment. The key function for extracting items from an environment without the key is ``contents''. <>= tempSYM <- contents(hgu95av2SYMBOL); tempSYM[1] @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{Finding BAD in the Contents} <>= tempSYM[tempSYM == "BAD"] names(tempSYM[tempSYM == "BAD"]) @ This gives us the key! Some of these queries are simplified if we invoke <>= library("annotate"); # for example, getLL("1861_at", "hgu95av2"); @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{One More Thing...} sequences? <>= library("hgu95av2probe"); data(hgu95av2probe); # a big file as.data.frame(hgu95av2probe[1,]) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \foilhead{So, What is BAD?} <>= as.data.frame(hgu95av2probe[hgu95av2probe$Probe.Set.Name == "1861_at",]) #$ @ \end{document}