\documentclass[11pt]{article} \usepackage{graphicx} \usepackage{cite} \usepackage{hyperref} \pagestyle{myheadings} \markright{microfluidicsModels03; revised: 2 Aug 2010} \setlength{\topmargin}{0in} \setlength{\textheight}{8in} \setlength{\textwidth}{6.5in} \setlength{\oddsidemargin}{0in} \setlength{\evensidemargin}{0in} \def\rcode#1{\texttt{#1}} \def\fref#1{Figure~\ref{#1}} \def\tref#1{Table~\ref{#1}} \title{Residual Plots for Gene Predictors of Survival and Time to Treatment in CLL} \author{Kevin R. Coombes} \date{12 July 2010; REVISED: 2 Aug 2010} \SweaveOpts{prefix.string=Figures/microfluidicsModels03,eps=FALSE} <>= options(width=92) options(SweaveHooks = list(fig = function() par(bg='white')),eps=FALSE) @ <>= if (!file.exists("Figures")) { dir.create("Figures") } @ \begin{document} \maketitle \tableofcontents \section{Executive Summary} \subsection{Introduction} The main goal of the study is to identify changes in gene expression in CLL that are associated with clinical outcome (including overall survival and time-to-treatment). \subsubsection{Aims/Objectives} We want to find individual genes and combinations of genes that are related to prognosis in CLL. Here ``prognosis'' refers to the ability to predict time-to-treatment or overall suvival. As a prelude, we plot residuals from the univariate predictive models, comparing against both a null model and against the best purely clinical models produced inthe last report. \subsection{(Statistical) Methods} Exploratory data analyses and summaries. Cox proportional hazards models. Stepwise selection of predictive variables using the Akaike Information Criterion. Bayesian model averaging. \subsection{Results} \begin{itemize} \item Lots of residual plots. \end{itemize} \subsection{Conclusion} Ready to actually construct univariate models. \section{Loading the Data} We met and agreed that time-to-event analyses should be performed with two different starting points: \begin{itemize} \item Diagnosis \item Sample collection \end{itemize} and two different end points: \begin{itemize} \item First significant treatment (defined below) \item Death \end{itemize} In the previous report (\rcode{microfluidicsModels.pdf}), we found that about $60\%$ of the samples in the validation (Card~C) dataset were treated within two days of sample acguisition. However, many of these treatments were simple rituximab-based treatments that we are calling ``not significant''. As a consequence, we have dropped the endpoint of ``first treatment of any kind'' from the analysis. We also stored all of the relevant data in a binary R file. We begin by loading that data. <>= load("micro02.rda") keepers <- c("cardAB.clinical", "cardAB.data", "cardAB.predictors", "cardC.clinical", "cardC.data", "sam2sig.model", "dx2sig.model", "sam2os.model", "dx2os.model", "onC") listing <- ls() rm(list=listing[!(listing %in% keepers)]) rm(listing) ls() save.image("micro03.rda") @ \subsection{Preliminaries} As mentioned above, we want to perform a variety of time-to-event (``survival'') analyses,with different starting and ending points. The basic code for these analyses is in the following libraries. <>= library(survival) library(BMA) @ Here we construct directories to hold all the figures. <>= for (d in c("NullBase", "ClinBase")) { if (!file.exists(file.path("Figures", d))) { dir.create(file.path("Figures", d)) } if (!file.exists(file.path("Figures", d, "Dx2Sig"))) { dir.create(file.path("Figures", d, "Dx2Sig")) } if (!file.exists(file.path("Figures", d, "Dx2OS"))) { dir.create(file.path("Figures", d, "Dx2OS")) } if (!file.exists(file.path("Figures", d, "Sam2Sig"))) { dir.create(file.path("Figures", d, "Sam2Sig")) } if (!file.exists(file.path("Figures", d, "Sam2OS"))) { dir.create(file.path("Figures", d, "Sam2OS")) } } @ \section{Residual Plots, Diagnosis to Treatment} We define the variables we need for constructing and storing the figures. <>= subdir <- file.path("Figures", "NullBase", "Dx2Sig") clindir <- file.path("Figures", "ClinBase", "Dx2Sig") timecol <- "TimeDiagnosis2SigTreat" statcol <- "NumericSigTreatment" clinmod <- dx2sig.model @ \subsection{Null base model} This loop creates a separate set of figures for each gene. <>= myform <- formula(paste("Surv(",timecol, ",", statcol, ") ~ 1")) nullmod <- coxph(myform, cardAB.clinical) nullmart <- residuals(nullmod, type="martingale") for (i in 1:ncol(cardAB.data)) { mygene <- colnames(cardAB.data)[i] X <- cardAB.data[,mygene] myform <- formula(paste("Surv(",timecol, ",", statcol, ") ~", mygene)) model <- coxph(myform, data=data.frame(cardAB.clinical, cardAB.data)) Y <- coef(model)*X + nullmart cz <- cox.zph(model) par(mfrow=c(2,2), bg='white') plot(nullmart, X, pch=16, ylab=paste("Cycles of", mygene), xlab="Null Martingale Residuals", main="Test of Functional Form") abline(h=median(X)) lines(lowess(nullmart, X), col='red', lwd=2) plot(X, Y, pch=16, main="Test of Linearity", ylab="component + residual", xlab=mygene) abline(coef(lm(Y ~ X))) lines(lowess(X, Y), col='red', lwd=2) plot(dfb <- resid(model, type='dfbeta'), pch=16, ylab=paste("dfbeta residuals of",mygene), main="Test for Influential Samples") abline(h=0) abline(h=c(-2, -1, 1, 2)*coef(model), col='blue') plot(cz, main="Test of Proportional Hazards") abline(h=0, col="#666666") abline(h=coef(model), col='blue', lwd=2) par(mfrow=c(1,1)) dev.copy(png, file.path(subdir, paste(mygene, "png", sep='.')), width=1200, height=1200, bg='white') dev.off() } @ \subsection{Clinical base model} We use a similar loop to create figures for each gene, using the best clinical model as the base instead of the null model. <>= clinform <- formula(clinmod) nullmod <- coxph(clinform, cardAB.clinical) nullmart <- residuals(nullmod, type="martingale") for (i in 1:ncol(cardAB.data)) { mygene <- colnames(cardAB.data)[i] X <- cardAB.data[names(nullmart),mygene] myform <- update(clinform, paste("~ . + ", mygene)) model <- coxph(myform, data=data.frame(cardAB.clinical, cardAB.data)) Y <- coef(model)[mygene]*X + nullmart cz <- cox.zph(model) par(mfrow=c(2,2), bg='white') plot(nullmart, X, pch=16, ylab=paste("Cycles of", mygene), xlab="Clinical Martingale Residuals", main="Test of Functional Form") abline(h=median(X)) lines(lowess(nullmart, X), col='red', lwd=2) plot(X, Y, pch=16, main="Test of Linearity", ylab="component + residual", xlab=mygene) abline(coef(lm(Y ~ X))) lines(lowess(X, Y), col='red', lwd=2) plot(dfb <- resid(model, type='dfbeta'), pch=16, ylab=paste("dfbeta residuals of",mygene), main="Test for Influential Samples") abline(h=0) abline(h=c(-2, -1, 1, 2)*coef(model)[mygene], col='blue') plot(cz, var=mygene, main="Test of Proportional Hazards") abline(h=0, col="#666666") abline(h=coef(model)[mygene], col='blue', lwd=2) par(mfrow=c(1,1)) dev.copy(png, file.path(clindir, paste(mygene, "png", sep='.')), width=1200, height=1200, bg='white') dev.off() } @ \section{Residual Plots, Sample to Treatment} By updating the definitions, we can reuse the same code to create figures for the time from sample to treatment. <>= subdir <- file.path("Figures", "NullBase", "Sam2Sig") clindir <- file.path("Figures", "ClinBase", "Sam2Sig") clinmod <- sam2sig.model timecol <- "TimeSample2SigTreat" statcol <- "NumericSigTreatment" <> <> @ \section{Residual Plots, Diagnosis and Overall Survival} By updating the definitions, we can reuse the same code to create figures for the time from diagnosis for overall survival. <>= subdir <- file.path("Figures", "NullBase", "Dx2OS") clindir <- file.path("Figures", "ClinBase", "Dx2OS") clinmod <- dx2os.model timecol <- "OSAfterDiagnosis" statcol <- "NumericVitalStatus" <> <> @ \section{Residual Plots, Sample and Overall Survival} By updating the definitions, we can reuse the same code to create figures for the time from sample for overall survival. <>= subdir <- file.path("Figures", "NullBase", "Sam2OS") clindir <- file.path("Figures", "ClinBase", "Sam2OS") clinmod <- sam2os.model timecol <- "OSAfterSample" statcol <- "NumericVitalStatus" <> <> @ \section{Appendix} This analysis was run in the following directory: <>= getwd() @ <>= sessionInfo() @ \end{document}