\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{cite}
\usepackage{hyperref}

\pagestyle{myheadings}
\markright{microfluidicsModels08; Revised: 1 Sept 2010}

\setlength{\topmargin}{0in}
\setlength{\textheight}{8in}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}

\def\rcode#1{\texttt{#1}}
\def\fref#1{\textbf{Figure~\ref{#1}}}
\def\tref#1{\textbf{Table~\ref{#1}}}
\def\sref#1{\textbf{Section~\ref{#1}}}

\title{Filling Out Table 1}
\author{Kevin R. Coombes}
\date{30 August 2010; REVISED: 1 September 2010}

\SweaveOpts{prefix.string=Figures/microfluidicsModels08,eps=FALSE}

<<>>=
options(width=88)
options(SweaveHooks = list(fig = function() par(bg='white')),eps=FALSE)
@

<<>>=
if (!file.exists("Figures")) {
  dir.create("Figures")
}
@

\begin{document}
\maketitle
\tableofcontents

\section{Executive Summary}

\subsection{Introduction}

The main goal of the study is to identify changes in gene expression
in CLL that are associated with clinical outcome (including overall
survival and time-to-treatment).

\subsubsection{Aims/Objectives}

In this report, we check whether any clinical parameters differ
between the training dataset and the validation dataset.

\subsection{(Statistical) Methods}

Exploratory data analyses and summaries. Fisher's Exact test.
Two-sample t-tests.

\subsection{Results}

\begin{itemize}
\item Only the (categorical) white blood cell count shows a
  statistically significant difference between training and validation.
\end{itemize}

\subsection{Conclusion}

It is time to write the manuscript and submit it....

\section{Loading the Data}

In the previous report (\rcode{microfluidicsModels06.pdf}), we stored
all of the relevant data in a binary R file. We begin by loading that
data.
<<>>=
load("withmod02.rda")
ls()
@

\subsection{Functions}

We use these functions repeatedly to simplify the code that summarizes
the different clinical parameters in the training and validation
datasets.
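
For the numeric parameters, the second helper defined below calls
\rcode{t.test} with its default arguments, which in R performs Welch's
unequal-variance test rather than the pooled-variance Student test. As
a reminder (the notation here is introduced only for illustration and
is not part of the original analysis), let $\bar{x}_T$, $s_T^2$, $n_T$
and $\bar{x}_V$, $s_V^2$, $n_V$ denote the sample mean, sample
variance, and number of non-missing values in the training and
validation sets, respectively. The test statistic is
\[
  t = \frac{\bar{x}_T - \bar{x}_V}
           {\sqrt{\dfrac{s_T^2}{n_T} + \dfrac{s_V^2}{n_V}}},
\]
which is referred to a $t$ distribution with the Welch--Satterthwaite
approximate degrees of freedom
\[
  \nu = \frac{\left(\dfrac{s_T^2}{n_T} + \dfrac{s_V^2}{n_V}\right)^2}
             {\dfrac{(s_T^2/n_T)^2}{n_T - 1} + \dfrac{(s_V^2/n_V)^2}{n_V - 1}}.
\]
Because $\nu$ is generally not an integer, the degrees of freedom
printed in the numeric summaries need not be whole numbers.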
<<>>=
# Extract one clinical column from the training (cardAB) and
# validation (cardC) data frames.
foo <- function(column) {
  train <- cardAB.clinical[, column]
  valid <- cardC.clinical[, column]
  list(train=train, valid=valid)
}
# Summarize a parameter in each dataset and overall, then test for a
# difference: Fisher's exact test for factors, t-test otherwise.
blark <- function(stuff) {
  cat("Train\n")
  print(summary(stuff$train))
  cat("\nValid\n")
  print(summary(stuff$valid))
  if (inherits(stuff$train, "factor")) {
    f1 <- data.frame(Overall=stuff$train, type="Train")
    f2 <- data.frame(Overall=stuff$valid, type="Valid")
    print(summary(rbind(f1, f2)))
    # Combine by label, not by internal integer code, so the table is
    # right even if the two factors order their levels differently.
    x <- c(as.character(stuff$train), as.character(stuff$valid))
    y <- rep(c("Train", "Valid"),
             times=c(length(stuff$train), length(stuff$valid)))
    tab <- table(x, y)
    te <- fisher.test(tab)
  } else {
    cat("Overall\n")
    print(summary(c(stuff$train, stuff$valid)))
    te <- t.test(stuff$train, stuff$valid)
  }
  print(te)
}
@

\section{Summary Information}

\subsection{Age at Diagnosis}
<<>>=
age <- foo("AgeAtDx")
blark(age)
@

\subsection{Gender}
<<>>=
sex <- foo("Sex")
blark(sex)
@

\subsection{Rai Stage}
<<>>=
rai <- foo("CatRAI")
blark(rai)
@

\subsection{White blood cell count}
<<>>=
wbc <- foo("CatWBC")
blark(wbc)
@

\subsection{Platelets}
<<>>=
plt <- foo("Platelets")
blark(plt)
@

\subsection{Serum Beta-2 microglobulin}
<<>>=
b2m <- foo("CatB2M")
blark(b2m)
@

\subsection{Serum lactate dehydrogenase}
<<>>=
ldh <- foo("Serum.LDH")
blark(ldh)
@

After log transformation:
<<>>=
logldh <- foo("LogLDH")
blark(logldh)
@

\subsection{Mutation Status}
<<>>=
mut <- foo("mutation.status")
blark(mut)
@

\subsection{Light chain subtype}
<<>>=
ltch <- foo("Light.chain.subtype")
blark(ltch)
@

\subsection{Zap-70 Protein}
<<>>=
zap <- foo("Zap70Protein")
blark(zap)
@

\subsection{Matutes Score}
<<>>=
matu <- foo("Matutes")
blark(matu)
@

\subsection{CD38}
<<>>=
cd38 <- foo("CatCD38")
blark(cd38)
@

\subsection{Cytogenetic complexity}
<<>>=
cyto <- foo("CatCyto")
blark(cyto)
@

\subsection{Splenomegaly}
<<>>=
spl <- foo("Massive.Splenomegaly")
blark(spl)
@

\subsection{Hemoglobin}
<<>>=
hem <- foo("Hemoglobin")
blark(hem)
@

\subsection{Prolymphocytes}
<<>>=
pro <- foo("Prolymphocytes")
blark(pro)
@

\subsection{Hypogammaglobulinemia}
<<>>=
hgg <- foo("Hypogammaglobulinemia")
blark(hgg)
@

\section{Time-to-treatment}

First, we get the median time-to-treatment in the training data.
<<>>=
library(survival)
@
<<>>=
survfit(Surv(TimeDiagnosis2SigTreat, NumericSigTreatment) ~ 1,
        data=cardAB.clinical)
@

Now we do the same thing for the validation dataset.
<<>>=
survfit(Surv(TimeDiagnosis2SigTreat, NumericSigTreatment) ~ 1,
        data=cardC.clinical)
@

Finally, we test whether there is any difference between the training
and validation datasets.
<<>>=
full.clinical <- rbind(cardAB.clinical, cardC.clinical)
full.clinical$Use <- factor(rep(c("Train", "Valid"),
                                times=c(nrow(cardAB.clinical),
                                        nrow(cardC.clinical))))
@
<<>>=
model <- coxph(Surv(TimeDiagnosis2SigTreat, NumericSigTreatment) ~ Use,
               data=full.clinical)
summary(model)
@

\begin{figure}
<<>>=
colset <- c("red", "blue")
plot(survfit(Surv(TimeDiagnosis2SigTreat, NumericSigTreatment) ~ Use,
             data=full.clinical),
     col=colset, lwd=2,
     xlab="Time (months)", ylab="Fraction untreated")
legend("topright", levels(full.clinical$Use), col=colset, lwd=2)
@
\caption{Kaplan-Meier plot of time from diagnosis to treatment in the
  training and validation subsets.}
\label{km}
\end{figure}

\section{Appendix}

<<>>=
save(full.clinical, file='full-clinical.rda')
@

This analysis was run in the following directory:
<<>>=
getwd()
@
<<>>=
sessionInfo()
@

\end{document}