Flagging Genes (Probesets) Associated with RD in Both TCGA and Tothill
======================================================================
by Keith A. Baggerly
## 1 Executive Summary
### 1.1 Introduction
We want to identify genes whose expression shows
a strong and similarly directed association with
residual disease (RD) status in both the TCGA and Tothill datasets.
### 1.2 Methods
We load our previously assembled RData files for
tcgaExpression,
tcgaFilteredSamples,
tothillExpression,
tothillFilteredSamples,
and
tothillClinical.
We restrict our attention to probesets on both
the TCGA and Tothill array platforms.
Then, using just the filtered sets of samples,
we contrast RD and No RD samples within each
dataset using two-sample t-tests. For each
probeset in each dataset, we identify the
associated gene and record the mean
expression level in the RD and No RD groups,
the t-statistic, the raw p-value, and the false discovery rate
(FDR) adjusted p-value.
We flag probesets that are significantly different
in both TCGA and Tothill using (a) a 5% FDR cutoff
and (b) a 10% FDR cutoff.
We plot the TCGA and Tothill t-statistics against each other to look for
structure,
expression heatmaps for the selected probes
to look for patterns,
correlations between the probes chosen to
look for coordinated behavior, and
dot and density plots for individual
probesets to identify other features.
We write a convenience function to make
generation of the dot and density plots easier.
### 1.3 Results
There are 22277 probesets common to the two platforms.
We flag 8 probesets using a 5% FDR cutoff in both
datasets,
and 47 probesets using a 10% FDR cutoff.
The bivariate plot of the t-statistics
is shown in
Figure [1](#bivarTVals); a zoomed version
highlighting the probesets overexpressed in RD samples
is shown in
Figure [2](#bivarTValsZoom).
The expression heatmaps for TCGA and Tothill
using the 5% FDR cutoffs are in
Figures [3](#tcgaHeatmap05pct) and
[4](#tothillHeatmap05pct), respectively.
The expression heatmaps for TCGA and Tothill
using the 10% FDR cutoffs are in
Figures [5](#tcgaHeatmap10pct) and
[6](#tothillHeatmap10pct), respectively.
Heatmaps of the pairwise correlations among the 10% FDR probesets
for TCGA and Tothill are shown in Figures
[7](#tcgaProbesetCors) and [8](#tothillProbesetCors).
Dot and density plots for all 47 probesets passing
the 10% FDR filter are saved as
"plotsOfTop47Probesets.pdf" in the Reports folder.
Dot and density plots for 6 probesets, corresponding to
LUM, DCN, GADD45B, FABP4, ADH1B, and ADIPOQ are
shown in Figures
[9](#lum),
[10](#dcn),
[11](#gadd45b),
[12](#fabp4),
[13](#adh1b), and
[14](#adipoq), respectively.
We save
tcgaCommonUsed,
tothillCommonUsed,
keyProbesets05pct, keyGenes05pct,
keyProbesets10pct, keyGenes10pct,
nTCGANoRD, nTCGARD,
nTothillNoRD, nTothillRD,
plotProbesetResults, and
rdTTestResults
to the RData file "rdFlaggedGenes.RData".
### 1.4 Conclusions
In both sets of expression heatmaps, there is evidence
of a molecularly distinct subset of patients (about a third)
with a higher chance of having RD. Expression levels for most
of the genes identified are consistently higher in these patients.
For LUM, DCN, and GADD45B, which represent the bulk
of the probesets showing elevation, what we see is an
overall mean shift (values are trending higher) without
a clear division point (above here, something's changed).
For FABP4, ADH1B, and (to a lesser extent) ADIPOQ, we
see a *qualitative* shift in a smaller subset: values for most samples
are very low (effectively "off"), but values for
a subset of patients are very high ("on").
A qualitative difference strikes us as
more likely to survive a shift across assays
than a mean offset, so we preferentially
pursue FABP4 and ADH1B.
## 2 Libraries
We first load the libraries we will use
in this report.
```{r libraries, message=FALSE}
library(affy)
library(hthgu133a.db)
library(gplots)
```
## 3 Loading the Data
Here we simply load the previously assembled RData files containing
the expression matrices, the filtered-sample information, and the
Tothill clinical information.
```{r loadTCGARData}
load(file.path("RDataObjects","tcgaExpression.RData"))
load(file.path("RDataObjects","tcgaFilteredSamples.RData"))
load(file.path("RDataObjects","tothillExpression.RData"))
load(file.path("RDataObjects","tothillFilteredSamples.RData"))
load(file.path("RDataObjects","tothillClinical.RData"))
```
## 4 Rearranging Data
### 4.1 Selecting Common Probesets
We only want to examine probesets evaluated in both
datasets.
```{r findCommonProbes}
commonProbesets <-
intersect(rownames(tcgaExpression),rownames(tothillExpression))
```
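As a toy illustration (the matrices and probeset names below are made up, not the real data), `intersect()` keeps the order of its first argument, so indexing both matrices by the result puts their rows in the same order. This row alignment is what the later per-probeset comparisons rely on.

```r
# Hypothetical toy expression matrices with partially overlapping probesets
exprA <- matrix(1:8, nrow = 4,
                dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))
exprB <- matrix(11:16, nrow = 3,
                dimnames = list(c("p4", "p2", "p9"), c("t1", "t2")))

# Probesets present in both; the order follows rownames(exprA)
common <- intersect(rownames(exprA), rownames(exprB))
print(common)  # [1] "p2" "p4"

# Character indexing aligns the rows of both matrices
subA <- exprA[common, , drop = FALSE]
subB <- exprB[common, , drop = FALSE]
identical(rownames(subA), rownames(subB))
```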
### 4.2 Extracting RD and No RD Samples
Given the common probesets, we next get matrices
of data for RD and No RD measurements for both
TCGA and Tothill.
We begin with TCGA.
```{r getTCGACommonRD}
tcgaCommonRD <-
tcgaExpression[commonProbesets,
names(tcgaSampleRD)[which((tcgaSampleRD=="RD") &
(tcgaFilteredSamples[,"sampleUse"]=="Used"))]]
dim(tcgaCommonRD)
tcgaCommonNoRD <-
tcgaExpression[commonProbesets,
names(tcgaSampleRD)[which((tcgaSampleRD=="No RD") &
(tcgaFilteredSamples[,"sampleUse"]=="Used"))]]
dim(tcgaCommonNoRD)
```
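The selection above combines the RD labels with the `sampleUse` column by position, so it assumes the RD vector and the rows of the filtered-sample table are in the same sample order. We have not re-checked that here; if they are aligned, the two approaches agree. A minimal sketch of matching by sample name instead, assuming (hypothetically) that the filtered-sample table carries sample IDs as row names; all objects below are toy data:

```r
# Hypothetical toy inputs: a named RD-status vector and a sample table
# whose rows (row names = sample IDs) are in a different order
sampleRD <- c(s1 = "RD", s2 = "No RD", s3 = "RD", s4 = "No RD")
filtered <- data.frame(sampleUse = c("Used", "Used", "Excluded", "Used"),
                       row.names = c("s3", "s1", "s4", "s2"))

# Look up each sample's use flag by name rather than by position
useFlag <- filtered[names(sampleRD), "sampleUse"]

rdUsed   <- names(sampleRD)[sampleRD == "RD"    & useFlag == "Used"]
noRDUsed <- names(sampleRD)[sampleRD == "No RD" & useFlag == "Used"]
rdUsed    # "s1" "s3"
noRDUsed  # "s2"

# Positional matching silently mislabels s3 as excluded in this toy case
positional <- names(sampleRD)[sampleRD == "RD" & filtered[, "sampleUse"] == "Used"]
positional  # "s1"
```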
Next, we repeat the process for Tothill.
```{r getTothillCommonRD}
tothillCommonRD <-
tothillExpression[commonProbesets,
names(tothillRD)[which((tothillRD=="RD") &
(tothillFilteredSamples[,"sampleUse"]=="Used"))]]
dim(tothillCommonRD)
tothillCommonNoRD <-
tothillExpression[commonProbesets,
names(tothillRD)[which((tothillRD=="No RD") &
(tothillFilteredSamples[,"sampleUse"]=="Used"))]]
dim(tothillCommonNoRD)
```
### 4.3 Bundling
For later plots, it is convenient to bundle each dataset's No RD and RD
columns into a single matrix, with the No RD samples first.
```{r bundleMatrices}
tcgaCommonUsed <- cbind(tcgaCommonNoRD,tcgaCommonRD)
tothillCommonUsed <- cbind(tothillCommonNoRD,tothillCommonRD)
```
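The count-based coloring used in the plots below (`rep("blue", nTCGANoRD)` followed by `rep("red", nTCGARD)`) is only correct because all No RD columns come first in the bundled matrices. A small sketch with hypothetical toy matrices of deriving the side colors from column membership instead, which stays correct if the columns are ever reordered:

```r
# Hypothetical toy matrices standing in for the No RD and RD subsets
noRD <- matrix(0, nrow = 2, ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
rd   <- matrix(1, nrow = 2, ncol = 2, dimnames = list(NULL, c("d", "e")))
used <- cbind(noRD, rd)   # No RD columns first, as in tcgaCommonUsed

# Label each column by membership in the RD subset, not by position
statusOf <- function(m) ifelse(colnames(m) %in% colnames(rd), "RD", "No RD")
colorMap <- c("No RD" = "blue", "RD" = "red")

sideColors <- unname(colorMap[statusOf(used)])
sideColors       # "blue" "blue" "blue" "red" "red"

# The labels stay correct after the columns are reordered
shuffled <- used[, c("d", "a", "e", "b", "c")]
shuffledColors <- unname(colorMap[statusOf(shuffled)])
shuffledColors   # "red" "blue" "red" "blue" "blue"
```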
## 5 Contrasting RD with No RD: T-Tests
### 5.1 Running T-Tests
Our first comparisons involve simple two-sample
t-tests. We perform these for TCGA first.
```{r tcgaTTests}
d1 <- date()
tcgaTVals <- rep(0,length(commonProbesets))
names(tcgaTVals) <- commonProbesets
tcgaPVals <- tcgaTVals
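## Pooled-variance (var.equal=TRUE) two-sample t-test of RD vs No RD,
## one probeset at a time
for(i1 in 1:length(commonProbesets)){
tempT <- t.test(tcgaCommonRD[i1,],tcgaCommonNoRD[i1,],var.equal=TRUE)
tcgaTVals[i1] <- tempT[["statistic"]]
tcgaPVals[i1] <- tempT[["p.value"]]
}
d2 <- date()
c(d1,d2)
## "fdr" is p.adjust's alias for the Benjamini-Hochberg adjustment
tcgaPValsAdj <- p.adjust(tcgaPVals,method="fdr")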
names(tcgaPValsAdj) <- commonProbesets
```
Then we repeat the process with Tothill.
```{r tothillTTests}
d1 <- date()
tothillTVals <- rep(0,length(commonProbesets))
names(tothillTVals) <- commonProbesets
tothillPVals <- tothillTVals
for(i1 in 1:length(commonProbesets)){
tempT <- t.test(tothillCommonRD[i1,],tothillCommonNoRD[i1,],var.equal=TRUE)
tothillTVals[i1] <- tempT[["statistic"]]
tothillPVals[i1] <- tempT[["p.value"]]
}
d2 <- date()
c(d1,d2)
tothillPValsAdj <- p.adjust(tothillPVals,method="fdr")
names(tothillPValsAdj) <- commonProbesets
```
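The loops above call `t.test()` once per probeset and bracket the work with `date()` calls. As a sketch only (this is not the code that produced the results in this report), the same pooled-variance statistics can be computed for every row at once from `rowMeans()` and `rowSums()`. The helper name `rowPooledT` is hypothetical, the data are simulated, and the sketch assumes no missing values:

```r
# Pooled-variance (var.equal = TRUE) two-sample t-tests for every row of
# x versus the same row of y; rows are probesets, columns are samples
rowPooledT <- function(x, y) {
  nx <- ncol(x); ny <- ncol(y)
  mx <- rowMeans(x); my <- rowMeans(y)
  # Within-group sums of squared deviations, row by row
  # (mx recycles down the columns, so row i is centered at mx[i])
  ssx <- rowSums((x - mx)^2)
  ssy <- rowSums((y - my)^2)
  degFree <- nx + ny - 2
  sp2 <- (ssx + ssy) / degFree                   # pooled variance
  tStat <- (mx - my) / sqrt(sp2 * (1 / nx + 1 / ny))
  pVal <- 2 * pt(-abs(tStat), degFree)           # two-sided p-value
  list(statistic = tStat, p.value = pVal)
}

# Compare against t.test() on simulated data
set.seed(1)
x <- matrix(rnorm(50), nrow = 5)   # 5 "probesets", 10 "RD" samples
y <- matrix(rnorm(40), nrow = 5)   # same 5 probesets, 8 "No RD" samples
fast <- rowPooledT(x, y)
slowT <- sapply(1:5, function(i) t.test(x[i, ], y[i, ], var.equal = TRUE)$statistic)
slowP <- sapply(1:5, function(i) t.test(x[i, ], y[i, ], var.equal = TRUE)$p.value)
all.equal(unname(fast$statistic), unname(slowT))  # TRUE
```

The trade-off is readability: the loop makes the per-probeset test explicit, while the vectorized form is faster but easier to get subtly wrong (for example, with missing values).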
### 5.2 Checking for Overlap at an Extreme Cutoff
We now see which probesets (if any) appear significant at
an FDR of 5% in both datasets, and how many pass a looser 10% FDR in both.
```{r check5pct}
sum(tcgaPValsAdj < 0.05)
sum(tothillPValsAdj < 0.05)
sum((tothillPValsAdj < 0.05) & (tcgaPValsAdj < 0.05))
sum((tothillPValsAdj < 0.1) & (tcgaPValsAdj < 0.1))
keyProbesets05pct <-
names(which((tothillPValsAdj < 0.05) & (tcgaPValsAdj < 0.05)))
keyProbesets10pct <-
names(which((tothillPValsAdj < 0.1) & (tcgaPValsAdj < 0.1)))
keyGenes05pct <- unlist(mget(keyProbesets05pct, hthgu133aSYMBOL))
keyGenes10pct <- unlist(mget(keyProbesets10pct, hthgu133aSYMBOL))
keyGenes05pct
```
There are 8 probesets flagged at a common FDR of 5% (listed
above), and 47 probesets flagged at a common FDR of 10%.
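For reference, `p.adjust(method="fdr")` is an alias for the Benjamini-Hochberg adjustment. A short sketch writing it out by hand on a made-up vector of p-values (the helper name `bhAdjust` is hypothetical) shows what the cutoffs above are applied to: each sorted p-value is scaled by n/rank, and a running minimum from the largest p-value down keeps the adjusted values monotone.

```r
# Benjamini-Hochberg adjustment written out by hand
bhAdjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)   # largest p first
  ranks <- n:1                       # ascending rank of each sorted p
  adj <- cummin(n / ranks * p[o])    # running minimum enforces monotonicity
  pmin(1, adj)[order(o)]             # cap at 1, return to the original order
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)
bh <- bhAdjust(p)
all.equal(bh, p.adjust(p, method = "fdr"))  # TRUE
sum(bh < 0.05)                              # 2
```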
### 5.3 Building a Data Frame
Now we bundle our t-test results into a data
frame for later reference, sorting the entries
by the mean of the TCGA and Tothill FDR-adjusted p-values
(the code sorts by their sum, which gives the same order).
```{r buildDataFrame}
tcgaMeanRD <- apply(tcgaCommonRD,1,mean)
tcgaMeanNoRD <- apply(tcgaCommonNoRD,1,mean)
tothillMeanRD <- apply(tothillCommonRD,1,mean)
tothillMeanNoRD <- apply(tothillCommonNoRD,1,mean)
commonGeneSymbols <- unlist(mget(commonProbesets, hthgu133aSYMBOL))
rdTTestResults <-
data.frame(row.names=rownames(tcgaCommonUsed),
geneSymbol=commonGeneSymbols,
tcgaMeanRD=tcgaMeanRD, tcgaMeanNoRD=tcgaMeanNoRD,
tcgaTVals=tcgaTVals, tcgaPVals=tcgaPVals,
tcgaPValsAdj=tcgaPValsAdj,
tothillMeanRD=tothillMeanRD, tothillMeanNoRD=tothillMeanNoRD,
tothillTVals=tothillTVals, tothillPVals=tothillPVals,
tothillPValsAdj=tothillPValsAdj)
rdTTestResults <- rdTTestResults[order(tcgaPValsAdj + tothillPValsAdj),]
```
As a check, we look at the results for the top 10 probesets
by this ordering.
```{r checkTop10Probesets}
rdTTestResults[1:10,]
```
## 6 Plotting Data
Given the contrast results, we now plot the data in several
ways to see if this clarifies aspects of the structure.
### 6.1 Bivariate t-value Plot
Our first check involves plotting the TCGA and Tothill
t-statistics against each other, to see if there are
clear outliers or disagreement with respect to sign.
The initial plot,
Figure [1](#bivarTVals),
shows that the vast majority of the selected probesets are
more strongly expressed in RD samples. A zoom on
the upper right quadrant of this plot is shown in
Figure [2](#bivarTValsZoom).
```{r plotBivarTVals, fig.show='asis', echo=FALSE}
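## Plotted cutoffs: midpoint between the smallest |t| passing the FDR
## threshold and the largest |t| failing it
tcgaCut05pct <- (min(abs(tcgaTVals)[tcgaPValsAdj <= 0.05]) +
max(abs(tcgaTVals)[tcgaPValsAdj > 0.05]))/2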
tcgaCut10pct <- (min(abs(tcgaTVals)[tcgaPValsAdj <= 0.1]) +
max(abs(tcgaTVals)[tcgaPValsAdj > 0.1]))/2
tothillCut05pct <- (min(abs(tothillTVals)[tothillPValsAdj <= 0.05]) +
max(abs(tothillTVals)[tothillPValsAdj > 0.05]))/2
tothillCut10pct <- (min(abs(tothillTVals)[tothillPValsAdj <= 0.1]) +
max(abs(tothillTVals)[tothillPValsAdj > 0.1]))/2
smoothScatter(tcgaTVals, tothillTVals,
xlab="TCGA", ylab="Tothill",
main="T-Tests, RD-No RD, 5% and 10% FDRs Shown")
abline(h=c(-1,1)*tothillCut05pct,col="red")
abline(h=c(-1,1)*tothillCut10pct,col="black")
abline(v=c(-1,1)*tcgaCut05pct,col="red")
abline(v=c(-1,1)*tcgaCut10pct,col="black")
points(tcgaTVals[keyProbesets10pct],tothillTVals[keyProbesets10pct],
pch="*",col="black")
text(tcgaTVals[keyProbesets10pct],tothillTVals[keyProbesets10pct],
labels=keyGenes10pct,cex=0.5,col="black",adj=c(0,0))
points(tcgaTVals[keyProbesets05pct],tothillTVals[keyProbesets05pct],
pch="*",col="red")
text(tcgaTVals[keyProbesets05pct],tothillTVals[keyProbesets05pct],
labels=keyGenes05pct,cex=0.5,col="red",adj=c(0,0))
```
Figure 1: Bivariate plot of two-sample RD-No RD t-values for TCGA and
Tothill. The vast majority of the probesets selected show higher
expression in RD cases. Lumican (LUM) is the strongest overall.
```{r plotBivarTValsZoom, fig.show='asis', echo=FALSE}
smoothScatter(tcgaTVals, tothillTVals,
xlab="TCGA", ylab="Tothill",
main="T-Tests, RD-No RD, Upper Quadrant Zoom",
xlim=c(3,6),ylim=c(3.3,5.2))
abline(h=c(-1,1)*tothillCut05pct,col="red")
abline(h=c(-1,1)*tothillCut10pct,col="black")
abline(v=c(-1,1)*tcgaCut05pct,col="red")
abline(v=c(-1,1)*tcgaCut10pct,col="black")
points(tcgaTVals[keyProbesets10pct],tothillTVals[keyProbesets10pct],
pch="*",col="black")
text(tcgaTVals[keyProbesets10pct],tothillTVals[keyProbesets10pct],
labels=keyGenes10pct,cex=0.7,col="black",adj=c(0,0))
points(tcgaTVals[keyProbesets05pct],tothillTVals[keyProbesets05pct],
pch="*",col="red")
text(tcgaTVals[keyProbesets05pct],tothillTVals[keyProbesets05pct],
labels=keyGenes05pct,cex=0.7,col="red",adj=c(0,0))
```
Figure 2: Zoom on the upper right quadrant of the
bivariate plot of two-sample RD-No RD t-values for TCGA and
Tothill, to show the gene names more clearly.
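The cutoff lines in these plots are drawn halfway between the smallest flagged |t| and the largest unflagged |t|. This is well defined because, within one dataset (assuming every probeset's test has the same degrees of freedom), the two-sided p-value falls as |t| grows and the BH adjustment preserves that ordering, so the FDR-flagged set is exactly the set of |t| values above some threshold. A sketch on simulated t-values (not the real data) checking this:

```r
set.seed(42)
degFree <- 30
tVals <- c(rnorm(1000), rnorm(20, mean = 6))   # mostly null, a few strong
pVals <- 2 * pt(-abs(tVals), degFree)
pAdj  <- p.adjust(pVals, method = "fdr")

# Same construction as tcgaCut05pct in the plotting chunk
cut05 <- (min(abs(tVals)[pAdj <= 0.05]) + max(abs(tVals)[pAdj > 0.05])) / 2

# Thresholding |t| at cut05 reproduces the FDR flags exactly
identical(abs(tVals) > cut05, pAdj <= 0.05)  # TRUE
```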
### 6.2 Heatmaps of Probesets Flagged by 5% FDR
We want to see whether the most strongly selected probesets
are flagging the same samples, and how clearly they
divide RD from No RD cases. We check this first
for the 8 probesets passing the 5% FDR filters.
The TCGA heatmap is shown in
Figure [3](#tcgaHeatmap05pct), and
the Tothill heatmap is shown in
Figure [4](#tothillHeatmap05pct).
The general story is the same in both: we see
a clear cluster in which most of the probesets
are concurrently elevated, and the patients in
this cluster have a much higher rate of RD.
Of the 8 probesets examined, those for FABP4 and
ADH1B tell the story most starkly,
though their RD enrichment rates may not be much higher
than those of the other probesets.
We do not see a tight grouping of the No RD cases,
but this is to be expected, since some cases of
RD may be driven not at the molecular level but
by spatial positioning in the abdomen.
```{r plotTCGA05pctHeatmap, echo=FALSE}
nTCGANoRD <- ncol(tcgaCommonNoRD)
nTCGARD <- ncol(tcgaCommonRD)
heatmap.2(tcgaCommonUsed[keyProbesets05pct,], scale="row",
trace="none", labRow=keyGenes05pct, labCol="",
col=bluered,
ColSideColors=c(rep("blue",nTCGANoRD),rep("red",nTCGARD)),
xlab="TCGA Samples", cexRow=0.7)
```
Figure 3: Heatmap of the TCGA Samples using just the 8 probesets passing the
5% FDR filter for both TCGA and Tothill. RD status (Red=RD, Blue=No
RD) is indicated in the colorbar at top. There is a clear cluster
at the left in which most of these genes are concurrently elevated;
the density of RD cases is much higher in this group. Of the probesets
shown, FABP4 and ADH1B stand out from the rest in that they show
a much more marked "on/off" pattern.
```{r plotTothill05pctHeatmap, echo=FALSE}
nTothillNoRD <- ncol(tothillCommonNoRD)
nTothillRD <- ncol(tothillCommonRD)
heatmap.2(tothillCommonUsed[keyProbesets05pct,], scale="row",
trace="none", labRow=keyGenes05pct, labCol="",
col=bluered,
ColSideColors=c(rep("blue",nTothillNoRD),rep("red",nTothillRD)),
xlab="Tothill Samples", cexRow=0.7)
```
Figure 4: Heatmap of the Tothill Samples using just the 8 probesets passing the
5% FDR filter for both TCGA and Tothill. RD status (Red=RD, Blue=No
RD) is indicated in the colorbar at top. The story essentially
parallels that for the TCGA data.
There is a clear cluster
at the right in which most of these genes are concurrently elevated;
the density of RD cases is much higher in this group. Of the probesets
shown, FABP4 and ADH1B stand out from the rest in that they show
a much more marked "on/off" pattern.
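Both heatmaps use `scale="row"`, so colors compare samples within a probeset rather than probesets with each other. heatmap.2 implements this by subtracting each row's mean and dividing by its standard deviation, which is the same as taking row-wise z-scores. A toy sketch (made-up values, hypothetical row names):

```r
# Two toy "probesets" at very different absolute levels
expr <- rbind(lowGene  = c(2.0, 2.1, 2.3, 2.2, 5.0),
              highGene = c(9.0, 9.5, 9.2, 9.1, 12.0))

# Row-wise z-scores: center each row at its mean, divide by its SD
z <- t(scale(t(expr)))
round(z, 2)

# Each row now has mean ~0 and SD ~1, so absolute level no longer
# affects the color; only the pattern across samples does
rowMeans(z)
apply(z, 1, sd)
```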
### 6.3 Heatmaps of Probesets Flagged by 10% FDR
Having examined the 5% FDR probesets, we now expand our
view to encompass probesets passing a 10% FDR filter in
both TCGA and Tothill.
The TCGA heatmap is shown in
Figure [5](#tcgaHeatmap10pct), and
the Tothill heatmap is shown in
Figure [6](#tothillHeatmap10pct).
One factor that becomes more apparent here is the
broadly parallel pattern of overexpression seen for
most of the probesets (FABP4 and ADH1B again stand out).
This suggests there may be a common driver for many of
them; possibly a "pathway" of some type.
```{r plotTCGA10pctHeatmap, echo=FALSE}
heatmap.2(tcgaCommonUsed[keyProbesets10pct,], scale="row",
trace="none", labRow=keyGenes10pct, labCol="",
col=bluered,
ColSideColors=c(rep("blue",nTCGANoRD),rep("red",nTCGARD)),
xlab="TCGA Samples", cexRow=0.7)
```
Figure 5: Heatmap of the TCGA Samples using the 47 probesets passing the
10% FDR filter for both TCGA and Tothill. RD status (Red=RD, Blue=No
RD) is indicated in the colorbar at top. While FABP4, ADH1B, and,
to a lesser extent, ADIPOQ again stand out, the main visual impression
is one of parallel expression for most of the probesets,
suggesting some common underlying driving factor.
```{r plotTothill10pctHeatmap, echo=FALSE}
heatmap.2(tothillCommonUsed[keyProbesets10pct,], scale="row",
trace="none", labRow=keyGenes10pct, labCol="",
col=bluered,
ColSideColors=c(rep("blue",nTothillNoRD),rep("red",nTothillRD)),
xlab="Tothill Samples", cexRow=0.7)
```
Figure 6: Heatmap of the Tothill Samples using the 47 probesets passing the
10% FDR filter for both TCGA and Tothill. RD status (Red=RD, Blue=No
RD) is indicated in the colorbar at top. As with the TCGA data,
FABP4 and ADH1B again stand out, but the main visual impression
is one of parallel expression for most of the probesets,
suggesting some common underlying driving factor.
### 6.4 Heatmaps of Correlation of Probesets Flagged by 10% FDR
Given the broad parallelism of expression seen in the heatmaps of
probesets passing the 10% FDR filters, we want to check the
correlation patterns between these probes.
The TCGA heatmap is shown in
Figure [7](#tcgaProbesetCors), and
the Tothill heatmap is shown in
Figure [8](#tothillProbesetCors).
```{r plotTCGAProbesetCors, echo=FALSE}
tcgaCors <- cor(t(tcgaCommonUsed[keyProbesets10pct,]))
heatmap.2(tcgaCors, trace="none", labRow=keyGenes10pct, col=bluered)
```
Figure 7: Correlations in the TCGA data between the
47 probesets selected by 10% FDR cutoffs. While
the three probesets where expression declines
drive the coloring most, the main story in terms
of commonality may be the grouping of 26
probesets in the upper right, including lumican (LUM)
and decorin (DCN). If the probesets are cut into three
clusters overall,
FABP4 and ADH1B fall in the second
grouping of probesets at the right.
```{r plotTothillProbesetCors, echo=FALSE}
tothillCors <- cor(t(tothillCommonUsed[keyProbesets10pct,]))
heatmap.2(tothillCors, trace="none", labRow=keyGenes10pct, col=bluered)
```
Figure 8: Correlations in the Tothill data between the
47 probesets selected by 10% FDR cutoffs. While
the three probesets where expression declines
drive the coloring most, the main story in terms
of commonality may be the tight grouping of 31
probesets in the upper right, including lumican (LUM)
and decorin (DCN). FABP4 and ADH1B reside in the second
grouping of probesets at the right.
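The captions above describe cutting the probeset dendrogram into clusters. As a sketch under stated assumptions (simulated data; the gene names g1 to g6 are hypothetical), the probeset-by-probeset correlation matrix can be clustered with `dist()` and `hclust()`, which are heatmap.2's default `distfun` and `hclustfun`, and then cut into three groups with `cutree()`. This is not necessarily how the groupings in the captions were read off.

```r
set.seed(7)
nSamples <- 40
driver <- rnorm(nSamples)              # shared "pathway"-like signal
onOff  <- rep(c(0, 3), c(30, 10))      # a subset of samples switched "on"
expr <- rbind(
  g1 = driver + rnorm(nSamples, sd = 0.3),
  g2 = driver + rnorm(nSamples, sd = 0.3),
  g3 = driver + rnorm(nSamples, sd = 0.3),
  g4 = onOff  + rnorm(nSamples, sd = 0.3),
  g5 = onOff  + rnorm(nSamples, sd = 0.3),
  g6 = -driver + rnorm(nSamples, sd = 0.3)   # declines as g1-g3 rise
)

cors <- cor(t(expr))           # probeset-by-probeset correlations
hc <- hclust(dist(cors))       # rows of the correlation matrix as profiles
groups <- cutree(hc, k = 3)
groups
```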
### 6.5 Density Plots and Dotplots
Having flagged several probesets of interest, we now look at
the behavior of individual probesets, specifically dotplots
and density plots for each probeset in each dataset.
First, we construct a generic function for plotting
these results for a given probeset.
```{r plotProbesetResults}
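plotProbesetResults <- function(probesetID){
## 2x2 layout: dotplots on top, densities below; restore the caller's
## graphics settings when the function exits
oldPar <- par(mfrow=c(2,2))
on.exit(par(oldPar))
geneName <- unlist(mget(probesetID, hthgu133aSYMBOL))
## Dotplots: No RD samples (blue) first, then RD (red); the vertical
## line marks the split between the two groups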
plot(tcgaCommonUsed[probesetID,],
col=c(rep("blue",nTCGANoRD),rep("red",nTCGARD)),
xlab="TCGA Samples",
ylab="Expression",
main=paste("Expression of",geneName,"in TCGA"))
abline(v=nTCGANoRD+0.5)
plot(tothillCommonUsed[probesetID,],
col=c(rep("blue",nTothillNoRD),rep("red",nTothillRD)),
xlab="Tothill Samples",
ylab="Expression",
main=paste("Expression of",geneName,"in Tothill"))
abline(v=nTothillNoRD+0.5)
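## Overall density (black) with group densities (No RD blue, RD red),
## each group scaled by its fraction of the samples
tempDensTCGA <- density(tcgaCommonUsed[probesetID,])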
tempDensTCGANoRD <- density(tcgaCommonNoRD[probesetID,])
tempDensTCGARD <- density(tcgaCommonRD[probesetID,])
plot(tempDensTCGA[["x"]], tempDensTCGA[["y"]],
xlab=paste("Expression of", probesetID, "in TCGA"),
ylab="Density", type="l",
main=paste("Density of",probesetID,"in TCGA"))
lines(tempDensTCGANoRD[["x"]],
(nTCGANoRD/(nTCGANoRD+nTCGARD)) * tempDensTCGANoRD[["y"]],
col="blue")
lines(tempDensTCGARD[["x"]],
(nTCGARD/(nTCGANoRD+nTCGARD)) * tempDensTCGARD[["y"]],
col="red")
tempDensTothill <- density(tothillCommonUsed[probesetID,])
tempDensTothillNoRD <- density(tothillCommonNoRD[probesetID,])
tempDensTothillRD <- density(tothillCommonRD[probesetID,])
plot(tempDensTothill[["x"]], tempDensTothill[["y"]],
xlab=paste("Expression of", probesetID, "in Tothill"),
ylab="Density", type="l",
main=paste("Density of",probesetID,"in Tothill"))
lines(tempDensTothillNoRD[["x"]],
(nTothillNoRD/(nTothillNoRD+nTothillRD)) *
tempDensTothillNoRD[["y"]],
col="blue")
lines(tempDensTothillRD[["x"]],
(nTothillRD/(nTothillNoRD+nTothillRD)) *
tempDensTothillRD[["y"]],
col="red")
par(mfrow=c(1,1))
}
```
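In the density panels, each group's density is multiplied by that group's fraction of the samples. The reason is that a kernel density estimate is an average of kernels, so with a shared bandwidth and grid the weighted group curves add up to the pooled curve. `plotProbesetResults()` lets `density()` choose each bandwidth separately, so there the match is only approximate. A sketch on simulated values (not the real data) forcing a common bandwidth:

```r
set.seed(3)
noRD <- rnorm(60, mean = 5)
rd   <- rnorm(30, mean = 6)
pooled <- c(noRD, rd)

bw <- bw.nrd0(pooled)                  # one bandwidth for all three curves
lo <- min(pooled) - 3 * bw
hi <- max(pooled) + 3 * bw
dAll  <- density(pooled, bw = bw, from = lo, to = hi, n = 512)
dNoRD <- density(noRD,   bw = bw, from = lo, to = hi, n = 512)
dRD   <- density(rd,     bw = bw, from = lo, to = hi, n = 512)

# Weight each group's curve by its share of the samples
wNoRD <- length(noRD) / length(pooled)
wRD   <- length(rd)   / length(pooled)
mixture <- wNoRD * dNoRD$y + wRD * dRD$y
max(abs(dAll$y - mixture))             # essentially zero
```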
For reference, we produce a pdf file containing the results
for all 47 probesets in our top list.
```{r outputTop47}
pdf(file=file.path("Reports","plotsOfTop47Probesets.pdf"))
for(i1 in 1:length(keyProbesets10pct)){
plotProbesetResults(keyProbesets10pct[i1])
}
dev.off()
```
We include results for a few selected genes here:
LUM (201744\_s\_at), Figure [9](#lum),
DCN (211896\_s\_at), Figure [10](#dcn),
GADD45B (207574\_s\_at), Figure [11](#gadd45b),
FABP4 (203980\_at), Figure [12](#fabp4),
ADH1B (209613\_s\_at), Figure [13](#adh1b), and
ADIPOQ (207175\_at), Figure [14](#adipoq).
For some of the genes (ADH1B, DCN), multiple probesets
are available but the results appear qualitatively similar
to the representative ones chosen.
For LUM, DCN, and GADD45B, which represent the bulk
of the probesets showing elevation, what we see is an
overall mean shift (values are trending higher) without
a clear division point (above here, something's changed).
For FABP4, ADH1B, and (to a lesser extent) ADIPOQ, we
see a qualitative shift: values for most samples
are very low (effectively "off"), but values for
a subset of patients are very high ("on").
This type of qualitative difference strikes us as
more likely to survive a shift across assays
than a mean offset, so we will preferentially
pursue FABP4 and ADH1B.
```{r plotLUM, echo=FALSE}
plotProbesetResults("201744_s_at")
```
Figure 9: Dot and density plots for lumican (LUM) in TCGA
and Tothill. Cases with No RD are blue, RD are red.
While there is a clear mean shift (which drives the
t-test results), there is not a clearly defined
cutpoint.
```{r plotDCN, echo=FALSE}
plotProbesetResults("211896_s_at")
```
Figure 10: Dot and density plots for decorin (DCN) in TCGA
and Tothill. Cases with No RD are blue, RD are red.
While there is a clear mean shift (which drives the
t-test results), there is not a clearly defined
cutpoint.
```{r plotGADD45B, echo=FALSE}
plotProbesetResults("207574_s_at")
```
Figure 11: Dot and density plots for GADD45B in TCGA
and Tothill. Cases with No RD are blue, RD are red.
While there is a clear mean shift (which drives the
t-test results), there is not a clearly defined
cutpoint.
```{r plotFABP4, echo=FALSE}
plotProbesetResults("203980_at")
```
Figure 12: Dot and density plots for FABP4 in TCGA
and Tothill. Cases with No RD are blue, RD are red.
There is a qualitative shift in a subset of the patients.
```{r plotADH1B, echo=FALSE}
plotProbesetResults("209613_s_at")
```
Figure 13: Dot and density plots for ADH1B in TCGA
and Tothill. Cases with No RD are blue, RD are red.
There is a qualitative shift in a subset of the patients.
```{r plotADIPOQ, echo=FALSE}
plotProbesetResults("207175_at")
```
Figure 14: Dot and density plots for ADIPOQ in TCGA
and Tothill. Cases with No RD are blue, RD are red.
There is a qualitative shift in a subset of the patients.
## 7 Saving RData
Now we save the relevant information to an RData object.
```{r saveRDFlaggedGenes}
save(tcgaCommonUsed,
tothillCommonUsed,
keyProbesets05pct, keyGenes05pct,
keyProbesets10pct, keyGenes10pct,
nTCGANoRD, nTCGARD,
nTothillNoRD, nTothillRD,
plotProbesetResults,
rdTTestResults,
file=file.path("RDataObjects","rdFlaggedGenes.RData"))
```
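A later report can check what the saved file contains by loading it into a separate environment, since `load()` returns the names of the objects it restores. One caveat from the code above: `plotProbesetResults()` looks up `tcgaCommonUsed`, `nTCGANoRD`, and the other saved objects as free variables in the global environment, so to actually draw plots those objects need to be loaded there. A round-trip sketch on a temporary file with hypothetical toy objects:

```r
# Hypothetical toy objects standing in for the saved results
toyResults <- data.frame(geneSymbol = c("geneA", "geneB"), tVal = c(1.5, -0.5))
toyProbesets <- c("probeA", "probeB")

rdFile <- tempfile(fileext = ".RData")
save(toyResults, toyProbesets, file = rdFile)

# Load into a fresh environment and list what came back
restored <- new.env()
loadedNames <- load(rdFile, envir = restored)
sort(loadedNames)                           # "toyProbesets" "toyResults"
identical(restored$toyResults, toyResults)  # TRUE
```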
## 8 Appendix
### 8.1 File Location
```{r getLocation}
getwd()
```
### 8.2 SessionInfo
```{r sessionInfo}
sessionInfo()
```