Susan L. Tucker
opts_chunk$set(tidy = TRUE, message = TRUE)
The goal of this analysis is to determine whether expression levels of FABP4 or ADH1B in the TCGA ovarian cohort are correlated with protein levels measured using reverse-phase protein arrays (RPPA).
Level 3 RPPA data from ovarian samples were downloaded from the TCGA website on March 20, 2014.
Samples are identified having both gene expression data and RPPA data available.
Spearman correlation analysis is performed between FABP4 expression and protein levels for each protein represented in the RPPA data. The same analysis is performed for ADH1B expression using the probeset of interest identified in previous analyses.
Permutation analyses are performed in which the FABP4 and ADH1B values are randomly permuted among patients 100 times and correlation analyses repeated.
The distributions of P-values are investigated and compared to those obtained from the permutation tests.
A heatmap of selected proteins is produced to investigate patterns in protein levels.
Scatterplots between FABP4 expression and protein levels, and between ADH1B expression and protein levels, are produced for proteins that may be associated with expression level of either of these genes.
RPPA data are available from 165 proteins in 412 samples. There are 354 patients with both RRPA data and expression data available.
Correlation coeffients range from -0.353 to 0.378. Comparison of P-values with the results of permutation analyses suggest that more proteins are correlated with FABP4 and/or ADH1B than expected by chance. The heatmap also suggests that some of the proteins investigated may be associated with differences in RD.
The scatterplots illustrate that the “significant” correlations between FABP4 and ADH1B expression and protein levels are weak, as reflected in the low correlation coefficients.
There appear to be some associations between FABP4 and ADH1B expression values and levels of some proteins measured using RPPA.
There is also some suggestion from the heatmap of associations between protein levels and incidence of RD.
These associations are weak in the current data set, but they warrent further investigation. They may help to elucidate biological mechanisms explaining the association between high FABP4 and ADH1B levels and significantly increased risk of RD after primary cytoreduction in high-grade serous ovarian cancer.
The relevant TCGA data objects (created previously) are loaded.
load(file.path("RDataObjects", "tcgaFilteredSamples.RData"))
load(file.path("RDataObjects", "tcgaExpression.RData"))
load("ovRPPA.RData")
Previously described sample filtrations are applied to the TCGA data.
rownames(tcgaFilteredSamples)[1:2]
## [1] "TCGA-13-0758-01A-01R-0362-01" "TCGA-09-0364-01A-02R-0362-01"
tcgaSampleUseLong <- rownames(tcgaFilteredSamples[which(tcgaFilteredSamples[,
"sampleUse"] == "Used"), ])
tcgaSampleUse <- substr(tcgaSampleUseLong, 1, 12)
length(tcgaSampleUse)
## [1] 491
length(unique(tcgaSampleUse))
## [1] 491
tcgaRD <- tcgaSampleRD[tcgaSampleUseLong]
names(tcgaRD) <- tcgaSampleUse
We identify the subset of patients having both RPPA data and expression data available.
colnames(ovRPPA)[1:2]
## [1] "TCGA-04-1335-01A-21-20" "TCGA-04-1336-01A-21-20"
dim(ovRPPA)
## [1] 165 412
length(unique(substr(colnames(ovRPPA), 1, 12)))
## [1] 412
colnames(ovRPPA) <- substr(colnames(ovRPPA), 1, 12)
ptList <- intersect(colnames(ovRPPA), tcgaSampleUse)
length(ptList)
## [1] 354
rppaUse <- ovRPPA[, ptList]
rdUse <- tcgaRD[ptList]
We extract the FABP4 and ADH1B expression values from the expression matrix. We know which probesets these genes correspond to from earlier analyses.
colnames(tcgaExpression[, 1:2])
## [1] "TCGA-13-0758-01A-01R-0362-01" "TCGA-09-0364-01A-02R-0362-01"
tcgaExpressionUse <- tcgaExpression[, tcgaSampleUseLong]
colnames(tcgaExpressionUse) <- tcgaSampleUse
fabp4 <- tcgaExpressionUse["203980_at", ptList]
adh1b <- tcgaExpressionUse["209613_s_at", ptList]
We perform Spearman correlation analyses between FABP4 expression levels and RPPA values.
fabp4CorP <- apply(rppaUse, 1, function(x) {
cor.test(fabp4, x, method = "spearman", exact = FALSE)$p.value
})
fabp4CorRho <- apply(rppaUse, 1, function(x) {
cor.test(fabp4, x, method = "spearman", exact = FALSE)$estimate
})
summary(fabp4CorP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0112 0.0932 0.2360 0.4130 0.9670
summary(fabp4CorRho)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.3530 -0.0882 -0.0121 0.0017 0.0895 0.3680
We do the same for ADH1B.
adh1bCorP <- apply(rppaUse, 1, function(x) {
cor.test(adh1b, x, method = "spearman", exact = FALSE)$p.value
})
adh1bCorRho <- apply(rppaUse, 1, function(x) {
cor.test(adh1b, x, method = "spearman", exact = FALSE)$estimate
})
summary(adh1bCorP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0033 0.0979 0.2400 0.4020 0.9900
summary(adh1bCorRho)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.3270 -0.0881 -0.0014 -0.0005 0.0871 0.3780
We look at the histogram of P-values (BUM plots) obtained for each analysis. The distributions are markedly non-uniform, suggesting that more proteins are significantly associated with FABP4 and ADH1B than would be expected by chance.
hist(fabp4CorP, breaks = 50, xlab = "P-value", main = "FABP4")
hist(adh1bCorP, breaks = 50, xlab = "P-value", main = "ADH1B")
We wish to confirm that we see more small P-values than expected by chance. Therefore, we perform a permutation test in which randomly shuffled FABP4 and ADH1B values are compared to the RPPA data using correlation analysis.
nPerms <- 100
proteinList <- rownames(rppaUse)
if (file.exists("rppaPerm.RData")) {
load("rppaPerm.RData")
} else {
starttimePerm <- date()
set.seed(22)
pMxPerm <- matrix(0, nrow = length(proteinList), ncol = nPerms)
rownames(pMxPerm) <- proteinList
rhoMxPerm <- pMxPerm
pMxPerm2 <- pMxPerm
rhoMxPerm2 <- pMxPerm
nPt <- length(ptList)
for (i1 in 1:nPerms) {
fabp4Fake <- fabp4[sample(nPt)]
adh1bFake <- adh1b[sample(nPt)]
fakeP <- apply(rppaUse, 1, function(x) {
cor.test(fabp4Fake, x, method = "spearman", exact = FALSE)$p.value
})
fakeRho <- apply(rppaUse, 1, function(x) {
cor.test(fabp4Fake, x, method = "spearman", exact = FALSE)$estimate
})
fakeP2 <- apply(rppaUse, 1, function(x) {
cor.test(adh1bFake, x, method = "spearman", exact = FALSE)$p.value
})
fakeRho2 <- apply(rppaUse, 1, function(x) {
cor.test(adh1bFake, x, method = "spearman", exact = FALSE)$estimate
})
pMxPerm[, i1] <- fakeP
rhoMxPerm[, i1] <- fakeRho
pMxPerm2[, i1] <- fakeP2
rhoMxPerm2[, i1] <- fakeRho2
}
stoptimePerm <- date()
save(pMxPerm, rhoMxPerm, pMxPerm2, rhoMxPerm2, starttimePerm, stoptimePerm,
file = "rppaPerm.RData")
}
c(starttimePerm, stoptimePerm)
## [1] "Fri Mar 21 10:39:09 2014" "Fri Mar 21 10:39:47 2014"
For each of several significance levels, we plot the numbers of proteins significantly associated with FABP4 and compare them to the numbers expected by chance.
The findings indicate that 23 proteins are correlated with FABP4 at a significance level of P < 0.001, while at most one would be expected by chance.
min(fabp4CorP)
## [1] 8.282e-13
pList <- c(1e-11, 1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 0.001, 0.01)
numTrue <- unlist(lapply(pList, function(x) {
sum(fabp4CorP < x)
}))
meanPerm <- c()
sdPerm <- c()
upPerm <- c()
downPerm <- c()
for (i in 1:length(pList)) {
numPerm <- apply(pMxPerm, 2, function(x) {
sum(x < pList[i])
})
meanPerm <- c(meanPerm, mean(numPerm))
sdPerm <- c(sdPerm, sd(numPerm))
upPerm <- c(upPerm, quantile(numPerm, c(0.95)))
downPerm <- c(downPerm, quantile(numPerm, c(0.05)))
}
numTrue
## [1] 3 3 3 3 4 5 12 14 23 39
meanPerm
## [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.09 1.28
meanPerm + 2 * sdPerm
## [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2100 0.7317 3.7431
upPerm
## 95% 95% 95% 95% 95% 95% 95% 95% 95% 95%
## 0 0 0 0 0 0 0 0 1 4
plot(pList, numTrue, type = "b", xlab = "P-value", ylab = "Number of significant probesets",
pch = 16, ylim = c(0, 40), main = "FABP4", log = "x")
lines(pList, meanPerm, type = "b", col = "red", pch = 16)
lines(pList, meanPerm + 2 * sdPerm, type = "b", col = "blue", lty = 2, pch = 16)
lines(pList, upPerm, type = "b", col = "green", lty = 2, pch = 16)
lines(pList, downPerm, type = "b", col = "green", lty = 2, pch = 16)
legend(1e-11, 40, legend = c("Observed", "Mean from permutation tests", "Mean + 2 SD from permutation tests",
"Upper & lower 95% from permutation tests"), col = c("black", "red", "blue",
"green"), lty = c(1, 1, 2, 2), cex = 0.8)
We do the same thing for ADH1B and find that there are 30 proteins correlated with ADH1B at a signficance level of P < 0.001, while at most one would be expected by chance.
min(adh1bCorP)
## [1] 1.87e-13
numTrue2 <- unlist(lapply(pList, function(x) {
sum(adh1bCorP < x)
}))
meanPerm2 <- c()
sdPerm2 <- c()
upPerm2 <- c()
downPerm2 <- c()
for (i in 1:length(pList)) {
numPerm2 <- apply(pMxPerm2, 2, function(x) {
sum(x < pList[i])
})
meanPerm2 <- c(meanPerm2, mean(numPerm2))
sdPerm2 <- c(sdPerm2, sd(numPerm2))
upPerm2 <- c(upPerm2, quantile(numPerm2, c(0.95)))
downPerm2 <- c(downPerm2, quantile(numPerm2, c(0.05)))
}
numTrue2
## [1] 2 2 3 4 5 7 13 20 30 53
meanPerm2
## [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.08 1.37
meanPerm2 + 2 * sdPerm2
## [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2100 0.6253 4.2361
upPerm2
## 95% 95% 95% 95% 95% 95% 95% 95% 95% 95%
## 0 0 0 0 0 0 0 0 1 4
plot(pList, numTrue2, type = "b", xlab = "P-value", ylab = "Number of significant probesets",
pch = 16, ylim = c(0, 40), main = "ADH1B", log = "x")
lines(pList, meanPerm2, type = "b", col = "red", pch = 16)
lines(pList, meanPerm2 + 2 * sdPerm2, type = "b", col = "blue", lty = 2, pch = 16)
lines(pList, upPerm2, type = "b", col = "green", lty = 2, pch = 16)
lines(pList, downPerm2, type = "b", col = "green", lty = 2, pch = 16)
legend(1e-11, 40, legend = c("Observed", "Mean from permutation tests", "Mean + 2 SD from permutation tests",
"Upper & lower 95% from permutation tests"), col = c("black", "red", "blue",
"green"), lty = c(1, 1, 2, 2), cex = 0.8)
We identify the 23 proteins correlated with FABP4 at a signficance level of P < 0.001 as well as their correlation coefficients, first for the proteins having a positive correlation with FABP4 (N=14) and next for the ones having a negative corelation (N=9).
fabp4Proteins <- names(fabp4CorP)[fabp4CorP < 0.001]
fabp4ProteinsPos <- names(fabp4CorP)[fabp4CorP < 0.001 & fabp4CorRho > 0]
fabp4ProteinsNeg <- names(fabp4CorP)[fabp4CorP < 0.001 & fabp4CorRho < 0]
length(fabp4ProteinsPos)
## [1] 14
length(fabp4ProteinsNeg)
## [1] 9
fabp4CorRho[fabp4ProteinsPos]
## Caveolin-1-R-V c-Kit-R-V Collagen_VI-R-V
## 0.3682 0.1751 0.1974
## Dvl3-R-V Fibronectin-R-C FOXO3a_pS318_S321-R-C
## 0.2312 0.3643 0.2431
## JNK2-R-C p21-R-C Paxillin-R-V
## 0.1943 0.1994 0.2395
## Pea-15-R-V PKC-alpha-M-V PTCH-R-C
## 0.1824 0.1780 0.2377
## Transglutaminase-M-V VASP-R-C
## 0.1993 0.2507
fabp4CorRho[fabp4ProteinsNeg]
## Chk1_pS345-R-C Chk2_pT68-R-C Claudin-7-R-V
## -0.2948 -0.3528 -0.2019
## c-Met_pY1235-R-C C-Raf_pS338-R-C E-Cadherin-R-V
## -0.2775 -0.2266 -0.2464
## ER-alpha_pS118-R-V p27_pT157-R-C Src_pY416-R-C
## -0.2409 -0.2399 -0.1795
We do the same for ADH1B. There are 13 proteins positively associated with ADH1B at a significance level P < 0.001, and 17 that are negatively associated.
adh1bProteins <- names(adh1bCorP)[adh1bCorP < 0.001]
adh1bProteinsPos <- names(adh1bCorP)[adh1bCorP < 0.001 & adh1bCorRho > 0]
adh1bProteinsNeg <- names(adh1bCorP)[adh1bCorP < 0.001 & adh1bCorRho < 0]
length(adh1bProteinsPos)
## [1] 13
length(adh1bProteinsNeg)
## [1] 17
adh1bCorRho[adh1bProteinsPos]
## Annexin_I-R-V Caveolin-1-R-V Dvl3-R-V
## 0.1978 0.3744 0.1812
## Fibronectin-R-C FOXO3a_pS318_S321-R-C Paxillin-R-V
## 0.3778 0.2886 0.2040
## Pea-15-R-V PTCH-R-C PTEN-R-V
## 0.1951 0.2600 0.2100
## S6_pS235_S236-R-V S6_pS240_S244-R-V Transglutaminase-M-V
## 0.2646 0.2208 0.2094
## VASP-R-C
## 0.2470
adh1bCorRho[adh1bProteinsNeg]
## 53BP1-R-C alpha-Catenin-M-V beta-Catenin-R-V
## -0.1976 -0.2194 -0.2149
## CD31-M-V Chk1_pS345-R-C Chk2_pT68-R-C
## -0.1934 -0.2206 -0.3268
## Claudin-7-R-V c-Met_pY1235-R-C C-Raf_pS338-R-C
## -0.2077 -0.2454 -0.1907
## E-Cadherin-R-V ER-alpha_pS118-R-V Ku80-R-C
## -0.3144 -0.2485 -0.2031
## MSH2-M-C MSH6-R-C mTOR-R-V
## -0.2479 -0.2365 -0.1792
## p27_pT157-R-C PARP_cleaved-M-C
## -0.2436 -0.1766
We verify that proteins correlated with both FABP4 and ADH1B at a significance level of P < 0.001 have the same sign of the correlation coefficient in each case.
bothProteins <- intersect(fabp4Proteins, adh1bProteins)
table(sign(fabp4CorRho[bothProteins]) == sign(adh1bCorRho[bothProteins]))
##
## TRUE
## 17
The proteins that are positively correlated with both FABP4 and ADH1B expression are as follows.
unique(intersect(fabp4ProteinsPos, adh1bProteinsPos))
## [1] "Caveolin-1-R-V" "Dvl3-R-V" "Fibronectin-R-C"
## [4] "FOXO3a_pS318_S321-R-C" "Paxillin-R-V" "Pea-15-R-V"
## [7] "PTCH-R-C" "Transglutaminase-M-V" "VASP-R-C"
The proteins that are negatively correlated with both FABP4 and ADH1B are as follows.
unique(intersect(fabp4ProteinsNeg, adh1bProteinsNeg))
## [1] "Chk1_pS345-R-C" "Chk2_pT68-R-C" "Claudin-7-R-V"
## [4] "c-Met_pY1235-R-C" "C-Raf_pS338-R-C" "E-Cadherin-R-V"
## [7] "ER-alpha_pS118-R-V" "p27_pT157-R-C"
The following heatmap shows the levels of the 36 proteins associated with FABP4 and/or ADH1B at P < 0.001. The color bar at the top represents patients with and without RD in red and blue, respectively. The color bar at the side shows proteins positively and negatively correlated with FABP4 and/or ADH1B in red and blue, respectively.
library(gplots)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
##
## Attaching package: 'gplots'
##
## The following object is masked from 'package:stats':
##
## lowess
table(rdUse)
## rdUse
## No RD RD
## 81 273
colVec <- rep("blue", length(ptList))
colVec[rdUse == "RD"] <- "red"
allProteins <- unique(c(fabp4Proteins, adh1bProteins))
length(allProteins)
## [1] 36
rowVec <- rep("blue", length(allProteins))
rowVec[sign(fabp4CorRho[allProteins]) > 0] <- "red"
heatmap.2(rppaUse[allProteins, ], scale = "row", trace = "none", labRow = allProteins,
labCol = "", col = bluered, ColSideColors = colVec, RowSideColors = rowVec,
xlab = "", cexRow = 1)
Scatterplots are produced showing the relationship between FABP4 and ADH1B expression levels with each of the 36 proteins whose levels are significantly correlated with expression levels of one or both genes at P < 0.001. We begin with the 18 positively associated proteins, then show the plots for the 12 negatively associated proteins. Spearman correlation coefficients (Rho) are shown above each plot.
proteinsPos <- unique(c(fabp4ProteinsPos, adh1bProteinsPos))
proteinsNeg <- unique(c(fabp4ProteinsNeg, adh1bProteinsNeg))
proteinList <- c(proteinsPos, proteinsNeg)
par(mfrow = c(2, 2))
for (i1 in 1:length(proteinList)) {
rho1 = round(1000 * fabp4CorRho[proteinList[i1]])/1000
rho2 = round(1000 * adh1bCorRho[proteinList[i1]])/1000
plot(fabp4, rppaUse[proteinList[i1], ], xlab = "Log2(FABP4 Expression)",
ylab = paste(proteinList[i1], " Levels by RPPA", sep = ""), main = paste("Rho = ",
rho1, sep = ""))
plot(adh1b, rppaUse[proteinList[i1], ], xlab = "Log2(ADH1B Expression)",
ylab = paste(proteinList[i1], " Levels by RPPA", sep = ""), main = paste("Rho = ",
rho2, sep = ""))
}
Data from proteins of interest is exported for pathway analysis.
We prepare additional lists for export in addition to proteins meeting the P < 0.001 correlation criterion.
fabp4Proteins01 <- names(fabp4CorP)[fabp4CorP < 0.01]
adh1bProteins01 <- names(adh1bCorP)[adh1bCorP < 0.01]
rppaFABP4001 <- rppaUse[fabp4Proteins, ]
rppaADH1B001 <- rppaUse[adh1bProteins, ]
rppaFABP401 <- rppaUse[fabp4Proteins01, ]
rppaADH1B01 <- rppaUse[adh1bProteins01, ]
corFABP401 <- fabp4CorRho[fabp4Proteins01]
corADH1B01 <- adh1bCorRho[adh1bProteins01]
write.table(fabp4, file = "FABP4.csv", sep = ",")
write.table(adh1b, file = "ADH1B.csv", sep = ",")
write.table(rppaFABP4001, file = "rppaFABP4001.csv", sep = ",")
write.table(rppaADH1B001, file = "rppaADH1B001.csv", sep = ",")
write.table(rppaFABP401, file = "rppaFABP401.csv", sep = ",")
write.table(rppaADH1B01, file = "rppaADH1B01.csv", sep = ",")
write.table(corFABP401, file = "corFABP401.csv", sep = ",")
write.table(corADH1B01, file = "corADH1B01.csv", sep = ",")
getwd()
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gplots_2.12.1 knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 caTools_1.16 evaluate_0.5.1
## [4] formatR_0.10 gdata_2.13.2 gtools_3.2.1
## [7] KernSmooth_2.23-10 stringr_0.6.2 tools_3.0.2
[1] Hennessy BT, Lu Y, Gonzalez-Angulo AM, Carey MS, Myhre S, Ju Z, Davies MA, Liu W, Coombes K, Meric-Bernstam F, Bedrosian I, McGahren M, Agarwal R, Zhang F, Overgaard J, Alsner J, Neve RM, Kuo W-L, Gray JW, Borresen-Dale A-L, Mills GB. A Technical Assessment of the Utility of Reverse Phase Protein Arrays for the Study of the Functional Proteome in Non-microdissected Human Breast Cancers. Clin Proteome, 6:129-51, 2010.