Correlation Between FABP4 and ADH1B Expression and Protein Levels Measured by RPPA

Susan L. Tucker

opts_chunk$set(tidy = TRUE, message = TRUE)

1 Executive Summary

1.1 Introduction

The goal of this analysis is to determine whether expression levels of FABP4 or ADH1B in the TCGA ovarian cohort are correlated with protein levels measured using reverse-phase protein arrays (RPPA).

1.2 Data & Methods

Level 3 RPPA data from ovarian samples were downloaded from the TCGA website on March 20, 2014.

Samples are identified having both gene expression data and RPPA data available.

Spearman correlation analysis is performed between FABP4 expression and protein levels for each protein represented in the RPPA data. The same analysis is performed for ADH1B expression using the probeset of interest identified in previous analyses.

Permutation analyses are performed in which the FABP4 and ADH1B values are randomly permuted among patients 100 times and correlation analyses repeated.

The distributions of P-values are investigated and compared to those obtained from the permutation tests.

A heatmap of selected proteins is produced to investigate patterns in protein levels.

Scatterplots between FABP4 expression and protein levels, and between ADH1B expression and protein levels, are produced for proteins that may be associated with expression level of either of these genes.

1.3 Results

RPPA data are available from 165 proteins in 412 samples. There are 354 patients with both RRPA data and expression data available.

Correlation coeffients range from -0.353 to 0.378. Comparison of P-values with the results of permutation analyses suggest that more proteins are correlated with FABP4 and/or ADH1B than expected by chance. The heatmap also suggests that some of the proteins investigated may be associated with differences in RD.

The scatterplots illustrate that the “significant” correlations between FABP4 and ADH1B expression and protein levels are weak, as reflected in the low correlation coefficients.

1.4 Conclusion

There appear to be some associations between FABP4 and ADH1B expression values and levels of some proteins measured using RPPA.

There is also some suggestion from the heatmap of associations between protein levels and incidence of RD.

These associations are weak in the current data set, but they warrent further investigation. They may help to elucidate biological mechanisms explaining the association between high FABP4 and ADH1B levels and significantly increased risk of RD after primary cytoreduction in high-grade serous ovarian cancer.

2 Loading & Filtration of Data

The relevant TCGA data objects (created previously) are loaded.


load(file.path("RDataObjects", "tcgaFilteredSamples.RData"))
load(file.path("RDataObjects", "tcgaExpression.RData"))
load("ovRPPA.RData")

Previously described sample filtrations are applied to the TCGA data.


rownames(tcgaFilteredSamples)[1:2]
## [1] "TCGA-13-0758-01A-01R-0362-01" "TCGA-09-0364-01A-02R-0362-01"

tcgaSampleUseLong <- rownames(tcgaFilteredSamples[which(tcgaFilteredSamples[, 
    "sampleUse"] == "Used"), ])
tcgaSampleUse <- substr(tcgaSampleUseLong, 1, 12)
length(tcgaSampleUse)
## [1] 491
length(unique(tcgaSampleUse))
## [1] 491

tcgaRD <- tcgaSampleRD[tcgaSampleUseLong]
names(tcgaRD) <- tcgaSampleUse

We identify the subset of patients having both RPPA data and expression data available.


colnames(ovRPPA)[1:2]
## [1] "TCGA-04-1335-01A-21-20" "TCGA-04-1336-01A-21-20"
dim(ovRPPA)
## [1] 165 412
length(unique(substr(colnames(ovRPPA), 1, 12)))
## [1] 412
colnames(ovRPPA) <- substr(colnames(ovRPPA), 1, 12)

ptList <- intersect(colnames(ovRPPA), tcgaSampleUse)
length(ptList)
## [1] 354

rppaUse <- ovRPPA[, ptList]
rdUse <- tcgaRD[ptList]

We extract the FABP4 and ADH1B expression values from the expression matrix. We know which probesets these genes correspond to from earlier analyses.


colnames(tcgaExpression[, 1:2])
## [1] "TCGA-13-0758-01A-01R-0362-01" "TCGA-09-0364-01A-02R-0362-01"
tcgaExpressionUse <- tcgaExpression[, tcgaSampleUseLong]
colnames(tcgaExpressionUse) <- tcgaSampleUse

fabp4 <- tcgaExpressionUse["203980_at", ptList]
adh1b <- tcgaExpressionUse["209613_s_at", ptList]

3 Analyses

3.1 Correlation analyses

We perform Spearman correlation analyses between FABP4 expression levels and RPPA values.


fabp4CorP <- apply(rppaUse, 1, function(x) {
    cor.test(fabp4, x, method = "spearman", exact = FALSE)$p.value
})

fabp4CorRho <- apply(rppaUse, 1, function(x) {
    cor.test(fabp4, x, method = "spearman", exact = FALSE)$estimate
})

summary(fabp4CorP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0112  0.0932  0.2360  0.4130  0.9670
summary(fabp4CorRho)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.3530 -0.0882 -0.0121  0.0017  0.0895  0.3680

We do the same for ADH1B.

adh1bCorP <- apply(rppaUse, 1, function(x) {
    cor.test(adh1b, x, method = "spearman", exact = FALSE)$p.value
})

adh1bCorRho <- apply(rppaUse, 1, function(x) {
    cor.test(adh1b, x, method = "spearman", exact = FALSE)$estimate
})

summary(adh1bCorP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0033  0.0979  0.2400  0.4020  0.9900
summary(adh1bCorRho)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.3270 -0.0881 -0.0014 -0.0005  0.0871  0.3780

3.2 Distributions of P-values from the correlation analyses

We look at the histogram of P-values (BUM plots) obtained for each analysis. The distributions are markedly non-uniform, suggesting that more proteins are significantly associated with FABP4 and ADH1B than would be expected by chance.


hist(fabp4CorP, breaks = 50, xlab = "P-value", main = "FABP4")

plot of chunk bumFABP4


hist(adh1bCorP, breaks = 50, xlab = "P-value", main = "ADH1B")

plot of chunk bumADH1B

3.3 Permutation tests

We wish to confirm that we see more small P-values than expected by chance. Therefore, we perform a permutation test in which randomly shuffled FABP4 and ADH1B values are compared to the RPPA data using correlation analysis.


nPerms <- 100
proteinList <- rownames(rppaUse)

if (file.exists("rppaPerm.RData")) {
    load("rppaPerm.RData")
} else {

    starttimePerm <- date()
    set.seed(22)

    pMxPerm <- matrix(0, nrow = length(proteinList), ncol = nPerms)
    rownames(pMxPerm) <- proteinList
    rhoMxPerm <- pMxPerm
    pMxPerm2 <- pMxPerm
    rhoMxPerm2 <- pMxPerm

    nPt <- length(ptList)

    for (i1 in 1:nPerms) {

        fabp4Fake <- fabp4[sample(nPt)]
        adh1bFake <- adh1b[sample(nPt)]

        fakeP <- apply(rppaUse, 1, function(x) {
            cor.test(fabp4Fake, x, method = "spearman", exact = FALSE)$p.value
        })
        fakeRho <- apply(rppaUse, 1, function(x) {
            cor.test(fabp4Fake, x, method = "spearman", exact = FALSE)$estimate
        })

        fakeP2 <- apply(rppaUse, 1, function(x) {
            cor.test(adh1bFake, x, method = "spearman", exact = FALSE)$p.value
        })
        fakeRho2 <- apply(rppaUse, 1, function(x) {
            cor.test(adh1bFake, x, method = "spearman", exact = FALSE)$estimate
        })

        pMxPerm[, i1] <- fakeP
        rhoMxPerm[, i1] <- fakeRho

        pMxPerm2[, i1] <- fakeP2
        rhoMxPerm2[, i1] <- fakeRho2

    }

    stoptimePerm <- date()

    save(pMxPerm, rhoMxPerm, pMxPerm2, rhoMxPerm2, starttimePerm, stoptimePerm, 
        file = "rppaPerm.RData")
}

c(starttimePerm, stoptimePerm)
## [1] "Fri Mar 21 10:39:09 2014" "Fri Mar 21 10:39:47 2014"

3.4 Comparing results to those of permutation tests

For each of several significance levels, we plot the numbers of proteins significantly associated with FABP4 and compare them to the numbers expected by chance.

The findings indicate that 23 proteins are correlated with FABP4 at a significance level of P < 0.001, while at most one would be expected by chance.


min(fabp4CorP)
## [1] 8.282e-13

pList <- c(1e-11, 1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 0.001, 0.01)
numTrue <- unlist(lapply(pList, function(x) {
    sum(fabp4CorP < x)
}))

meanPerm <- c()
sdPerm <- c()
upPerm <- c()
downPerm <- c()

for (i in 1:length(pList)) {

    numPerm <- apply(pMxPerm, 2, function(x) {
        sum(x < pList[i])
    })
    meanPerm <- c(meanPerm, mean(numPerm))
    sdPerm <- c(sdPerm, sd(numPerm))

    upPerm <- c(upPerm, quantile(numPerm, c(0.95)))
    downPerm <- c(downPerm, quantile(numPerm, c(0.05)))

}

numTrue
##  [1]  3  3  3  3  4  5 12 14 23 39
meanPerm
##  [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.09 1.28
meanPerm + 2 * sdPerm
##  [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2100 0.7317 3.7431
upPerm
## 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 
##   0   0   0   0   0   0   0   0   1   4

plot(pList, numTrue, type = "b", xlab = "P-value", ylab = "Number of significant probesets", 
    pch = 16, ylim = c(0, 40), main = "FABP4", log = "x")

lines(pList, meanPerm, type = "b", col = "red", pch = 16)
lines(pList, meanPerm + 2 * sdPerm, type = "b", col = "blue", lty = 2, pch = 16)

lines(pList, upPerm, type = "b", col = "green", lty = 2, pch = 16)
lines(pList, downPerm, type = "b", col = "green", lty = 2, pch = 16)

legend(1e-11, 40, legend = c("Observed", "Mean from permutation tests", "Mean + 2 SD from permutation tests", 
    "Upper & lower 95% from permutation tests"), col = c("black", "red", "blue", 
    "green"), lty = c(1, 1, 2, 2), cex = 0.8)

plot of chunk plotComparePPermFABP4

We do the same thing for ADH1B and find that there are 30 proteins correlated with ADH1B at a signficance level of P < 0.001, while at most one would be expected by chance.


min(adh1bCorP)
## [1] 1.87e-13

numTrue2 <- unlist(lapply(pList, function(x) {
    sum(adh1bCorP < x)
}))

meanPerm2 <- c()
sdPerm2 <- c()
upPerm2 <- c()
downPerm2 <- c()

for (i in 1:length(pList)) {

    numPerm2 <- apply(pMxPerm2, 2, function(x) {
        sum(x < pList[i])
    })
    meanPerm2 <- c(meanPerm2, mean(numPerm2))
    sdPerm2 <- c(sdPerm2, sd(numPerm2))

    upPerm2 <- c(upPerm2, quantile(numPerm2, c(0.95)))
    downPerm2 <- c(downPerm2, quantile(numPerm2, c(0.05)))

}

numTrue2
##  [1]  2  2  3  4  5  7 13 20 30 53
meanPerm2
##  [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.08 1.37
meanPerm2 + 2 * sdPerm2
##  [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2100 0.6253 4.2361
upPerm2
## 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 
##   0   0   0   0   0   0   0   0   1   4

plot(pList, numTrue2, type = "b", xlab = "P-value", ylab = "Number of significant probesets", 
    pch = 16, ylim = c(0, 40), main = "ADH1B", log = "x")

lines(pList, meanPerm2, type = "b", col = "red", pch = 16)
lines(pList, meanPerm2 + 2 * sdPerm2, type = "b", col = "blue", lty = 2, pch = 16)

lines(pList, upPerm2, type = "b", col = "green", lty = 2, pch = 16)
lines(pList, downPerm2, type = "b", col = "green", lty = 2, pch = 16)

legend(1e-11, 40, legend = c("Observed", "Mean from permutation tests", "Mean + 2 SD from permutation tests", 
    "Upper & lower 95% from permutation tests"), col = c("black", "red", "blue", 
    "green"), lty = c(1, 1, 2, 2), cex = 0.8)

plot of chunk plotComparePPermADH1B

We identify the 23 proteins correlated with FABP4 at a signficance level of P < 0.001 as well as their correlation coefficients, first for the proteins having a positive correlation with FABP4 (N=14) and next for the ones having a negative corelation (N=9).


fabp4Proteins <- names(fabp4CorP)[fabp4CorP < 0.001]
fabp4ProteinsPos <- names(fabp4CorP)[fabp4CorP < 0.001 & fabp4CorRho > 0]
fabp4ProteinsNeg <- names(fabp4CorP)[fabp4CorP < 0.001 & fabp4CorRho < 0]
length(fabp4ProteinsPos)
## [1] 14
length(fabp4ProteinsNeg)
## [1] 9

fabp4CorRho[fabp4ProteinsPos]
##        Caveolin-1-R-V             c-Kit-R-V       Collagen_VI-R-V 
##                0.3682                0.1751                0.1974 
##              Dvl3-R-V       Fibronectin-R-C FOXO3a_pS318_S321-R-C 
##                0.2312                0.3643                0.2431 
##              JNK2-R-C               p21-R-C          Paxillin-R-V 
##                0.1943                0.1994                0.2395 
##            Pea-15-R-V         PKC-alpha-M-V              PTCH-R-C 
##                0.1824                0.1780                0.2377 
##  Transglutaminase-M-V              VASP-R-C 
##                0.1993                0.2507
fabp4CorRho[fabp4ProteinsNeg]
##     Chk1_pS345-R-C      Chk2_pT68-R-C      Claudin-7-R-V 
##            -0.2948            -0.3528            -0.2019 
##   c-Met_pY1235-R-C    C-Raf_pS338-R-C     E-Cadherin-R-V 
##            -0.2775            -0.2266            -0.2464 
## ER-alpha_pS118-R-V      p27_pT157-R-C      Src_pY416-R-C 
##            -0.2409            -0.2399            -0.1795

We do the same for ADH1B. There are 13 proteins positively associated with ADH1B at a significance level P < 0.001, and 17 that are negatively associated.


adh1bProteins <- names(adh1bCorP)[adh1bCorP < 0.001]
adh1bProteinsPos <- names(adh1bCorP)[adh1bCorP < 0.001 & adh1bCorRho > 0]
adh1bProteinsNeg <- names(adh1bCorP)[adh1bCorP < 0.001 & adh1bCorRho < 0]
length(adh1bProteinsPos)
## [1] 13
length(adh1bProteinsNeg)
## [1] 17

adh1bCorRho[adh1bProteinsPos]
##         Annexin_I-R-V        Caveolin-1-R-V              Dvl3-R-V 
##                0.1978                0.3744                0.1812 
##       Fibronectin-R-C FOXO3a_pS318_S321-R-C          Paxillin-R-V 
##                0.3778                0.2886                0.2040 
##            Pea-15-R-V              PTCH-R-C              PTEN-R-V 
##                0.1951                0.2600                0.2100 
##     S6_pS235_S236-R-V     S6_pS240_S244-R-V  Transglutaminase-M-V 
##                0.2646                0.2208                0.2094 
##              VASP-R-C 
##                0.2470
adh1bCorRho[adh1bProteinsNeg]
##          53BP1-R-C  alpha-Catenin-M-V   beta-Catenin-R-V 
##            -0.1976            -0.2194            -0.2149 
##           CD31-M-V     Chk1_pS345-R-C      Chk2_pT68-R-C 
##            -0.1934            -0.2206            -0.3268 
##      Claudin-7-R-V   c-Met_pY1235-R-C    C-Raf_pS338-R-C 
##            -0.2077            -0.2454            -0.1907 
##     E-Cadherin-R-V ER-alpha_pS118-R-V           Ku80-R-C 
##            -0.3144            -0.2485            -0.2031 
##           MSH2-M-C           MSH6-R-C           mTOR-R-V 
##            -0.2479            -0.2365            -0.1792 
##      p27_pT157-R-C   PARP_cleaved-M-C 
##            -0.2436            -0.1766

We verify that proteins correlated with both FABP4 and ADH1B at a significance level of P < 0.001 have the same sign of the correlation coefficient in each case.


bothProteins <- intersect(fabp4Proteins, adh1bProteins)
table(sign(fabp4CorRho[bothProteins]) == sign(adh1bCorRho[bothProteins]))
## 
## TRUE 
##   17

The proteins that are positively correlated with both FABP4 and ADH1B expression are as follows.


unique(intersect(fabp4ProteinsPos, adh1bProteinsPos))
## [1] "Caveolin-1-R-V"        "Dvl3-R-V"              "Fibronectin-R-C"      
## [4] "FOXO3a_pS318_S321-R-C" "Paxillin-R-V"          "Pea-15-R-V"           
## [7] "PTCH-R-C"              "Transglutaminase-M-V"  "VASP-R-C"

The proteins that are negatively correlated with both FABP4 and ADH1B are as follows.


unique(intersect(fabp4ProteinsNeg, adh1bProteinsNeg))
## [1] "Chk1_pS345-R-C"     "Chk2_pT68-R-C"      "Claudin-7-R-V"     
## [4] "c-Met_pY1235-R-C"   "C-Raf_pS338-R-C"    "E-Cadherin-R-V"    
## [7] "ER-alpha_pS118-R-V" "p27_pT157-R-C"

3.5 Heatmap

The following heatmap shows the levels of the 36 proteins associated with FABP4 and/or ADH1B at P < 0.001. The color bar at the top represents patients with and without RD in red and blue, respectively. The color bar at the side shows proteins positively and negatively correlated with FABP4 and/or ADH1B in red and blue, respectively.


library(gplots)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
## 
## Attaching package: 'gplots'
## 
## The following object is masked from 'package:stats':
## 
##     lowess

table(rdUse)
## rdUse
## No RD    RD 
##    81   273
colVec <- rep("blue", length(ptList))
colVec[rdUse == "RD"] <- "red"

allProteins <- unique(c(fabp4Proteins, adh1bProteins))
length(allProteins)
## [1] 36

rowVec <- rep("blue", length(allProteins))
rowVec[sign(fabp4CorRho[allProteins]) > 0] <- "red"

heatmap.2(rppaUse[allProteins, ], scale = "row", trace = "none", labRow = allProteins, 
    labCol = "", col = bluered, ColSideColors = colVec, RowSideColors = rowVec, 
    xlab = "", cexRow = 1)

plot of chunk heatmapProteins

3.6 Scatterplots

Scatterplots are produced showing the relationship between FABP4 and ADH1B expression levels with each of the 36 proteins whose levels are significantly correlated with expression levels of one or both genes at P < 0.001. We begin with the 18 positively associated proteins, then show the plots for the 12 negatively associated proteins. Spearman correlation coefficients (Rho) are shown above each plot.


proteinsPos <- unique(c(fabp4ProteinsPos, adh1bProteinsPos))
proteinsNeg <- unique(c(fabp4ProteinsNeg, adh1bProteinsNeg))
proteinList <- c(proteinsPos, proteinsNeg)

par(mfrow = c(2, 2))

for (i1 in 1:length(proteinList)) {

    rho1 = round(1000 * fabp4CorRho[proteinList[i1]])/1000
    rho2 = round(1000 * adh1bCorRho[proteinList[i1]])/1000

    plot(fabp4, rppaUse[proteinList[i1], ], xlab = "Log2(FABP4 Expression)", 
        ylab = paste(proteinList[i1], " Levels by RPPA", sep = ""), main = paste("Rho = ", 
            rho1, sep = ""))

    plot(adh1b, rppaUse[proteinList[i1], ], xlab = "Log2(ADH1B Expression)", 
        ylab = paste(proteinList[i1], " Levels by RPPA", sep = ""), main = paste("Rho = ", 
            rho2, sep = ""))

}

plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots plot of chunk scatterPlots

3.7 Output

Data from proteins of interest is exported for pathway analysis.

We prepare additional lists for export in addition to proteins meeting the P < 0.001 correlation criterion.


fabp4Proteins01 <- names(fabp4CorP)[fabp4CorP < 0.01]
adh1bProteins01 <- names(adh1bCorP)[adh1bCorP < 0.01]

rppaFABP4001 <- rppaUse[fabp4Proteins, ]
rppaADH1B001 <- rppaUse[adh1bProteins, ]

rppaFABP401 <- rppaUse[fabp4Proteins01, ]
rppaADH1B01 <- rppaUse[adh1bProteins01, ]

corFABP401 <- fabp4CorRho[fabp4Proteins01]
corADH1B01 <- adh1bCorRho[adh1bProteins01]

write.table(fabp4, file = "FABP4.csv", sep = ",")
write.table(adh1b, file = "ADH1B.csv", sep = ",")

write.table(rppaFABP4001, file = "rppaFABP4001.csv", sep = ",")
write.table(rppaADH1B001, file = "rppaADH1B001.csv", sep = ",")

write.table(rppaFABP401, file = "rppaFABP401.csv", sep = ",")
write.table(rppaADH1B01, file = "rppaADH1B01.csv", sep = ",")

write.table(corFABP401, file = "corFABP401.csv", sep = ",")
write.table(corADH1B01, file = "corADH1B01.csv", sep = ",")

4 Appendix

4.1 File Location

getwd()
## [1] "/Users/slt/SLT WORKSPACE/EXEMPT/OVARIAN/Ovarian residual disease study 2012/RD manuscript/Web page for paper/Webpage"

4.2 SessionInfo

sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] gplots_2.12.1 knitr_1.5    
## 
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6       caTools_1.16       evaluate_0.5.1    
## [4] formatR_0.10       gdata_2.13.2       gtools_3.2.1      
## [7] KernSmooth_2.23-10 stringr_0.6.2      tools_3.0.2

5 References

[1] Hennessy BT, Lu Y, Gonzalez-Angulo AM, Carey MS, Myhre S, Ju Z, Davies MA, Liu W, Coombes K, Meric-Bernstam F, Bedrosian I, McGahren M, Agarwal R, Zhang F, Overgaard J, Alsner J, Neve RM, Kuo W-L, Gray JW, Borresen-Dale A-L, Mills GB. A Technical Assessment of the Utility of Reverse Phase Protein Arrays for the Study of the Functional Proteome in Non-microdissected Human Breast Cancers. Clin Proteome, 6:129-51, 2010.