Table of contents

  1. Table of contents
  2. Methods
    1. Model
    2. The DeMixT algorithm for deconvolution
      1. Parameter estimation using iterated conditional modes (ICM)
      2. The GSCM approach to improve model identifiability
      3. Simulation study for the GSCM approach
        1. Library loading
        2. Data simulation
        3. Running DeMixT with/without GSCM
        4. Supplementary Figure 1
  3. Data analysis
    1. Installing DeMixT
    2. Summary statistics for performance evaluation
      1. Concordance correlation coefficient (CCC)
      2. Measure of reproducibility
    3. Mixed tissue microarray dataset
      1. Library loading
      2. Data generation and processing
      3. Supplementary Table 7
      4. Running DeMixT
      5. Results summary
        1. Supplementary Tables 3, 4 and Table 1
        2. Supplementary Figure 3
        3. Figure 2a
        4. Supplementary Figure 4
    4. Mixed cell line RNA-seq dataset
      1. Library loading
      2. Data generation and processing
      3. Supplementary Table 7
      4. Running DeMixT
      5. Results summary
        1. Supplementary Tables 5, 6 and Table 1
        2. Supplementary Figure 5
        3. Figure 2b
        4. Supplementary Figure 6
    5. Laser-capture microdissection (LCM) prostate cancer FFPE microarray dataset
      1. Library loading
      2. Data generation and processing
      3. Supplementary Table 7
      4. Running DeMixT
      5. Results summary
        1. Figure 3a
        2. Figure 3b and 3c
        3. Supplementary Figure 7
        4. Figure 3d
        5. Supplementary Figure 8
        6. Supplementary Figure 9
    6. TCGA HNSCC data
      1. Load data
      2. Phase I in our pipeline
      3. Phase II in our pipeline
      4. Phase III in our pipeline
      5. Results summary
        1. Figure 4a
        2. Figure 4b
        3. Figure 4c
        4. Figure 4d
        5. Supplementary Figure 11

Methods

Model

Let \(Y_{ig}\) be the observed expression levels of the raw measured data from clinically derived malignant tumor samples for gene \(g, g = 1, \cdots, G\) and sample \(i, i = 1, \cdots, S\). \(G\) denotes the total number of probes/genes and \(S\) denotes the number of samples. The observed expression levels for solid tumors can be modeled as a linear combination of raw expression levels from three components:
\[\begin{equation} {Y_{ig}} = \pi _{1,i}N_{1,ig} + \pi _{2,i}N_{2,ig} + (1 - \pi_{1,i} - \pi _{2,i}){T_{ig}} \label{eq:1} \end{equation}\]

Here \(N_{1,ig}\), \(N_{2,ig}\) and \({T_{ig}}\) are the unobserved raw expression levels from each of the three components. We call the two components for which we require reference samples the \(N_1\)-component and the \(N_2\)-component. We call the unknown component the T-component. We let \(\pi_{1,i}\) denote the proportion of the \(N_1\)-component, \(\pi_{2,i}\) denote the proportion of the \(N_2\)-component, and \(1 - \pi_{1,i} - \pi_{2,i}\) denote the proportion of the T-component. We assume that the mixing proportions of one specific sample remain the same across all genes.
Our model allows for one component to be unknown, and therefore does not require reference profiles from all components. A set of samples for \(N_{1,ig}\) and \(N_{2,ig}\), respectively, needs to be provided as input data. This three-component deconvolution model is applicable to the linear combination of any three components in any type of material. It can also be simplified to a two-component model, assuming there is just one \(N\)-component. For application in this paper, we consider tumor (\(T\)), stromal (\(N_1\)) and immune components (\(N_2\)) in an admixed sample (\(Y\)).
Following the convention that \(\log_2\)-transformed microarray gene expression data follow a normal distribution, we assume that the raw measures \(N_{1,ig} \sim LN({\mu _{{N_1}g}},\sigma _{{N_1}g}^2)\), \(N_{2,ig} \sim LN({\mu _{{N_2}g}},\sigma _{{N_2}g}^2)\) and \({T_{ig}} \sim LN({\mu _{Tg}}, \sigma _{Tg}^2)\), where LN denotes a \(\log_2\)-normal distribution and \(\sigma _{{N_1}g}^2\),\(\sigma _{{N_2}g}^2\),\(\sigma _{Tg}^2\) reflect the variations under \(\log_2\)-transformed data . Consequently, our model can be expressed as the convolution of the density function for three \(\log_2\)-normal distributions. Because there is no closed form of this convolution, we use numerical integration to evaluate the complete likelihood function (see the full likelihood in the Supplementary Materials).

The DeMixT algorithm for deconvolution

DeMixT estimates all distribution parameters and cellular proportions and reconstitutes the expression profiles for all three components for each gene and each sample, as shown in equation (1). The estimation procedure (summarized in Figure 1b) has two main steps as follows.
1. Obtain a set of parameters \(\{\pi_{1,i}, \pi_{2,i}\}_{i=1}^S\), \(\{\mu_T, \sigma_T\}_{g=1}^G\) to maximize the complete likelihood function, for which \(\{\mu_{N_{1,g}}, \sigma_{N_{1,g}}, \mu_{N_{2,g}}, \sigma_{N_{2,g}}\}_{g=1}^G\) were already estimated from the available unmatched samples of the \(N_1\) and \(N_2\) component tissues. This step is described in further details below in parameter estimation and the GSCM approach.
2. Reconstitute the expression profiles by searching each set of \(\{n_{1,ig},n_{2,ig}\}\) that maximizes the joint density of \(N_{1,ig}\), \(N_{2,ig}\) and \(T_{ig}\)
\[\begin{equation} \mathop {\arg \max }\limits_{{n_{1,ig}},{n_{2,ig}}} \phi (\frac{{{y_{ig}} - {{\hat \pi }_{1,i}}{n_{1,ig}} - {{\hat \pi }_{2,i}}{n_{2,ig}}}}{{1 - {{\hat \pi }_{1,i}} - {{\hat \pi }_{2,i}}}}\left| {{{\hat \mu }_{{T_g}}},{{\hat \sigma }_{{T_g}}}} \right.)\phi ({n_{1,ig}}\left| {{{\hat \mu }_{{N_{1g}}}},{{\hat \sigma }_{{N_{1g}}}}} \right.)\phi ({n_{2,ig}}\left| {{{\hat \mu }_{{N_{2g}}}},{{\hat \sigma }_{{N_{2g}}}}} \right.) \end{equation}\]

where \(\phi(.|\mu, \sigma^2)\) is a log2-normal distribution density with location parameter \(\mu\) and scale parameter \(\sigma\).
In step 2, we combined the golden section search method with successive parabolic interpolations to find the maximum of the joint density function with respect to \(n_{1,ig}\) and \(n_{2,ig}\) that are positively bounded and constrained by \(\hat \pi_{1,i} n_{1,ig} + \hat\pi_{2,i} n_{2,ig} \le y_{ig}\). The value of \(t_{ig}\) is solved as \({y_{ig}} - {{\hat \pi }_{1,i}}{n_{1,ig}} - {{\hat \pi }_{2,i}}{n_{2,ig}}\).

Parameter estimation using iterated conditional modes (ICM)

In step 1, the unknown parameters to be estimated can be divided into two groups: gene-wise parameters, \(\{\mu_T,\sigma_T\}_{g=1}^G\), and sample-wise parameters, \(\{\pi_1,\pi_2\}_{i=1}^S\). These two groups of parameters are conditionally independent (Figure 1b).
For each pair of gene-wise parameters, we have
\({\{ {\pi _1},{\pi _2}\} _i} {\perp\!\!\!\perp} {\{ {\pi _1},{\pi _2}\} _j} \mid {{{\{ {\mu _T},{\sigma _T}\} }_1}, \cdots ,{{\{ {\mu _T},{\sigma _T}\} }_G}}\), for all \(i \neq j \in \{1, \cdots, S\}\),
and similarly for each pair of sample-wise parameters, we have
\({\{ \mu _T,\sigma _T\} _i} {\perp\!\!\!\perp} {\{ \mu _T,\sigma _T\} _j}\mid {{{\{ \pi _1,\pi _2\}}_1}, \cdots ,{{\{ \pi _1,\pi _2\} }_S} }\), for all \(i \neq j \in \{1, \cdots, G\}\).
These relationships allow us to implement an optimization method, ICM, to iteratively derive the conditional modes of each pair of gene-wise or sample-wise parameters, conditional on the others . Here, \({\pi _1},{\pi _2}\) are constrained between \(0\) and \(1\), and \(\mu_T, \sigma_T\) are positively bounded. We combined a golden section search and successive parabolic interpolations to find a good local maximum in each step. As shown by Besag , for ICM, the complete likelihood never decreases at any iteration and the convergence to the local maximum is guaranteed. Our ICM implementation is described in Figure 5.

The GSCM approach to improve model identifiability

Due to the high dimension of the parameter search space, and often flat likelihood surfaces in certain regions of the true parameters (e.g., \(\mu_1\)=\(\mu_2\)) that will be encountered by ICM (Supplementary Figure 2), we have developed a GSCM approach (illustrated in Figure 1b) to focus on the hilly part of the likelihood space. This reduces the parameter search space and improves the accuracy and computational efficiency. Here, we describe our general strategy. As there are large variations in the number of genes that are differentially expressed across datasets, the actual cutoffs may be adjusted for a given dataset.
Stage 1
We first combine the \(N_1\) and \(N_2\) components and assume a two-component mixture instead of three. This allows us to quickly estimate \(\pi_T\).
a: We select a gene set containing genes with small standard deviations (\(<0.1\) or \(0.5\)) for both the \(N_1\) and \(N_2\) components. Among these genes, we further select genes with \(\overline {LN}_{1g} \approx \overline {LN}_{2g}\) (mean difference \(< 0.25\) or \(0.5\)), where the \(\overline {LN}\) is the sample mean for the log2-transformed data. Within this set, we further select genes with the largest sample standard deviations of \(Y_g\) (top 250), suggesting differential expression between \(T\) and \(N\).
b: We run DeMixT in the two-component setting to estimate \(\mu_{Tg}\), \(\sigma_{Tg}^2\) and \(\pi_T\).
Stage 2
We then fix the values of \(\{\pi_T\}_i\) as derived from Stage 1, and further estimate \(\{\pi_1\}_i\) and \(\{\pi_2\}_i\) in the three-component setting.
a: We select genes with the greatest difference in the mean expression levels between the \(N_1\) and \(N_2\) components as well as those with the largest sample standard deviations of \(Y_g\) (top 250).
b: We run DeMixT in the three-component setting over the selected genes to estimate \(\pi_1\) and \(\pi_2\) given \(\pi_T\).
c: We estimate the gene-wise parameters for all genes given the fixed \(\pi\)’s. Finally, given all parameters, per gene per sample expression level, \(n_{1,ig}\), \(n_{2,ig}\) and \(t_{ig}\) are reconstituted.

Simulation study for the GSCM approach

To demonstrate the utility of GSCM for parameter estimation, we simulated a dataset with expression levels from 500 genes and 90 samples, 20 of pure \(N_1\)-type, 20 of pure \(N_2\)-type and 50 mixed samples. For the 50 mixed samples, we generated their proportions for all three components \((\pi_1, \pi_2, \pi_T) \sim Dir(1,1,1)\), where \(Dir\) is a Dirichlet distribution. For each mixed sample, we simulated expression levels of \(500\) genes for the \(N_1\) and T-component from a \(\log_2\)-normal distribution with \(\mu_{N_{1g}}\) and \(\mu_{Tg}\) from \(N_{[0, +\infty]}(7, 1.5^2)\), and with equal variance. For the \(N_2\)-component, we generated \(\mu_{N_{2g}}\) from \(\mu_{N_{2g}} + d_g\), where \(d_g \sim N_{[-0.1,0.1]}(0,1.5^2)\) for \(475\) genes (\(\hat \mu_{N1g} \approx \hat \mu_{N2g}\) ) and \(d_g \sim N_{[0.1,3]}(0,1.5^2) \cup N_{[-3,-0.1]}(0,1.5^2)\) for \(25\) genes (\(\hat \mu_{N1g} \not\approx \hat \mu_{N2g}\)). Then we mixed the \(N_1\), \(N_2\) and T-component expression levels linearly at the generated proportions according to our convolution model. We created a full matrix consisting of \(20\) \(N_1\)-type reference samples (generated separately from the \(N_1\) distribution), \(20\) \(N_2\)-type reference samples (generated separately from the \(N_2\) distribution) and \(50\) mixed samples at each simulation and repeated the simulation \(100\) times for each of the three variance values \(\sigma \in \{0.1,0.3,0.5\}\) to finally obtain \(300\) simulation repeats. We first ran DeMixT with GSCM, where we used 475 genes with simulated \(\hat \mu_{N1g} \approx \hat \mu_{N2g}\) to run the two-component deconvolution (\(N\) versus \(T\)) and used the remaining \(25\) genes to run the three-component deconvolution with estimated \(\hat{\pi}_T\). We also ran DeMixT without GSCM using all 500 genes. RMSEs between estimated and true proportions for all mixed samples were calculated for each of the two runs: with and without GSCM.

Library loading

library(MCMCpack)
library(truncnorm)
library(hydroGOF)
library(ggplot2)

Data simulation

for (i in 1:3){
for(j in 1:100){
set.seed(i)
ngenes1=475; ngenes2=25; ngenes = ngenes1 + ngenes2
nT=50 # three components
nN1=20; nN2=20
nData<-matrix(0, ngenes, nN1+nN2+nT);
TData<-matrix(0, ngenes, nT);  ## True T
TPi <- t(rdirichlet(nT, alpha = c(1,1,1)))
row.names(TPi) <- 1:3
Nmu1=7; Nsigma1=1.5# stroma1
Tmu=7; Tsigma=1.5 # tumor
SW<-c(0.1,0.3,0.5)[i]
SWN1<-SW; SWN2<-SW; SWT<-SW; 
mun1<-rtruncnorm(ngenes, a=0,mean = Nmu1, sd=Nsigma1)
mun1<-rtruncnorm(ngenes, a=0,mean = Nmu1, sd=Nsigma1)
mun2_1st <- mun1[1:ngenes1] + rtruncnorm(1, a = -0.1, b = 0.1, mean = 0, sd = 1.5)
mun2_2nd <- c()
for(l in (ngenes1+1):(ngenes1+ngenes2)){
     tmp <- mun1[l] + rtruncnorm(1, a = 0.1, b = 3, mean = 0, sd = 1.5)*(-1)^rbinom(1,size=1,prob=0.5)
     while(tmp <= 0) tmp <- mun1[l] + rtruncnorm(1, a = 0.1, b = 3, mean = 0, sd = 1.5)*(-1)^rbinom(1,size=1,prob=0.5)
     mun2_2nd <- c(mun2_2nd, tmp)
}
mun2 <- c(mun2_1st, mun2_2nd)
mut<-rtruncnorm(ngenes, a=0,mean = Tmu, sd=Tsigma)

mu <- cbind(mun1, mun2, mut)

for(k in 1:ngenes)
{
   nData[k,1:nN1] = 2^(rtruncnorm(nN1, a=0,mean=mun1[k], sd=SWN1)); #stroma 1
   nData[k,(nN1+1):(nN1+nN2)] = 2^(rtruncnorm(nN2, a=0,mean=mun2[k], sd=SWN2)); #stroma 2
   TData[k,] =  2^rtruncnorm(nT, a=0,mean=mut[k], sd=SWT);
   nData[k,(nN1+nN2+1):(nN1+nN2+nT)] = TPi[1,]*2^(rtruncnorm(nT, a=0,mean=mun1[k], sd=SWN1)) + TPi[2,]*2^(rtruncnorm(nT, a=0,mean=mun2[k], sd=SWN2)) + TPi[3,]*TData[k,]
}
ngroup = c(rep(1, nN1), rep(2, nN2), rep(3, nT))
ngroup1 = c(rep(1, nN1+nN2), rep(3,nT))
save.image(file = paste0('sim',i,j,'.rda'))
}
}

Running DeMixT with/without GSCM

##with GSCM
for(ii in 1:3){
  for(jj in 1:100){
load(paste0('sim',ii,'.rda'))
nData1 <- nData[1:ngenes1, ]
nData2<- nData[(ngenes1+1):(ngenes1+ngenes2), ]
xres1 <- DeMixT.Kernel(inputdata=nData1, groupid=ngroup1, nhavepi=0, givenpi=rep(0,nT), givenpiT=rep(0,nT))
PiT = 1 - colSums(xres1$pi)
xres2 <- DeMixT.Kernel(inputdata=nData2, groupid=ngroup, nhavepi=2, givenpi=rep(0,nT*2), givenpiT=PiT, niter=30)
save(xres1, xres2, file=paste0('x',ii,jj,'.rda'))
##without GSCM
yres <- DeMixT.S1(nData, ngroup, niter = 30, if.filter= FALSE)
save(yres, file=paste0('y',ii,jj,'.rda'))
  }
}

Supplementary Figure 1

for(iii in 1:3){
rx1 = c(); ry1 = c(); 
rx2 = c(); ry2 = c();  
rx3 = c(); ry3 = c();
for(jjj in 1:100){
  load(paste0('sim',iii,jjj,'.rda'))
  load(paste0('x',iii,jjj,'.rda'))
  load(paste0('y',iii,jjj,'.rda'))
  rx1 = c(rx1, rmse(xres2$pi[1,], TPi[1,]))
  ry1 = c(ry1, rmse(yres$pi[1,], TPi[1,]))
  rx2 = c(rx2, rmse(xres2$pi[2,], TPi[2,]))
  ry2 = c(ry2, rmse(yres$pi[2,], TPi[2,]))
  rx3 = c(rx3, rmse(1 - colSums(xres2$pi), TPi[3,]))
  ry3 = c(ry3, rmse(1 - colSums(yres$pi), TPi[3,]))
}
save.image(file = "rmse",iii,".rda")
}
sig = c("0.1","0.3","0.5")
for(iv in 1:3){
plot.new()
load(paste0("rmse",iv,".rda"))
rx = c(rx1, rx2, rx3, ry1, ry2, ry3)
lab = c(rep(1,100),rep(2,100),rep(3,100),rep(1,100),rep(2,100),rep(3,100))
clr = c(rep("red",300), rep("blue",300))
dat = cbind(rx,lab,clr)
dat = as.data.frame(dat)
dat$rx = as.numeric(as.character(dat$rx))
dat$lab = as.factor(dat$lab)
dat$clr = as.factor(dat$clr)
g1<-ggplot(dat, aes(lab, rx, fill=clr)) +
  geom_dotplot(binaxis='y', stackdir='center',dotsize = 1,
               position=position_dodge(0.25)) + xlab("") +
  geom_segment(x = 0.7, xend = 1.05, y = median(rxby1), yend = median(rxby1), linetype="dashed", color = "red") + 
  geom_segment(x = 0.95, xend = 1.3, y = median(rxbx1), yend = median(rxbx1), linetype="dashed", color = "red") + 
  geom_segment(x = 1.7, xend = 2.05, y = median(rxby2), yend = median(rxby2), linetype="dashed", color = "red") + 
  geom_segment(x = 1.95, xend = 2.3, y = median(rxbx2), yend = median(rxbx2), linetype="dashed", color = "red") + 
  geom_segment(x = 2.7, xend = 3.05, y = median(rxby3), yend = median(rxby3), linetype="dashed", color = "red") + 
  geom_segment(x = 2.95, xend = 3.3, y = median(rxbx3), yend = median(rxbx3), linetype="dashed", color = "red") +
  ylab("RMSE") +
guides(fill=FALSE)+
scale_fill_manual(values=c("#17B6F6","#F617DB"))+
annotate(geom = "text", x = 1:3,label = c(paste("pi[1]"),paste("pi[2]"),paste("pi[T]")), y = -0.015, size = 12,parse=T) +
annotate(geom = "text", x = c(0.8,1.2,1.8,2.2,2.8,3.2), label = c("w/o CM","w/ CM","w/o CM","w/ CM","w/o CM","w/ CM"), y = -0.035, size = 10) +
coord_cartesian(ylim = c(0.0, 0.5), expand = FALSE) +
ggtitle(bquote(sigma[N[1]] ~" = " ~sigma[N[2]] ~" = " ~sigma[T] ~ " = " ~ .(sig[iv])))+
  theme(
      plot.margin = unit(c(1, 1, 4, 2), "lines"),
      panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
      panel.background = element_blank(), axis.line = element_line(colour = "black"),
      axis.title.y = element_text(face='bold',size=27,vjust=1),
      axis.text.y = element_text(face='bold',size=22,color='black'),
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      plot.title = element_text(hjust = 0.5, size =32),
      )
g2 <- ggplot_gtable(ggplot_build(g1))
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid::grid.draw(g2)
}

Data analysis

All analyses were performed using the open-source environment R (http://cran.r-project.org). Documentation (knitr-html) of all scripts is provided at the GitHub repository.

Installing DeMixT

library(devtools)
devtools::install_github("wwylab/DeMixT/DeMixT_0.1")

Summary statistics for performance evaluation

Concordance correlation coefficient (CCC)

To evaluate the performance of our method, we use the CCC and RMSE. The CCC \(\rho_{xy}\) is a measure of agreement between two variables, \(x\) and \(y\), and is defined as \(\rho_{xy} = \frac{2\rho \sigma_x \sigma_y}{\sigma^2_x + \sigma^2_y + (\mu_x - \mu_y)^2}\), where \(\mu\) and \(\sigma^2\) are the corresponding mean and variance for each variable, and \(\rho\) is the correlation coefficient between the two variables. We calculate the CCC to compare the estimated and true proportions to evaluate the proportion estimation. We also calculate the CCC to compare the deconvolved and observed expression values (\(\log_2\)-transformed).

Measure of reproducibility

To assess the reproducibility of the estimated \(\pi\) across scenarios when the different components are unknown (i.e., three scenarios for a three-component model with one unknown component), we define a statistic \(R = \frac{1}{S} \sum_{i}^{S} (\frac{1}{K-1} \sum_{k}^{K} (\epsilon_i^k - \frac{1}{K} \sum_{k}^{K} \epsilon_i^k)^2)^{\frac{1}{2}}\), where \(\epsilon_i^k = \hat \pi_i^k - \pi_i\), \(\hat \pi_i^k\) is the estimated value for the \(k\)-th scenario and \(\pi_i\) is the truth for sample \(i\). \(S\) denotes the sample size and \(K\) is the number of scenarios. This measures the variations in the estimation errors across different scenarios. We consider a method with a smaller \(R\) as more reproducible and therefore more desirable.

Mixed tissue microarray dataset

We downloaded dataset GSE19830 from the GEO browser. We used the R package to summarize the raw probe intensities with quantile normalization but without background correction as recommended in previous studies . We evaluated the performance of DeMixT with regard to tissue proportions and deconvolved expression levels on the set of genes that were selected based on the GSCM approach. Specifically, we selected genes with sample standard deviation \(<0.1\) in \(N_1\) and \(N_2\) components, among which we used those with \(\overline {LN}_{1g} - \overline {LN}_{2g}<0.25\) for running the 2-compoment model, and used the top 250 genes with largest \(\overline {LN}_{1g} - \overline {LN}_{2g}\) and largest sample standard deviation in \(Y\) for running the 3-component model. Then we ran ISOpure for the purpose of comparison.

Library loading

library(limma)
library(preprocessCore)
library(affy)
library(DeMixT)
library("hydroGOF")
library('epiR')
library('memisc')
options(digits=3)

Data generation and processing

## read in microarrary data and do a RMA normalization
data <- ReadAffy()
eset <- expresso(data, normalize.method = "quantiles", bgcorrect.method = "none", pmcorrect.method ="pmonly",summary.method="medianpolish")
write.exprs(eset,file="GSE19830_original.txt")

Supplementary Table 7

rm(list=ls(all=TRUE))
gse = read.table("GSE19830_original.txt", header = F)
head = read.table("GSE19830_title.txt", sep = "\t", header = F)
head = as.character(t(as.matrix(head)))
head = strsplit(head, " / ")
note = matrix(unlist(head), nrow = 42, byrow = T)
mat = apply(note, 2, strsplit, split = " % ")
m1 = matrix(unlist(mat[[1]]), nrow = 42, byrow = T)[,1]
m2 = matrix(unlist(mat[[2]]), nrow = 42, byrow = T)[,1]
m3 = matrix(unlist(mat[[3]]), nrow = 42, byrow = T)[,1]
purity = cbind(m1,m2,m3)
colnames(purity) = c("Liver", "Brain", "Lung")
purity = as.data.frame(purity)
# Pure tissue 
liver = which(purity$Liver == "100"); brain = which(purity$Brain == "100"); lung = which(purity$Lung == "100")
save.image(file = 'gse19830.rda')
gsepur = gse[,c(liver,brain,lung)]
gseli = gse[,liver]; gsebr = gse[,brain]; gselu = gse[,lung]
barli = apply(gseli, 1, mean); barbr = apply(gsebr, 1, mean); barlu = apply(gselu, 1, mean)
conli0 = (barli > 7); conbr0 = (barbr > 7); conlu0 = (barlu > 7)
condli = conlu0&conbr0; condbr = conli0&conlu0; condlu = conli0&conbr0
condli_new = condli; condbr_new = condbr; condlu_new = condlu
save(condli_new, file = 'condli_new.RData')
save(condbr_new, file = 'condbr_new.RData')
save(condlu_new, file = 'condlu_new.RData')
# Mixed tissue
pur = purity[-c(liver,brain,lung),]
gsemix = gse[,-c(liver,brain,lung)]
gsemix = gsemix[,1:30]
pur = pur[-c(31,32,33),]; pur = apply(pur, 2, as.numeric); pur = pur/100
#three cases for deconvolution
#1 lung and brain known, liver unknown
lubr_li <- cbind(gselu, gsebr, gsemix)
write.table(lubr_li, "lubr_lid.txt", sep='\t',row.names = FALSE, col.names=FALSE);
write.table(gsepur[,1:3], "purTrueli.txt", sep='\t',row.names = FALSE, col.names=FALSE);
#2 lung and liver known, brain unknown
lilu_br <- cbind(gseli, gselu, gsemix)
write.table(lilu_br, "lilu_brd.txt", sep='\t',row.names = FALSE, col.names=FALSE);
write.table(gsepur[,4:6], "purTruebr.txt", sep='\t',row.names = FALSE, col.names=FALSE);
#3 liver and brain known, lung unknown
libr_lu <- cbind(gseli, gsebr, gsemix)  
write.table(libr_lu, "libr_lud.txt", sep='\t',row.names = FALSE, col.names=FALSE);
write.table(gsepur[,7:9], "purTruelu.txt", sep='\t',row.names = FALSE, col.names=FALSE);
load('gse19830.rda')
li = apply(gseli,1,mean); br = apply(gsebr,1,mean);lu = apply(gselu,1,mean)
cond1 = (li/br > 1.1)|(li/br < 1/1.1) 
cond2 = (li/lu > 1.1)|(li/lu < 1/1.1) 
cond3 = (lu/br > 1.1)|(lu/br < 1/1.1)
#1,2,3 for li,br,lu
#1 mu1 = mu2 = mu3
invisible(sum((!cond1)&(!cond2)&(!cond3))/31099) ## [1] 0.611
#2 mu1 != mu2 = mu3
invisible(sum((cond1)&(cond2)&(!cond3))/31099) ## [1] 0.0782
#3 mu1 = mu2 != mu3
invisible(sum((!cond1)&(cond2)&(cond3))/31099) ## [1] 0.0627
#4 mu1 != mu2 != mu3
invisible(sum((cond1)&(cond2)&(cond3))/31099) ## [1] 0.0405
invisible(sum((cond1)&(!cond2)&(!cond3))/31099) ## [1] 0.0397
invisible(sum((!cond1)&(cond2)&(!cond3))/31099) ## [1] 0.0269
invisible(sum((!cond1)&(!cond2)&(cond3))/31099) ## [1] 0.0352

Running DeMixT

run_id <- c('lubr_lid', 'lilu_brd', 'libr_lud')
for(i in 1:3){
    name <- run_id[i]
    inputmat <-as.matrix(read.table(paste(name, ".txt", sep=""), header=FALSE ))
    inputn1m <- apply(inputmat[,1:3], 1, mean)
    inputn2m <- apply(inputmat[,4:6], 1, mean)
    #keep genes with high expression for deconvolution after linearity checking
    inputmat1 <- 2^inputmat[((inputn1m > 7)&(inputn2m > 7)), ]
    groupid <-c(rep(1, 3), rep(2, 3), rep(3, 30)); # input as 2-component
    testr1 <- DeMixT.S1(inputmat1, groupid, niter = 15, ninteg = 60, filter.option = 1, filter.criteria1 = c(0.1,0.1), filter.criteria2 = c(250,200), filter.criteria3 = 0.25, if.filter = TRUE, sg0 = 0.01, mu0 = 7.0,tol = 10^-6)
    write.table(testr1$pi, paste(name,"_out2.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
    givenpi <- c(t(testr1$pi))
    testr2 <- DeMixT.S2(inputmat, groupid, givenpi, ninteg = 60)
    write.table(testr2$decovExprT, paste(name,"_out_decov.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
    write.table(testr2$decovMu[,3], paste(name,"_mu.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
    write.table(testr2$decovSigma[,3], paste(name,"_sigma.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
}

Results summary

Supplementary Tables 3, 4 and Table 1

load("pur.Rdata")
pur =as.matrix(pur); pur = apply(pur, 2, as.numeric); pur = pur/100
Tbr <- pur[,2]; Tli <- pur[,1]; Tlu <- pur[,3]
## brain unknown
lilu_brp <- read.table("lilu_brd_out2.txt", header = F)
br_li <- as.numeric(lilu_brp[1,]); br_lu <- as.numeric(lilu_brp[2,])
br_br <- 1 - br_li - br_lu
corbr <- epi.ccc(Tbr,br_br, ci = "z-transform", conf.level = 0.95)$rho.c$est
corli <- epi.ccc(Tli,br_li, ci = "z-transform", conf.level = 0.95)$rho.c$est
corlu <- epi.ccc(Tlu,br_lu, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(Tbr,br_br, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.879 0.796  0.93
print(epi.ccc(Tlu,br_lu, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.951 0.914 0.972
print(epi.ccc(Tli,br_li, ci = "z-transform", conf.level = 0.95)$rho.c)
##    est lower upper
## 1 0.74 0.615 0.829
print(round(corbr,2)); print(round(corlu,2)); print(round(corli,2))
## [1] 0.88
## [1] 0.95
## [1] 0.74
iso_br <- as.numeric(as.matrix(read.table("lilubr_alphapurities_new.txt", header = F)))
iso_li <- apply(as.matrix(read.table("lilubr_theta_new.txt", header = F))[,1:3], 1, sum)
iso_lu <- apply(as.matrix(read.table("lilubr_theta_new.txt", header = F))[,4:6], 1, sum)
# load results from ISOpure
iso_corbr <- epi.ccc(Tbr,iso_br, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corlu <- epi.ccc(Tlu,iso_lu, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corli <- epi.ccc(Tli,iso_li, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(Tbr,iso_br, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.681 0.544 0.783
print(epi.ccc(Tlu,iso_lu, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.996 0.993 0.998
print(epi.ccc(Tli,iso_li, ci = "z-transform", conf.level = 0.95)$rho.c)
##   est lower upper
## 1 0.7 0.557 0.803
print(round(iso_corbr,2)); print(round(iso_corlu,2)); print(round(iso_corli,2))
## [1] 0.68
## [1] 1
## [1] 0.7
print(round(rmse(Tbr,br_br),2)); print(round(rmse(Tlu,br_lu),2)); print(round(rmse(Tli,br_li),2))
## [1] 0.08
## [1] 0.06
## [1] 0.13
print(round(rmse(Tbr,iso_br),2)); print(round(rmse(Tlu,iso_lu),2)); print(round(rmse(Tli,iso_li),2))
## [1] 0.18
## [1] 0.02
## [1] 0.17
#calculate error for each value
#DeMixT
b.br <- br_br - Tbr; b.lu <- br_lu - Tlu; b.li <- br_li - Tli
#ISOpure
ib.br <- iso_br - Tbr; ib.lu <- iso_lu - Tlu; ib.li <- iso_li - Tli
## lung unknown
libr_lup <- read.table("libr_lud_out2.txt", header = F)
lu_li <- as.numeric(libr_lup[1,]); lu_br <- as.numeric(libr_lup[2,])
lu_lu <- 1 - lu_li - lu_br
corbr <- epi.ccc(Tbr,lu_br, ci = "z-transform", conf.level = 0.95)$rho.c$est
corli <- epi.ccc(Tli,lu_li, ci = "z-transform", conf.level = 0.95)$rho.c$est
corlu <- epi.ccc(Tlu,lu_lu, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(Tbr,lu_br, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.838 0.715  0.91
print(epi.ccc(Tlu,lu_lu, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.971 0.949 0.983
print(epi.ccc(Tli,lu_li, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.753 0.628  0.84
print(round(corbr,2)); print(round(corlu,2)); print(round(corli,2))
## [1] 0.84
## [1] 0.97
## [1] 0.75
iso_lu <- as.numeric(as.matrix(read.table("librlu_alphapurities_new.txt", header = F)))
iso_li <- apply(as.matrix(read.table("librlu_theta_new.txt", header = F))[,1:3], 1, sum)
iso_br <- apply(as.matrix(read.table("librlu_theta_new.txt", header = F))[,4:6], 1, sum)
# load results from ISOpure
iso_corbr <- epi.ccc(Tbr,iso_br, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corlu <- epi.ccc(Tlu,iso_lu, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corli <- epi.ccc(Tli,iso_li, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(Tbr,iso_br, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.971 0.941 0.986
print(epi.ccc(Tlu,iso_lu, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.736 0.611 0.825
print(epi.ccc(Tli,iso_li, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.838 0.748 0.898
print(round(iso_corbr,2)); print(round(iso_corlu,2)); print(round(iso_corli,2))
## [1] 0.97
## [1] 0.74
## [1] 0.84
print(round(rmse(Tbr,lu_br),2)); print(round(rmse(Tlu,lu_lu),2)); print(round(rmse(Tli,lu_li),2))
## [1] 0.1
## [1] 0.05
## [1] 0.13
print(round(rmse(Tbr,iso_br),2)); print(round(rmse(Tlu,iso_lu),2)); print(round(rmse(Tli,iso_li),2))
## [1] 0.04
## [1] 0.14
## [1] 0.11
# calculate error for each value
# DeMixT
u.br <- lu_br - Tbr; u.lu <- lu_lu - Tlu; u.li <- lu_li - Tli
# ISOpure
iu.br <- iso_br - Tbr; iu.lu <- iso_lu - Tlu; iu.li <- iso_li - Tli
## liver unknown
lubr_lip <- read.table("lubr_lid_out2.txt", header = F)
li_lu <- as.numeric(lubr_lip[1,]); li_br <- as.numeric(lubr_lip[2,])
li_li <- 1 - li_lu - li_br
corbr <- epi.ccc(Tbr,li_br, ci = "z-transform", conf.level = 0.95)$rho.c$est
corli <- epi.ccc(Tli,li_li, ci = "z-transform", conf.level = 0.95)$rho.c$est
corlu <- epi.ccc(Tlu,li_lu, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(Tbr,li_br, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.775 0.654 0.857
print(epi.ccc(Tlu,li_lu, ci = "z-transform", conf.level = 0.95)$rho.c)
##    est lower upper
## 1 0.96 0.941 0.974
print(epi.ccc(Tli,li_li, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.744  0.62 0.831
print(round(corbr,2)); print(round(corlu,2)); print(round(corli,2))
## [1] 0.77
## [1] 0.96
## [1] 0.74
# load results from ISOpure
iso_li <- as.numeric(as.matrix(read.table("lubrli_alphapurities_new.txt", header = F)))
iso_lu <- apply(as.matrix(read.table("lubrli_theta_new.txt", header = F))[,1:3], 1, sum)
iso_br <- apply(as.matrix(read.table("lubrli_theta_new.txt", header = F))[,4:6], 1, sum)
iso_corbr <- epi.ccc(Tbr,iso_br, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corlu <- epi.ccc(Tlu,iso_lu, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corli <- epi.ccc(Tli,iso_li, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(Tbr,iso_br, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.933 0.884 0.962
print(epi.ccc(Tlu,iso_lu, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.981 0.961 0.991
print(epi.ccc(Tli,iso_li, ci = "z-transform", conf.level = 0.95)$rho.c)
##    est lower upper
## 1 0.98 0.964 0.989
print(round(iso_corbr,2)); print(round(iso_corlu,2)); print(round(iso_corli,2))
## [1] 0.93
## [1] 0.98
## [1] 0.98
print(round(rmse(Tbr,li_br),2)); print(round(rmse(Tlu,li_lu),2)); print(round(rmse(Tli,li_li),2))
## [1] 0.12
## [1] 0.05
## [1] 0.13
print(round(rmse(Tbr,iso_br),2)); print(round(rmse(Tlu,iso_lu),2)); print(round(rmse(Tli,iso_li),2))
## [1] 0.07
## [1] 0.04
## [1] 0.04
#calculate error for each value
# DeMixT
i.br <- li_br - Tbr; i.lu <- li_lu - Tlu; i.li <- li_li - Tli
# ISOpure
ii.br <- iso_br - Tbr; ii.lu <- iso_lu - Tlu; ii.li <- iso_li - Tli
##we assume a summary statistics by calucalting sum(sd(error))/n
#DeMixT
dm.br <- cbind(b.br, u.br, i.br); dm.lu <- cbind(b.lu, u.lu, i.lu); dm.li <- cbind(b.li, u.li, i.li)
dm.brs <- apply(dm.br, 1, sd); dm.brs <- mean(dm.brs)
dm.lus <- apply(dm.lu, 1, sd); dm.lus <- mean(dm.lus)
dm.lis <- apply(dm.li, 1, sd); dm.lis <- mean(dm.lis)
print(c(dm.brs, dm.lus, dm.lis))
## [1] 0.0327 0.0324 0.0253
#ISOpure
iso.br <- cbind(ib.br, iu.br, ii.br); iso.lu <- cbind(ib.lu, iu.lu, ii.lu); iso.li <- cbind(ib.li, iu.li, ii.li)
iso.brs <- apply(iso.br, 1, sd); iso.brs <- mean(iso.brs)
iso.lus <- apply(iso.lu, 1, sd); iso.lus <- mean(iso.lus)
iso.lis <- apply(iso.li, 1, sd); iso.lis <- mean(iso.lis)
print(c(iso.brs, iso.lus, iso.lis))
## [1] 0.0961 0.0830 0.0664

Supplementary Figure 3

## lilu_br
lilu_brp <- read.table("lilu_brd_out2.txt", header = F)
br_li <- as.numeric(lilu_brp[1,]); br_lu <- as.numeric(lilu_brp[2,])
br_br <- 1 - br_li - br_lu
isobr_br <- as.numeric(as.matrix(read.table("lilubr_alphapurities_new.txt", header = F)))
isobr_li <- apply(as.matrix(read.table("lilubr_theta_new.txt", header = F))[,1:3], 1, sum)
isobr_lu <- apply(as.matrix(read.table("lilubr_theta_new.txt", header = F))[,4:6], 1, sum)
## libr_lu
libr_lup <- read.table("libr_lud_out2.txt", header = F)
lu_li <- as.numeric(libr_lup[1,]); lu_br <- as.numeric(libr_lup[2,])
lu_lu <- 1 - lu_li - lu_br
isolu_lu <- as.numeric(as.matrix(read.table("librlu_alphapurities_new.txt", header = F)))
isolu_li <- apply(as.matrix(read.table("librlu_theta_new.txt", header = F))[,1:3], 1, sum)
isolu_br <- apply(as.matrix(read.table("librlu_theta_new.txt", header = F))[,4:6], 1, sum)
## lubr_li
lubr_lip <- read.table("lubr_lid_out2.txt", header = F)
li_lu <- as.numeric(lubr_lip[1,]); li_br <- as.numeric(lubr_lip[2,])
li_li <- 1 - li_lu - li_br
isoli_li <- as.numeric(as.matrix(read.table("lubrli_alphapurities_new.txt", header = F)))
isoli_lu <- apply(as.matrix(read.table("lubrli_theta_new.txt", header = F))[,1:3], 1, sum)
isoli_br <- apply(as.matrix(read.table("lubrli_theta_new.txt", header = F))[,4:6], 1, sum)
########liver########
li_prop1 <- c(pur[,1], pur[,1], pur[,1], pur[,1], pur[,1], pur[,1])
li_prop2 <- c(li_li, lu_li, br_li, isoli_li, isolu_li, isobr_li)
li_prop <- data.frame(Truth = li_prop1, Estimates = li_prop2, 
Tool = c(rep("DeMixT(Liver Unknown)", 30), rep("DeMixT(Lung Unknown)", 30),rep("DeMixT(Brain Unknown)", 30),
rep("ISOpure(Liver Unknown)", 30),rep("ISOpure(Lung Unknown)", 30), rep("ISOpure(Brain Unknown)", 30)),
Tissue = c(rep("Liver Unknown", 30), rep("Lung Unknown", 30),rep("Brain Unknown", 30), 
rep("Liver Unknown", 30),rep("Lung Unknown", 30), rep("Brain Unknown", 30)),
Dec = c(rep("DeMixT", 90), rep("ISOpure", 90)))
########brain########
br_prop1 <- c(pur[,2], pur[,2], pur[,2], pur[,2], pur[,2], pur[,2])
br_prop2 <- c(li_br, lu_br, br_br, isoli_br, isolu_br, isobr_br)
br_prop <- data.frame(Truth = br_prop1, Estimates = br_prop2, 
Tool = c(rep("DeMixT(Liver Unknown)", 30), rep("DeMixT(Lung Unknown)", 30),rep("DeMixT(Brain Unknown)", 30),
rep("ISOpure(Liver Unknown)", 30),rep("ISOpure(Lung Unknown)", 30), rep("ISOpure(Brain Unknown)", 30)),
Tissue = c(rep("Liver Unknown", 30), rep("Lung Unknown", 30),rep("Brain Unknown", 30), 
rep("Liver Unknown", 30),rep("Lung Unknown", 30), rep("Brain Unknown", 30)),
Dec = c(rep("DeMixT", 90), rep("ISOpure", 90)))
########lung########
lu_prop1 <- c(pur[,3], pur[,3], pur[,3], pur[,3], pur[,3], pur[,3])
lu_prop2 <- c(li_lu, lu_lu, br_lu, isoli_lu, isolu_lu, isobr_lu)
lu_prop <- data.frame(Truth = lu_prop1, Estimates = lu_prop2, 
Tool = c(rep("DeMixT(Liver Unknown)", 30), rep("DeMixT(Lung Unknown)", 30),rep("DeMixT(Brain Unknown)", 30),
rep("ISOpure(Liver Unknown)", 30),rep("ISOpure(Lung Unknown)", 30), rep("ISOpure(Brain Unknown)", 30)),
Tissue = c(rep("Liver Unknown", 30), rep("Lung Unknown", 30),rep("Brain Unknown", 30), 
rep("Liver Unknown", 30),rep("Lung Unknown", 30), rep("Brain Unknown", 30)),
Dec = c(rep("DeMixT", 90), rep("ISOpure", 90)))
########################################making plot########################################
plot( jitter(lu_prop$Truth*100,1), lu_prop$Estimates*100, main = "Proportions for lung",
col = ifelse(lu_prop$Dec == "ISOpure", "black", "blue"),
pch = cases((lu_prop$Tissue == "Liver Unknown") -> 3, (lu_prop$Tissue == "Lung Unknown") -> 1,
(lu_prop$Tissue == "Brain Unknown") -> 2),
xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
legend("bottomright", c("liver unknown","lung unknown","brain unknown"), 
pch=c(3,1,2), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

plot( jitter(br_prop$Truth*100,1), br_prop$Estimates*100, main = "Proportions for brain",
col = ifelse(br_prop$Dec == "ISOpure", "black", "blue"),
pch = cases((br_prop$Tissue == "Liver Unknown") -> 3, (br_prop$Tissue == "Lung Unknown") -> 1,
(br_prop$Tissue == "Brain Unknown") -> 2),
xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge = FALSE, cex = 1.5,bty = "n")
legend("bottomright", c("liver unknown","lung unknown","brain unknown"), 
pch=c(3,1,2), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

plot( jitter(li_prop$Truth*100,1), li_prop$Estimates*100, main = "Proportions for liver",
col = ifelse(li_prop$Dec == "ISOpure", "black", "blue"),
pch = cases((li_prop$Tissue == "Liver Unknown") -> 3, (li_prop$Tissue == "Lung Unknown") -> 1,
(li_prop$Tissue == "Brain Unknown") -> 2),
xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
legend("bottomright", c("liver unknown","lung unknown","brain unknown"), 
pch=c(3,1,2), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

Figure 2a

prop1 <- c(pur[,1], pur[,2], pur[,3], pur[,1], pur[,2], pur[,3])
prop2 <- c(li_li, br_br, lu_lu, isoli_li, isobr_br, isolu_lu)
prop <- data.frame(Truth = prop1, Estimates = prop2, 
Tool = c(rep("DeMixT(Liver Unknown)", 30), rep("DeMixT(Brain Unknown)", 30),rep("DeMixT(Lung Unknown)", 30),
rep("ISOpure(Liver Unknown)", 30),rep("ISOpure(Brain Unknown)", 30), rep("ISOpure(Lung Unknown)", 30)),
Tissue = c(rep("Liver Unknown", 30), rep("Brain Unknown", 30),rep("Lung Unknown", 30), 
rep("Liver Unknown", 30),rep("Brain Unknown", 30), rep("Lung Unknown", 30)),
Dec = c(rep("DeMixT", 90), rep("ISOpure", 90)))
plot( jitter(prop$Truth*100,1), prop$Estimates*100, main = "Proportions",
col = ifelse(prop$Dec == "ISOpure", "black", "blue"),
pch = cases((prop$Tissue == "Liver Unknown") -> 3, (prop$Tissue == "Lung Unknown") -> 1,
(prop$Tissue == "Brain Unknown") -> 2),
xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
legend("bottomright", c("liver unknown","lung unknown","brain unknown"), 
pch=c(3,1,2), merge=FALSE, cex = 1.5, bty = "n")
abline(0,1,col= "red", lwd=3)
box(lwd=5)

Supplementary Figure 4

##brain
brd <- as.matrix(read.table("lilu_br_sigma.txt", header = F))[,1]
br <- as.matrix(read.table("lilu_br_mu.txt", header = F))[,1]
##lung
lud <- as.matrix(read.table("libr_lu_sigma.txt", header = F))[,1]
lu <- as.matrix(read.table("libr_lu_mu.txt", header = F))[,1]
##liver
lid <- as.matrix(read.table("lubr_li_sigma.txt", header = F))[,1]
li <- as.matrix(read.table("lubr_li_mu.txt", header = F))[,1]
##a preprocessing step to remove genes with too large estimated variation, as considered inaccurate
condbr = (brd < 0.99); condlu = (lud < 0.99); condli = (lid < 0.99)
Tbr <- apply(read.table("purTruebr.txt", header = F),1,mean)
Tlu <- apply(read.table("purTruelu.txt", header = F),1,mean)
Tli <- apply(read.table("purTrueli.txt", header = F),1,mean)
Tsdbr <- apply(read.table("purTruebr.txt", header = F),1,sd)
Tsdlu <- apply(read.table("purTruelu.txt", header = F),1,sd)
Tsdli <- apply(read.table("purTrueli.txt", header = F),1,sd)
gbr <- log2(read.table("lilubr_cc_cancerprofiles_new.txt", header = F))
glu <- log2(read.table("librlu_cc_cancerprofiles_new.txt", header = F))
gli <- log2(read.table("lubrli_cc_cancerprofiles_new.txt", header = F))
sdbr <- apply(gbr, 1, sd); sdlu <- apply(glu, 1, sd); sdli <- apply(gli, 1, sd)
gbr <- apply(gbr, 1, mean); glu <- apply(glu, 1, mean); gli <- apply(gli, 1, mean)
load('condli_new.RData'); load('condlu_new.RData'); load('condbr_new.RData')
idcondli <- which(condli_new == TRUE)
idcondlu <- which(condlu_new == TRUE)
idcondbr <- which(condbr_new == TRUE)
condbr1 <- condbr[idcondbr]; condlu1 <- condlu[idcondlu]; condli1 <- condli[idcondli]
br <- br[condbr&condbr_new]; gbr <- gbr[condbr1]; Tbr <- Tbr[condbr&condbr_new]
lu <- lu[condlu&condlu_new]; glu <- glu[condlu1]; Tlu <- Tlu[condlu&condlu_new]
li <- li[condli&condli_new]; gli <- gli[condli1]; Tli <- Tli[condli&condli_new]
corglu = round(epi.ccc(Tlu,glu, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corlu = round(epi.ccc(Tlu,lu, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2lu = round(epi.ccc(glu,lu, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corgli = round(epi.ccc(Tli,gli, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corli = round(epi.ccc(Tli,li, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2li = round(epi.ccc(gli,li, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corgbr = round(epi.ccc(Tbr,gbr, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corbr = round(epi.ccc(Tbr,br, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2br = round(epi.ccc(gbr,br, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
smoothScatter((br+gbr)/2, br-gbr, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. ISOpure (brain is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((br-gbr)~((br+gbr)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", cor2br, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((br+Tbr)/2, br-Tbr, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. Truth (brain is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((br-Tbr)~((br+Tbr)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corbr, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((Tbr+gbr)/2, gbr-Tbr, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="ISOpure vs. Truth (brain is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((gbr-Tbr)~((gbr+Tbr)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corgbr, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

#liver is unknown
smoothScatter((li+gli)/2, li-gli, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. ISOpure (liver is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((li-gli)~((li+gli)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", cor2li, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((li+Tli)/2, li-Tli, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. Truth (liver is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((li-Tli)~((li+Tli)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corli, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((Tli+gli)/2, gli-Tli, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="ISOpure vs. Truth (liver is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((gli-Tli)~((gli+Tli)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corgli, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

#lung is unknown
smoothScatter((lu+glu)/2, lu-glu, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. ISOpure (lung is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((lu-glu)~((lu+glu)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", cor2lu, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((lu+Tlu)/2, lu-Tlu, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. Truth (lung is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((lu-Tlu)~((lu+Tlu)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corlu, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((Tlu+glu)/2, glu-Tlu, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="ISOpure vs. Truth (lung is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((glu-Tlu)~((glu+Tlu)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corglu, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

Mixed cell line RNA-seq dataset

This dataset was generated in house by mixing RNAs from three cell lines at fixed proportions. We mapped raw reads generated from paired-end Illumina sequencing to the human reference genome build 37.2 from NCBI through TopHat (default parameters and supplying the -G option with the GTF annotation file downloaded from the NCBI genome browser). The mapped reads obtained from the TopHat output were cleaned by SAMtools to remove improperly mapped and duplicated reads. We then used Picard tools to sort the cleaned SAM files according to their reference sequence names and create an index for the reads. The gene-level expression was quantified by applying the R packages GenomicFeatures and GenomicRanges. We generated a reference table from the human reference genome hg19 and then used the function findOverlaps to count the number of reads mapped to each exon for all the samples. This count dataset was pre-processed by total count normalization, and genes that contained zero counts were removed. The pre-processed count data were used as input for DeMixT and ISOpure.
We performed the same GSCM step as in the analysis of mixed tissue microarray data.

Library loading

library(DeMixT)
library("hydroGOF")
library('epiR')
library('memisc')
options(digits=3)

Data generation and processing

scale.norm <- function(x){
    nS <- ncol(x);nG <- nrow(x);nT <- apply(x, 2, sum); mT <- mean(nT)
    y <- matrix(0, nrow = nG, ncol = nS)
    for (i in 1:nS) y[,i] <- x[, i]/nT[i]*mT
    return(y)
}
ENSG <- read.table("ensg_sort.txt", header = T)[,-1]
# 3 + 3 + 3 + 23
ENSG <- as.matrix(ENSG)
ENSG <- scale.norm(ENSG)
ENSG1 <- ENSG[, seq(1, 63, length.out = 32)]
ENSG2 <- ENSG[, seq(2, 64, length.out = 32)]
ENSGT <- (ENSG1 + ENSG2)/2
COMP1 <- ENSGT[,1:3]
COMP2 <- ENSGT[,4:6]
COMP3 <- ENSGT[,7:9]
MIX <- ENSGT[,10:32]
C1C2MIX <- cbind(COMP1, COMP2, MIX) 
C2C3MIX <- cbind(COMP2, COMP3, MIX)
C1C3MIX <- cbind(COMP1, COMP3, MIX)
#CASE C1C2
C1C2MIX <- apply(C1C2MIX, 2, as.numeric)
C3 <- COMP3[apply(C1C2MIX == 0, 1, sum)==0, ]
C1C2MIX <- C1C2MIX[apply(C1C2MIX == 0, 1, sum)==0, ]
#CASE C2C3
C2C3MIX <- apply(C2C3MIX, 2, as.numeric)
C1 <- COMP1[apply(C2C3MIX == 0, 1, sum)==0, ]
C2C3MIX <- C2C3MIX[apply(C2C3MIX == 0, 1, sum)==0, ]
#CASE C1C3
C1C3MIX <- apply(C1C3MIX, 2, as.numeric)
C2 <- COMP2[apply(C1C3MIX == 0, 1, sum)==0, ]
C1C3MIX <- C1C3MIX[apply(C1C3MIX == 0, 1, sum)==0, ]
inputn1m <- apply(log2(C1C2MIX[,1:3]), 1, mean)
inputn2m <- apply(log2(C1C2MIX[,4:6]), 1, mean)
inputn1m <- apply(log2(C2C3MIX[,1:3]), 1, mean)
inputn2m <- apply(log2(C2C3MIX[,4:6]), 1, mean)
inputn1m <- apply(log2(C1C3MIX[,1:3]), 1, mean)
inputn2m <- apply(log2(C1C3MIX[,4:6]), 1, mean)
write.table(log2(C1C2MIX), file = "C1C2_C3.txt", sep = '\t', col.names = F, row.names = F)
write.table(log2(C2C3MIX), file = "C2C3_C1.txt", sep = '\t', col.names = F, row.names = F)
write.table(log2(C1C3MIX), file = "C1C3_C2.txt", sep = '\t', col.names = F, row.names = F)
write.table(log2(C1), "purTrueC1.txt", sep='\t',row.names = FALSE, col.names=FALSE);
write.table(log2(C2), "purTrueC2.txt", sep='\t',row.names = FALSE, col.names=FALSE);
write.table(log2(C3), "purTrueC3.txt", sep='\t',row.names = FALSE, col.names=FALSE);
save.image(file = "celine.rda")

Supplementary Table 7

load('celine.rda')
ENSG <- ENSG[apply(ENSG == 0, 1, sum)==0,]
COMP1 <- log2(ENSG[,1:6]); COMP2 <- log2(ENSG[,7:12]); COMP3 <- log2(ENSG[,13:18])
aCOMP1 <- apply(2^COMP1, 1, mean); aCOMP2 <- apply(2^COMP2, 1, mean); aCOMP3 <- apply(2^COMP3, 1, mean)
COMP1 <- apply(COMP1, 1, mean); COMP2 <- apply(COMP2, 1, mean); COMP3 <- apply(COMP3, 1, mean)
cond1 = (COMP1/COMP2 >1.1)|(COMP1/COMP2 < 1/1.1)
cond2 = (COMP1/COMP3 > 1.1)|(COMP1/COMP3 < 1/1.1)
cond3 = (COMP2/COMP3> 1.1)|(COMP2/COMP3 < 1/1.1)
#1 mu1 = mu2 = mu3
invisible(sum((!cond1)&(!cond2)&(!cond3))/5715) ## [1] 0.263
#2 mu1 != mu2 = mu3
invisible(sum((cond1)&(cond2)&(!cond3))/5715) ## [1] 0.11
#3 mu1 = mu2 != mu3
invisible(sum((!cond1)&(cond2)&(cond3))/5715) ## [1] 0.171
#4 mu1 != mu2 != mu3
invisible(sum((cond1)&(cond2)&(cond3))/5715) ## [1] 0.215

Running DeMixT

library(DeMixT)
run_id <- c('C2C3_C1', 'C1C3_C2', 'C1C2_C3')
for(i in 1:3){
    name <- run_id[i]
    inputmat <- as.matrix(read.table(paste(name, ".txt", sep=""), header=FALSE ))
    inputn1m <- apply(inputmat[,1:6], 1, mean)
    inputn2m <- apply(inputmat[,7:12], 1, mean)
    inputmat1 <- 2^inputmat[((inputn1m > 7)&(inputn2m > 7)), ]
      groupid<-c(rep(1, 6), rep(2,6), rep(3, 46)); # input as 2-component
    testr1 <- DeMixT.S1(inputmat1, groupid, niter = 15, ninteg = 60, filter.option = 1, filter.criteria1 = c(0.5,0.5), filter.criteria2 = c(250,250), filter.criteria3 = 0.25, if.filter = TRUE, sg0 = 0.01, mu0 = 7.0,tol = 10^-6)
    write.table(testr1$pi, paste(name,"_out2.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
    givenpi <- c(t(testr1$pi))
    testr2 <- DeMixT.S2(2^inputmat, groupid, givenpi, ninteg = 60)
    write.table(testr2$decovExprT, paste(name,"_out_decov.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
    write.table(testr2$decovMu[,3], paste(name,"_mu.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
    write.table(testr2$decovSigma[,3], paste(name,"_sigma.txt",sep=""), col.names=FALSE, row.names=FALSE, sep="\t")
}

Results summary

Supplementary Tables 5, 6 and Table 1

load("purT.Rdata") ## load true data
tpc1 <- purT[,1];tpc2 <- purT[,2];tpc3 <- purT[,3]
## c1c3_c2
#c2 unknown
c1c3_c2p <- read.table("C1C3_C2_out2.txt", header = F)
c2_c1 <- as.numeric(c1c3_c2p[1,]); c2_c3 <- as.numeric(c1c3_c2p[2,])
c2_c2 <- 1 - c2_c1 - c2_c3
corc2 <- epi.ccc(tpc2,c2_c2, ci = "z-transform", conf.level = 0.95)$rho.c$est
corc1 <- epi.ccc(tpc1,c2_c1, ci = "z-transform", conf.level = 0.95)$rho.c$est
corc3 <- epi.ccc(tpc3,c2_c3, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(tpc1,c2_c1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.912 0.843 0.952
print(epi.ccc(tpc2,c2_c2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.981  0.97 0.989
print(epi.ccc(tpc3,c2_c3, ci = "z-transform", conf.level = 0.95)$rho.c)
##      est  lower upper
## 1 0.0805 0.0158 0.145
print(round(corc1,2)); print(round(corc2,2)); print(round(corc3,2))
## [1] 0.91
## [1] 0.98
## [1] 0.08
iso_c2 <- as.numeric(as.matrix(read.table("c1c3_alphapurities.txt", header = F)))
iso_c1 <- apply(as.matrix(read.table("c1c3_theta.txt", header = F))[,1:3], 1, sum)
iso_c3 <- apply(as.matrix(read.table("c1c3_theta.txt", header = F))[,4:6], 1, sum)
iso_corc2 <- epi.ccc(tpc2,iso_c2, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corc3 <- epi.ccc(tpc3,iso_c3, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corc1 <- epi.ccc(tpc1,iso_c1, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(tpc1,iso_c1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.506 0.328  0.65
print(epi.ccc(tpc2,iso_c2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.452 0.275   0.6
print(epi.ccc(tpc3,iso_c3, ci = "z-transform", conf.level = 0.95)$rho.c)
##        est   lower  upper
## 1 -0.00982 -0.0295 0.0099
print(round(iso_corc1,2)); print(round(iso_corc2,2)); print(round(iso_corc3,2))
## [1] 0.51
## [1] 0.45
## [1] -0.01
print(round(rmse(tpc1,c2_c1),2)); print(round(rmse(tpc2,c2_c2),2)); print(round(rmse(tpc3,c2_c3),2))
## [1] 0.09
## [1] 0.04
## [1] 0.08
print(round(rmse(tpc1,iso_c1),2)); print(round(rmse(tpc2,iso_c2),2)); print(round(rmse(tpc3,iso_c3),2))
## [1] 0.34
## [1] 0.36
## [1] 0.03
#calculate error
# DeMixT
c2.c1 <- c2_c1 - tpc1; c2.c2 <- c2_c2 - tpc2; c2.c3 <- c2_c3 - tpc3
# ISOpure
ic2.c1 <- iso_c1 - tpc1; ic2.c2 <- iso_c2 - tpc2; ic2.c3 <- iso_c3 - tpc3
## c3 unknown
c1c2_c3p <- read.table("C1C2_C3_out2.txt", header = F)
c3_c1 <- as.numeric(c1c2_c3p[1,]); c3_c2 <- as.numeric(c1c2_c3p[2,])
c3_c3 <- 1 - c3_c1 - c3_c2
corc2 <- epi.ccc(tpc2,c3_c2, ci = "z-transform", conf.level = 0.95)$rho.c$est
corc1 <- epi.ccc(tpc1,c3_c1, ci = "z-transform", conf.level = 0.95)$rho.c$est
corc3 <- epi.ccc(tpc3,c3_c3, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(tpc1,c3_c1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.351 0.164 0.513
print(epi.ccc(tpc2,c3_c2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.796 0.622 0.896
print(epi.ccc(tpc3,c3_c3, ci = "z-transform", conf.level = 0.95)$rho.c)
##      est    lower  upper
## 1 0.0152 -0.00302 0.0334
print(round(corc1,2)); print(round(corc2,2)); print(round(corc3,2))
## [1] 0.35
## [1] 0.8
## [1] 0.02
iso_c3 <- as.numeric(as.matrix(read.table("c1c2_alphapurities.txt", header = F)))
iso_c1 <- apply(as.matrix(read.table("c1c2_theta.txt", header = F))[,1:3], 1, sum)
iso_c2 <- apply(as.matrix(read.table("c1c2_theta.txt", header = F))[,4:6], 1, sum)
iso_corc2 <- epi.ccc(tpc2,iso_c2, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corc3 <- epi.ccc(tpc3,iso_c3, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corc1 <- epi.ccc(tpc1,iso_c1, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(tpc1,iso_c1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.624 0.447 0.754
print(epi.ccc(tpc2,iso_c2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.413 0.216 0.578
print(epi.ccc(tpc3,iso_c3, ci = "z-transform", conf.level = 0.95)$rho.c)
##      est  lower  upper
## 1 0.0279 0.0102 0.0456
print(round(iso_corc1,2)); print(round(iso_corc2,2)); print(round(iso_corc3,2))
## [1] 0.62
## [1] 0.41
## [1] 0.03
print(round(rmse(tpc1,c3_c1),2)); print(round(rmse(tpc2,c3_c2),2)); print(round(rmse(tpc3,c3_c3),2))
## [1] 0.32
## [1] 0.12
## [1] 0.39
print(round(rmse(tpc1,iso_c1),2)); print(round(rmse(tpc2,iso_c2),2)); print(round(rmse(tpc3,iso_c3),2))
## [1] 0.26
## [1] 0.28
## [1] 0.54
## c1 unknown
c2c3_c1p <- read.table("C2C3_C1_out2.txt", header = F)
c1_c2 <- as.numeric(c2c3_c1p[1,]); c1_c3 <- as.numeric(c2c3_c1p[2,])
c1_c1 <- 1 - c1_c3 - c1_c2
corc2 <- epi.ccc(tpc2,c1_c2, ci = "z-transform", conf.level = 0.95)$rho.c$est
corc1 <- epi.ccc(tpc1,c1_c1, ci = "z-transform", conf.level = 0.95)$rho.c$est
corc3 <- epi.ccc(tpc3,c1_c3, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(tpc1,c1_c1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.995 0.989 0.997
print(epi.ccc(tpc2,c1_c2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.913 0.839 0.954
print(epi.ccc(tpc3,c1_c3, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.137 0.049 0.222
print(round(corc1,2)); print(round(corc2,2)); print(round(corc3,2))
## [1] 0.99
## [1] 0.91
## [1] 0.14
iso_c1 <- as.numeric(as.matrix(read.table("c2c3_alphapurities.txt", header = F)))
iso_c2 <- apply(as.matrix(read.table("c2c3_theta.txt", header = F))[,1:3], 1, sum)
iso_c3 <- apply(as.matrix(read.table("c2c3_theta.txt", header = F))[,4:6], 1, sum)
iso_corc2 <- epi.ccc(tpc2,iso_c2, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corc3 <- epi.ccc(tpc3,iso_c3, ci = "z-transform", conf.level = 0.95)$rho.c$est
iso_corc1 <- epi.ccc(tpc1,iso_c1, ci = "z-transform", conf.level = 0.95)$rho.c$est
print(epi.ccc(tpc1,iso_c1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.506  0.31  0.66
print(epi.ccc(tpc2,iso_c2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.542 0.346 0.692
print(epi.ccc(tpc3,iso_c3, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.262 0.134 0.381
print(round(iso_corc1,2)); print(round(iso_corc2,2)); print(round(iso_corc3,2))
## [1] 0.51
## [1] 0.54
## [1] 0.26
print(round(rmse(tpc1,c1_c1),2)); print(round(rmse(tpc2,c1_c2),2)); print(round(rmse(tpc3,c1_c3),2))
## [1] 0.02
## [1] 0.08
## [1] 0.09
print(round(rmse(tpc1,iso_c1),2)); print(round(rmse(tpc2,iso_c2),2)); print(round(rmse(tpc3,iso_c3),2))
## [1] 0.27
## [1] 0.25
## [1] 0.03
#calculate error
# DeMixT
c1.c1 <- c1_c1 - tpc1; c1.c2 <- c1_c2 - tpc2; c1.c3 <- c1_c3 - tpc3
# ISOpure
ic1.c1 <- iso_c1 - tpc1; ic1.c2 <- iso_c2 - tpc2; ic1.c3 <- iso_c3 - tpc3
##we assume a summary statistics by calucalting sum(sd(error))/n
# DeMixT
dm.c1 <- cbind(c1.c1, c2.c1); dm.c2 <- cbind(c1.c2, c2.c2); dm.c3 <- cbind(c1.c3, c2.c3)
dm.c1s <- apply(dm.c1, 1, sd); dm.c1s <- mean(dm.c1s)
dm.c2s <- apply(dm.c2, 1, sd); dm.c2s <- mean(dm.c2s)
dm.c3s <- apply(dm.c3, 1, sd); dm.c3s <- mean(dm.c3s)
print(c(dm.c1s, dm.c2s, dm.c3s))
## [1] 0.0540 0.0578 0.0197
# ISOpure
iso.c1 <- cbind(ic1.c1, ic2.c1); iso.c2 <- cbind(ic1.c2, ic2.c2); iso.c3 <- cbind(ic1.c3, ic2.c3)
iso.c1s <- apply(iso.c1, 1, sd); iso.c1s <- mean(iso.c1s)
iso.c2s <- apply(iso.c2, 1, sd); iso.c2s <- mean(iso.c2s)
iso.c3s <- apply(iso.c3, 1, sd); iso.c3s <- mean(iso.c3s)
print(c(iso.c1s, iso.c2s, iso.c3s))
## [1] 0.40496 0.40650 0.00221

Supplementary Figure 5

## c1c3_c2
c1c3_c2p <- read.table("C1C3_C2_out2.txt", header = F)
c2_c1 <- as.numeric(c1c3_c2p[1,]); c2_c3 <- as.numeric(c1c3_c2p[2,])
c2_c2 <- 1 - c2_c1 - c2_c3
isoc2_c2 <- as.numeric(as.matrix(read.table("c1c3_alphapurities.txt", header = F)))
isoc2_c1 <- apply(as.matrix(read.table("c1c3_theta.txt", header = F))[,1:3], 1, sum)
isoc2_c3 <- apply(as.matrix(read.table("c1c3_theta.txt", header = F))[,4:6], 1, sum)
## c1c2_c3
c1c2_c3p <- read.table("C1C2_C3_out2.txt", header = F)
c3_c1 <- as.numeric(c1c2_c3p[1,]); c3_c2 <- as.numeric(c1c2_c3p[2,])
c3_c3 <- 1 - c3_c1 - c3_c2
isoc3_c3 <- as.numeric(as.matrix(read.table("c1c2_alphapurities.txt", header = F)))
isoc3_c1 <- apply(as.matrix(read.table("c1c2_theta.txt", header = F))[,1:3], 1, sum)
isoc3_c2 <- apply(as.matrix(read.table("c1c2_theta.txt", header = F))[,4:6], 1, sum)
## c3c2_c1
c3c2_c1p <- read.table("C2C3_C1_out2.txt", header = F)
c1_c2 <- as.numeric(c3c2_c1p[1,]); c1_c3 <- as.numeric(c3c2_c1p[2,])
c1_c1 <- 1 - c1_c3 - c1_c2
isoc1_c1 <- as.numeric(as.matrix(read.table("c2c3_alphapurities.txt", header = F)))
isoc1_c2 <- apply(as.matrix(read.table("c2c3_theta.txt", header = F))[,1:3], 1, sum)
isoc1_c3 <- apply(as.matrix(read.table("c2c3_theta.txt", header = F))[,4:6], 1, sum)
pur = purT
########c2########
c2_prop1 <- c(pur[,2], pur[,2], pur[,2], pur[,2])
c2_prop2 <- c(c1_c2, c2_c2, isoc1_c2, isoc2_c2)
c2_prop <- data.frame(Truth = c2_prop1, Estimates = c2_prop2, 
                      Dec = c(rep("DeMixT", 46), rep("ISOpure", 46)),
                      Tissue = c(rep("lung tumor unknown", 23), rep("fibroblast unknown", 23), rep("lung tumor unknown", 23), rep("fibroblast unknown", 23)))
########c1########
c1_prop1 <- c(pur[,1], pur[,1], pur[,1], pur[,1])
c1_prop2 <- c(c1_c1, c2_c1, isoc1_c1, isoc2_c1)
c1_prop <- data.frame(Truth = c1_prop1, Estimates = c1_prop2, 
                      Dec = c(rep("DeMixT", 46), rep("ISOpure", 46)),
                      Tissue = c(rep("lung tumor unknown", 23), rep("fibroblast unknown", 23), rep("lung tumor unknown", 23), rep("fibroblast unknown", 23)))
########c3########
c3_prop1 <- c(pur[,3], pur[,3], pur[,3], pur[,3])
c3_prop2 <- c(c1_c3, c2_c3, isoc1_c3, isoc2_c3)
c3_prop <- data.frame(Truth = c3_prop1, Estimates = c3_prop2, 
                      Dec = c(rep("DeMixT", 46), rep("ISOpure", 46)),
                      Tissue = c(rep("lung tumor unknown", 23), rep("fibroblast unknown", 23), rep("lung tumor unknown", 23), rep("fibroblast unknown", 23)))
###################################################################################################
plot( jitter(c1_prop$Truth*100,10), c1_prop$Estimates*100, main = "Proportions for tumor",
      col = ifelse(c1_prop$Dec == "ISOpure", "black", "blue"),
      pch = cases((c1_prop$Tissue == "lung tumor unknown") -> 3, (c1_prop$Tissue == "fibroblast unknown") -> 1),
      xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("bottomright", c("lung tumor unknown","fibroblast unknown"), pch=c(3,1), merge=FALSE, cex = 1.5, bty = "n")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

plot(jitter(c2_prop$Truth*100,10), c2_prop$Estimates*100, main = "Proportions for fibroblast",
     col = ifelse(c1_prop$Dec == "ISOpure", "black", "blue"),
     pch = cases((c1_prop$Tissue == "lung tumor unknown") -> 3, (c1_prop$Tissue == "fibroblast unknown") -> 1),
     xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("bottomright", c("lung tumor unknown","fibroblast unknown"), pch=c(3,1), merge=FALSE, cex = 1.5, bty = "n")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

plot( jitter(c3_prop$Truth*100,10), c3_prop$Estimates*100, main = "Proportions for immune",
      col = ifelse(c1_prop$Dec == "ISOpure", "black", "blue"),
      pch = cases((c1_prop$Tissue == "lung tumor unknown") -> 3, (c1_prop$Tissue == "fibroblast unknown") -> 1),
      xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("bottomright", c("lung tumor unknown","fibroblast unknown"), pch=c(3,1), merge=FALSE, cex = 1.5, bty = "n")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

Figure 2b

prop1 <- c(pur[,1], pur[,2], pur[,1], pur[,2])
prop2 <- c(c1_c1, c2_c2, isoc1_c1, isoc2_c2)
prop <- data.frame(Truth = prop1, Estimates = prop2, 
                   Dec = c(rep("DeMixT", 46), rep("ISOpure", 46)),
                   Tissue = c(rep("lung tumor unknown", 23), rep("fibroblast unknown", 23), rep("lung tumor unknown", 23), rep("fibroblast unknown", 23)))
###################################################################################################
plot( jitter(prop$Truth*100,10), prop$Estimates*100, main = 'Proportions',
      col = ifelse(prop$Dec == "ISOpure", "black", "blue"),
      pch = cases((prop$Tissue == "lung tumor unknown") -> 3, (prop$Tissue == "fibroblast unknown") -> 1),
      xlim = c(0,100), ylim=c(0,100), ylab = "Estimates (%)", xlab = "Truth (%)")
legend("bottomright", c("lung tumor unknown","fibroblast unknown"), pch=c(3,1), merge=FALSE, cex = 1.5, bty = "n")
legend("topleft", c("ISOpure", "DeMixT"),text.col=c("black", "blue"), merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

Supplementary Figure 6

C2d <- as.matrix(read.table("C1C3_C2_sigma.txt", header = F))[,1]
C1d <- as.matrix(read.table("C2C3_C1_sigma.txt", header = F))[,1]
C2 <- as.matrix(read.table("C1C3_C2_mu.txt", header = F))[,1]
C1 <- as.matrix(read.table("C2C3_C1_mu.txt", header = F))[,1]
##a preprocessing step to remove genes with too large estimated variation, as considered inaccurate
condC2 = (C2d < 0.99); condC1 = (C1d < 0.99)
TC2 <- apply(read.table("purTrueC2.txt", header = F),1,mean)
TC1 <- apply(read.table("purTrueC1.txt", header = F),1,mean)
gC2 <- log2(read.table("c1c3_cc_cancerprofiles.txt", header = F)) # C2 c2
gC1 <- log2(read.table("c2c3_cc_cancerprofiles.txt", header = F)) # C1 c1
sdC2 <- apply(gC2, 1, sd); sdC1 <- apply(gC1, 1, sd)
gC2 <- apply(gC2, 1, mean); gC1 <- apply(gC1, 1, mean)
id_gC2 = is.finite(TC2); id_gC1 = is.finite(TC1)
condC2 = condC2[id_gC2]; C2 = C2[id_gC2]; gC2 = gC2[id_gC2]
TC2 = TC2[id_gC2]; C2d = C2d[id_gC2]; sdC2 = sdC2[id_gC2]
condC1 = condC1[id_gC1]; C1 = C1[id_gC1]; gC1 = gC1[id_gC1]
TC1 = TC1[id_gC1]; C1d = C1d[id_gC1]; sdC1 = sdC1[id_gC1]
#Fibroblast#
## ISOpure
print(epi.ccc(TC2,gC2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.928 0.925 0.931
print(rmse(TC2,gC2))
## [1] 1.05
## DeMixT
print(epi.ccc(TC2,C2, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.911 0.907 0.915
print(rmse(TC2,C2))
## [1] 1.21
#Tumor#
## ISOpure
print(epi.ccc(TC1,gC1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.955 0.953 0.957
print(rmse(TC1,gC1))
## [1] 0.797
## DeMixT
print(epi.ccc(TC1,C1, ci = "z-transform", conf.level = 0.95)$rho.c)
##     est lower upper
## 1 0.896 0.891   0.9
print(rmse(TC1,C1))
## [1] 1.25
corgC1 = round(epi.ccc(TC1,gC1, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corC1 = round(epi.ccc(TC1,C1, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2C1 = round(epi.ccc(C1,gC1, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corgC2 = round(epi.ccc(TC2,gC2, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corC2 = round(epi.ccc(TC2,C2, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2C2 = round(epi.ccc(C2,gC2, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
C2 <- C2[condC2]; gC2 <- gC2[condC2]; TC2 <- TC2[condC2]
C1 <- C1[condC1]; gC1 <- gC1[condC1]; TC1 <- TC1[condC1]
#fibroblast is unknown
smoothScatter((C2+gC2)/2, C2-gC2, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
              main="DeMixT vs. ISOpure (fibroblast is unknown)", pch = 20, nrpoints = 0, colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((C2-gC2)~((C2+gC2)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", cor2C2, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((C2+TC2)/2, C2-TC2, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
              main="DeMixT vs. Truth (fibroblast is unknown)", pch = 20, nrpoints = 0, colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((C2-TC2)~((C2+TC2)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corC2, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((TC2+gC2)/2, gC2-TC2, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
              main="ISOpure vs. Truth (fibroblast is unknown)", pch = 20, nrpoints = 0, colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((gC2-TC2)~((gC2+TC2)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corgC2, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

#tumor is unknown
smoothScatter((C1+gC1)/2, C1-gC1, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
              main="DeMixT vs. ISOpure (lung tumor is unknown)", pch = 20, nrpoints = 0, colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((C1-gC1)~((C1+gC1)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", cor2C1, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((C1+TC1)/2, C1-TC1, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
              main="DeMixT vs. Truth (lung tumor is unknown)", pch = 20, nrpoints = 0, colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((C1-TC1)~((C1+TC1)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corC1, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

smoothScatter((TC1+gC1)/2, gC1-TC1, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
              main="ISOpure vs. Truth (lung tumor is unknown)", pch = 20, nrpoints = 0, colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((gC1-TC1)~((gC1+TC1)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corgC1, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")
box(lwd=5)

Laser-capture microdissection (LCM) prostate cancer FFPE microarray dataset

This dataset was generated at the Dana Farber Cancer Institute (GSE97284 ). Radical prostatectomy specimens were annotated in detail by pathologists, and regions of interest were identified that corresponded to benign epithelium, prostatic intraepithelial neoplasia (abnormal tissue that is possibly precancerous), and tumor, each with its surrounding stroma. These regions were laser-capture microdissected using the ArcturusXT system (Life Technologies). Additional areas of admixed tumor and adjacent stromal tissue were taken. RNA was extracted by AllPrep (Qiagen) and quantified by RiboGreen assay (Life Technology). RNA labeling was performed using the SensationPlus FFPE method (Affymetrix) and hybridized to Affymetrix Gene Array STA 1.0.
FFPE samples are known to generate overall lower quality expression data than those from fresh frozen samples. We observed a small proportion of probesets that presented large differences in mean expression levels between the dissected tissues: tumor (\(T\)) and stroma (\(N\)) in this dataset (Supplementary Table 7). Only \(53\) probesets presented a mean difference (\(|\overline T - \overline N|\)) \(>1\), as compared to \(10,397\) probesets in GSE19830. We therefore chose the top 80 genes with the largest mean differences and ran both DeMixT and ISOpure under two settings: tumor unknown and stroma unknown.

Library loading

library(limma)
library(preprocessCore)
library(affy)
library(DeMixT)
library("hydroGOF")
library('epiR')
library('memisc')
library('Biobase')
library(colorspace)
library(ggplot2)
library("grid")
library("gtable")
library("klaR")
options(digits=3)

Data generation and processing

## read in microarray data and do RMA normalization without background checking
data <- ReadAffy()
eset <- expresso(data, normalize.method = "quantiles", bgcorrect.method = "none", pmcorrect.method ="pmonly",
summary.method="medianpolish")
write.exprs(eset,file="lcm.txt")
##read in data for meta information
load("LCM_data_T_sT_AM_for_deconvolution_analysis.rda")
meta <- pData(data); ROI = pData(data)$ROI
Tumor_Name <- rownames(meta[meta$ROI == 'T', ])
Normal_Name <- rownames(meta[meta$ROI == 'sT', ])
Adm_Name <- rownames(meta[meta$ROI == 'AM', ])
mat <- read.table('lcm.txt', header = T)
for(i in 1:188) colnames(mat)[i] = strsplit(colnames(mat)[i], split = 'X')[[1]][2]
Tumor <- mat[, colnames(mat) %in% Tumor_Name]
Normal <- mat[, colnames(mat) %in% Normal_Name]
Admix <- mat[, colnames(mat) %in% Adm_Name]
tid = colnames(Tumor)
nid = colnames(Normal)
mid = colnames(Admix)
tMean <- apply(Tumor, 1, mean); nMean <- apply(Normal, 1, mean); aMean <- apply(Admix, 1, mean)
invisible(summary(abs(tMean - nMean)))
invisible(sum(abs(tMean - nMean) > 0.90)); id = (abs(tMean - nMean) > 0.90); n = nrow(Tumor)
save(id, file= 'lcm.id.RData')
pr_id = rownames(data)[id]
save(pr_id, file= 'prob_id_lcm.RData')
Am = Admix[id, ] #23; 
Nm = Normal[id, ] #25;
Tm = Tumor[id, ] #25
input <- cbind(Nm, Am, Tm)
write.table(input, file = "input.lcm.txt", sep = '\t', col.names= F, row.names = F)
write.table(2^Nm, file = "lcm_normal.txt", sep = '\t', col.names = F, row.names = F)
write.table(2^Am, file = "lcm_adm.txt", sep = '\t', col.names = F, row.names = F)
write.table(2^Tm, file = "lcm_tumor.txt", sep = '\t', col.names = F, row.names = F)

Supplementary Table 7

invisible(sum((tMean/nMean >1.1)|(tMean/nMean < 1/1.1))/n) ## [1] 0.00597
invisible(sum(!((tMean/nMean >1.1)|(tMean/nMean < 1/1.1)))/n) ## [1] 0.994

Running DeMixT

normal <- as.matrix(read.table("input.lcm.txt", header=FALSE ))[, 1:25]
adm <- as.matrix(read.table("input.lcm.txt", header=FALSE ))[, 26:48]
tumor <- as.matrix(read.table("input.lcm.txt", header=FALSE ))[, 49:73]
inputmat <- cbind(tumor, adm) 
cnvgroup<-c(rep(1, 25), rep(3, 23)); # input as 2-component
inputmat <- 2^inputmat
testr <- DeMixT(inputmat, cnvgroup, niter = 20, ninteg1 = 60, ninteg2 = 60, filter.out = FALSE, sg0 = 0.01, mu0 = 7.0,tol = 10^-6)
write.table(testr$decovExprT, "lcm_TA_out_decov.txt", col.names=FALSE, row.names=FALSE, sep="\t")
write.table(testr$decovMu[,2], "lcm_TA_mu.txt", col.names=FALSE, row.names=FALSE, sep="\t")
write.table(testr$decovSigma[,2], "lcm_TA_sigma.txt", col.names=FALSE, row.names=FALSE, sep="\t")
write.table(testr$pi, "lcm_TA_out.txt", col.names=FALSE, row.names=FALSE, sep="\t")
inputmat <- cbind(normal, adm) 
cnvgroup<-c(rep(1, 25), rep(3, 23)); # input as 2-component
inputmat <- 2^inputmat
testr <- DeMixT(inputmat, cnvgroup, niter = 20, ninteg1 = 60, ninteg2 = 60, filter.out = FALSE, sg0 = 0.01, mu0 = 7.0,tol = 10^-6)
write.table(testr$decovExprT, "lcm_SA_out_decov.txt", col.names=FALSE, row.names=FALSE, sep="\t")
write.table(testr$decovMu[,2], "lcm_SA_mu.txt", col.names=FALSE, row.names=FALSE, sep="\t")
write.table(testr$decovSigma[,2], "lcm_SA_sigma.txt", col.names=FALSE, row.names=FALSE, sep="\t")
write.table(testr$pi, "lcm_SA_out.txt", col.names=FALSE, row.names=FALSE, sep="\t")

Results summary

Figure 3a

dt_purT <- 1-as.numeric(read.table('lcm_SA_out.txt', header = F)[1,])
dt_purS <- 1-as.numeric(read.table('lcm_TA_out.txt', header = F)[1,])
iso_purT <- as.numeric(read.table('lcm_SA_alphapurities.txt', header = F)[,1])
iso_purS <- as.numeric(read.table('lcm_TA_alphapurities.txt', header = F)[,1])
bar_purS <- c(dt_purS, iso_purS); bar_purT <- c(dt_purT, iso_purT)
corDT <- round(epi.ccc(dt_purT, 1-dt_purS, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corISO <- round(epi.ccc(iso_purT, 1-iso_purS, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
plot(1 - bar_purS, bar_purT, col = c(rep('blue',23), rep('black', 23)),
pch = c(rep(1,23), rep(2, 23)), 
xlim = c(0,1), ylim=c(0,1), xlab = expression(1 - hat(pi)[S]), ylab = expression(hat(pi)[T]))
legend("topleft", c(paste("ISOpure (ccc = ", corISO, ")", sep = ''), paste("DeMixT (ccc = ", corDT, ")", sep = '')),pch = c(2,1),text.col=c("black", "blue"), col=c("black", "blue"),merge=FALSE, cex = 1.5,bty = "n")
abline(0,1,col= "red",lwd = 3)
box(lwd=5)

Figure 3b and 3c

OB_St <- log2(read.table("lcm_normal.txt", header = F))
OB_Tu <- log2(read.table("lcm_tumor.txt", header = F))
ISO_Tu <- log2(read.table("lcm_SA_cc_cancerprofiles.txt", header = F))
ISO_St <- log2(read.table("lcm_TA_cc_cancerprofiles.txt", header = F))
DT_Tu_mu <- as.numeric(as.matrix(read.table("lcm_SA_mu.txt", header = F))[,1])
DT_St_mu <- as.numeric(as.matrix(read.table("lcm_TA_mu.txt", header = F))[,1])
DT_Tu_sg <- as.numeric(as.matrix(read.table("lcm_SA_sigma.txt", header = F))[,1])
DT_St_sg <- as.numeric(as.matrix(read.table("lcm_TA_sigma.txt", header = F))[,1])
OB_St_m <- apply(OB_St, 1, mean); OB_Tu_m <- apply(OB_Tu, 1, mean)
ISO_St_m <- apply(ISO_St, 1, mean); ISO_Tu_m <- apply(ISO_Tu, 1, mean)
##a preprocessing step to remove genes with too large estimated variation, as considered inaccurate
condSt <- (DT_St_sg < 0.99); condTu <- (DT_Tu_sg < 0.99) 
DT_Tu_m <- apply(log2(read.table("lcm_SA_out_decov.txt", header = F)), 1, mean)
DT_St_m <- apply(log2(read.table("lcm_TA_out_decov.txt", header = F)), 1, mean)
OB_St_m <- OB_St_m[condSt]; OB_Tu_m <- OB_Tu_m[condTu]
ISO_St_m <- ISO_St_m[condSt]; ISO_Tu_m <- ISO_Tu_m[condTu]
DT_St_m <- DT_St_m[condSt]; DT_Tu_m <- DT_Tu_m[condTu]
corDT_St_m <- round(epi.ccc(OB_St_m, DT_St_m, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corISO_St_m <- round(epi.ccc(OB_St_m, ISO_St_m, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2St_m <- round(epi.ccc(DT_St_m, ISO_St_m, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corDT_Tu_m <- round(epi.ccc(OB_Tu_m, DT_Tu_m, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
corISO_Tu_m <- round(epi.ccc(OB_Tu_m, ISO_Tu_m, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
cor2Tu_m <- round(epi.ccc(DT_Tu_m, ISO_Tu_m, ci = "z-transform", conf.level = 0.95)$rho.c$est, 2)
#Tumor is Unknown
smoothScatter((DT_Tu_m+OB_Tu_m)/2, DT_Tu_m-OB_Tu_m, ylab="Estimate - Truth", xlab="(Estimate + Truth)/2", xlim=c(2,16), ylim=c(-1.2,1.2), main="Mean expressions for Tumor", pch = 1, nrpoints = 0, col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((DT_Tu_m-OB_Tu_m)~((DT_Tu_m+OB_Tu_m)/2))
lines(tmp01$x,tmp01$y,col="blue", lwd = 5)
tmp01=lowess((ISO_Tu_m-OB_Tu_m)~((ISO_Tu_m+OB_Tu_m)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("bottomright", c(paste("ISOpure (ccc = ", corISO_Tu_m, ")", sep = ''), paste("DeMixT (ccc = ", corDT_Tu_m, ")", sep = '')), text.col=c("black", "blue"), lty = 1, merge=FALSE, col=c("black", "blue"), cex = 2,bty = "n")
box(lwd=5)

#Stroma is Unknown
smoothScatter((DT_St_m+OB_St_m)/2, DT_St_m-OB_St_m, ylab="Estimate - Truth", xlab="(Estimate + Truth)/2", xlim=c(2,16), ylim=c(-1.2,1.2), main="Mean expressions for Stroma", pch = 1, nrpoints = 0, col = 'yellow', colramp = colorRampPalette(c("white", "yellow","yellow1", "orange", "orange1" )))
tmp01=lowess((DT_St_m-OB_St_m)~((DT_St_m+OB_St_m)/2))
lines(tmp01$x,tmp01$y,col="blue", lwd = 5)
tmp01=lowess((ISO_St_m-OB_St_m)~((ISO_St_m+OB_St_m)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("bottomright", c(paste("ISOpure (ccc = ", corISO_St_m, ")", sep = ''), paste("DeMixT (ccc = ", corDT_St_m, ")", sep = '')),lty = 1, text.col=c("black", "blue"), col=c("black", "blue"), merge=FALSE, cex = 2,bty = "n")
box(lwd=5)

Supplementary Figure 7

#Tumor is unknown
smoothScatter((DT_Tu_m+ISO_Tu_m)/2, DT_Tu_m-ISO_Tu_m, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. ISOpure (Tumor is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((DT_Tu_m-ISO_Tu_m)~((DT_Tu_m+ISO_Tu_m)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", cor2Tu_m, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")

smoothScatter((DT_Tu_m+OB_Tu_m)/2, DT_Tu_m-OB_Tu_m, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. Truth (Tumor is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((DT_Tu_m-OB_Tu_m)~((DT_Tu_m+OB_Tu_m)/2))
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
abline(h=0,col='red', lty = 2)
legend("topright", c(paste("ccc = ", corDT_Tu_m, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")

smoothScatter((OB_Tu_m+ISO_Tu_m)/2, ISO_Tu_m-OB_Tu_m, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="ISOpure vs. Truth (Tumor is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((ISO_Tu_m-OB_Tu_m)~((ISO_Tu_m+OB_Tu_m)/2))
abline(h=0,col='red', lty = 2)
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
legend("topright", c(paste("ccc = ", corISO_Tu_m, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")

#Stroma is unknown
smoothScatter((DT_St_m+ISO_St_m)/2, DT_St_m-ISO_St_m, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. ISOpure (Stromal is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((DT_St_m-ISO_St_m)~((DT_St_m+ISO_St_m)/2))
abline(h=0,col='red', lty = 2)
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
legend("topright", c(paste("ccc = ", cor2St_m, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")

smoothScatter((DT_St_m+OB_St_m)/2, DT_St_m-OB_St_m, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="DeMixT vs. Truth (Stromal is unknown)", pch = 20, nrpoints = 0,,col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((DT_St_m-OB_St_m)~((DT_St_m+OB_St_m)/2))
abline(h=0,col='red', lty = 2)
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
legend("topright", c(paste("ccc = ", corDT_St_m, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")

smoothScatter((OB_St_m+ISO_St_m)/2, ISO_St_m-OB_St_m, ylab="M", xlab="A", xlim=c(2,16), ylim=c(-6,6),
main="ISOpure vs. Truth (Stromal is unknown)", pch = 20, nrpoints = 0,col = 'yellow', colramp = colorRampPalette(c("white", "yellow", "yellow1","orange", "orange1")))
tmp01=lowess((ISO_St_m-OB_St_m)~((ISO_St_m+OB_St_m)/2))
abline(h=0,col='red', lty = 2)
lines(tmp01$x,tmp01$y,col="black", lwd = 5)
legend("topright", c(paste("ccc = ", corISO_St_m, sep = '')),text.col="black", col="black", merge=FALSE, cex = 2.5,bty = "n")

Figure 3d

lcm_name <- function(x) paste(strsplit(x, split = '_')[[1]][2], strsplit(x, split = '_')[[1]][3],
                              strsplit(x, split = '_')[[1]][4],sep = '-')
mid <- sapply(mid, lcm_name)
nid <- sapply(nid, lcm_name)
tid <- sapply(tid, lcm_name)
OB_St <- as.matrix(log2(read.table("lcm_normal.txt", header = F)))
OB_Tu <- as.matrix(log2(read.table("lcm_tumor.txt",  header = F)))
ISO_Tu <- as.matrix(log2(read.table("lcm_SA_cc_cancerprofiles.txt", header = F)))
ISO_St <- as.matrix(log2(read.table("lcm_TA_cc_cancerprofiles.txt", header = F)))
DT_Tu <- as.matrix(log2(read.table("lcm_SA_out_decov.txt", header = F)))
DT_St <- as.matrix(log2(read.table("lcm_TA_out_decov.txt", header = F)))
dt_purS <- as.numeric(read.table('lcm_SA_out.txt', header = F)[1,])
dt_purT <- as.numeric(read.table('lcm_TA_out.txt', header = F)[1,])
iso_purT <- as.numeric(read.table('lcm_SA_alphapurities.txt', header = F)[,1])
iso_purS <- as.numeric(read.table('lcm_TA_alphapurities.txt', header = F)[,1])
# input data
inputA = read.table('lcm_adm.txt', header =F)
inputA = log2(as.matrix(inputA))
#isost,isotu,dtst,dttu
ISO_Tu <- as.matrix(ISO_Tu)
ISO_St <- as.matrix(ISO_St)
DT_Tu <- as.matrix(DT_Tu)
DT_St <- as.matrix(DT_St)
ISO_Tu_St <- log2(t(t(2^inputA - sweep(2^ISO_Tu, MARGIN = 2, iso_purT, '*'))/(1 - iso_purT)))
ISO_St_Tu <- log2(t(t(2^inputA - sweep(2^ISO_St, MARGIN = 2, iso_purS, '*'))/(1 - iso_purS)))
DT_Tu_St <- log2(t(t(2^inputA - sweep(2^DT_Tu, MARGIN = 2, 1 - dt_purS, '*'))/dt_purS))
DT_St_Tu <- log2(t(t(2^inputA - sweep(2^DT_St, MARGIN = 2, 1 - dt_purT, '*'))/dt_purT))
joinid <- intersect(intersect(mid,tid),nid)
jmid <- mid[mid%in%joinid]
isost <- ISO_St[, mid%in%joinid]
isotu <- ISO_Tu[, mid%in%joinid]
dtst <- DT_St[, mid%in%joinid]
dttu <- DT_Tu[, mid%in%joinid]
###true
obst <- OB_St[, match(jmid,nid)]
obtu <- OB_Tu[, match(jmid,tid)]
##proportion##
dtpurS <- dt_purS[mid%in%joinid]
dtpurT <- dt_purT[mid%in%joinid]
isopurT <- iso_purT[mid%in%joinid]
isopurS <- iso_purS[mid%in%joinid]
obst <- as.matrix(obst)
obtu <- as.matrix(obtu)
isost <- ISO_St[, mid%in%joinid]
isotu <- ISO_Tu[, mid%in%joinid]
dtst <- DT_St[, mid%in%joinid]
dttu <- DT_Tu[, mid%in%joinid]
isosttu <- ISO_St_Tu[, mid%in%joinid]
isotust <- ISO_Tu_St[, mid%in%joinid]
dtsttu <- DT_St_Tu[, mid%in%joinid]
dttust <- DT_Tu_St[, mid%in%joinid]
isoCst <- c();isoCtu <- c()
dtCst <- c();dtCtu <- c()
for(i in 1:21){
  isoCst <- c(isoCst, round(epi.ccc(obst[,i], isost[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
  dtCst <- c(dtCst, round(epi.ccc(obst[,i], dtst[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
}
for(i in 1:21){
  isoCtu <- c(isoCtu, round(epi.ccc(obtu[,i], isotu[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
  dtCtu <- c(dtCtu, round(epi.ccc(obtu[,i], dttu[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
}
isoCsttu <- c();isoCtust <- c()
dtCsttu <- c();dtCtust <- c()
for(i in 1:21){
  isoCtust <- c(isoCtust, round(epi.ccc(obst[,i], isotust[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
  dtCtust <- c(dtCtust, round(epi.ccc(obst[,i], dttust[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
}
for(i in 1:21){
  isoCsttu <- c(isoCsttu, round(epi.ccc(obtu[,i], isosttu[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
  dtCsttu <- c(dtCsttu, round(epi.ccc(obtu[,i], dtsttu[,i], ci = "z-transform", conf.level = 0.95)$rho.c$est, 2))
}
purTT <- (dtpurT+1-dtpurS)/2
ggdat <- as.data.frame(cbind(cbind(dtCtu, dtCsttu), purTT))
colnames(ggdat)[3] = 'Tumor purity'
ggplot(ggdat, aes(x = dtCtu, y = dtCsttu)) +
geom_abline(intercept = 0, slope = 1, colour = 'red', size = 2.5) +
geom_jitter(aes(size = `Tumor purity`, colour = `Tumor purity`), position=position_jitter(0.015)) +
  scale_size_continuous(guide = FALSE, range = c(5, 10)) + 
  scale_color_gradientn(colours = rainbow(5)) +
  xlim(0.8,1) +
ylim(0.8,1) + 
xlab(bquote(paste('CCC(',hat(t)[i]^a, 'vs.',t[i]^'obs',')'))) +
ylab(bquote(paste('CCC(',hat(t)[i]^b, 'vs.',t[i]^'obs',')'))) +
ggtitle('Individual deconvolved expressions (DeMixT)') +
theme(axis.line = element_line(colour = "black"),
      aspect.ratio=1,
        legend.text=element_text(size=15),
        legend.key.size=unit(1,"cm"),
        legend.title = element_text(size=20),
        plot.title = element_text(hjust = 0.5, vjust=1.5, size = 18),
        axis.text=element_text(size=23),
        axis.title=element_text(size=23),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(fill = 'NA', colour = "black", size=5),
        panel.background = element_blank())

Supplementary Figure 8

iso1 = apply(ISO_Tu, 1, sd)[condTu]
obs1 <- apply(OB_Tu, 1, sd)[condTu]
lcm11 <- apply(log2(read.table("lcm_TA_out_decov.txt", header = F)), 1, sd)[condTu]
plot(density(iso1), col = 'black', xlab = 'Sample standard deviation\n(log2-transformed expression)', main = 'Tumor', lwd = 3, ylim = c(0, 6), xlim = c(0, 1.6))
lines(density(obs1), col = 'red', lwd = 2)
lines(density(lcm11), col = 'blue', lwd = 2)
legend('topright', c("ISOpure", "Observed", "DeMixT"), cex = 2, lty = c(1,1,1), col = c( "black", "red","blue"))

Supplementary Figure 9

ggdat1 <- as.data.frame(cbind(cbind(dtCtu, isoCtu), 1 - dtpurS))
colnames(ggdat1)[3] = 'Tumor purity'
ggplot(ggdat1, aes(x = dtCtu, y = isoCtu)) +
  geom_abline(intercept = 0, slope = 1, colour = 'red', size = 2.5) +
  geom_jitter(aes(size = `Tumor purity`, colour = `Tumor purity`), position=position_jitter(0.015)) +
  scale_size_continuous(guide = FALSE, range = c(5, 10)) + 
  scale_color_gradientn(colours = rainbow(5)) +
  xlim(0.27,1) +
  ylim(0.27,1) + 
  ggtitle(bquote(paste('CCC(',hat(t)[i]^a, 'vs.',t[i]^'obs',') ','(deconvolved tumor expressions)'))) +
  xlab("DeMixT") +
  ylab("ISOpure") +
  theme(axis.line = element_line(colour = "black"),
        aspect.ratio=1,
        legend.text=element_text(size=15),
        legend.key.size=unit(1,"cm"),
        legend.title = element_text(size=20),
        plot.title = element_text(hjust = 0.5, vjust=1.5, size = 18),
        axis.text=element_text(size=23),
        axis.title=element_text(size=23),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(fill = 'NA', colour = "black", size=5),
        panel.background = element_blank())
ggdat2 <- as.data.frame(cbind(cbind(dtCsttu, isoCsttu), dtpurT))
colnames(ggdat2)[3] = 'Tumor purity'
ggplot(ggdat2, aes(x = dtCsttu, y = isoCsttu)) +
  geom_abline(intercept = 0, slope = 1, colour = 'red', size = 2.5) +
  geom_jitter(aes(size = `Tumor purity`, colour = `Tumor purity`), position=position_jitter(0.015)) +
  scale_size_continuous(guide = FALSE, range = c(5, 10)) + 
  scale_color_gradientn(colours = rainbow(5)) +
  xlim(0.27,1) +
  ylim(0.27,1) + 
  ggtitle(bquote(paste('CCC(',hat(t)[i]^b, 'vs.',t[i]^'obs',') ','(deconvolved tumor expressions)'))) +
  xlab("DeMixT") +
  ylab("ISOpure") +
  theme(axis.line = element_line(colour = "black"),
        aspect.ratio=1,
        legend.text=element_text(size=15),
        legend.key.size=unit(1,"cm"),
        legend.title = element_text(size=20),
        plot.title = element_text(hjust = 0.5, vjust=1.5, size = 18),
        axis.text=element_text(size=23),
        axis.title=element_text(size=23),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(fill = 'NA', colour = "black", size=5),
        panel.background = element_blank())

TCGA HNSCC data

We downloaded RNA-seq data for HNSCC from TCGA data portal (https://portal.gdc.cancer.gov/). There was a total of 44 normal and 269 tumors samples for HNSCC. We collected the information of HPV infection for the HNSCC samples. Samples were classified as HPV+ using an empiric definition of detection of \(>\) 1000 mapped RNA-seq reads, primarily aligning to viral genes E6 and E7, which resulted in 36 HPV+ samples . We then devised a workflow to estimate the immune cell proportions (Supplementary Figure 10). Our workflow included three steps. The downloaded normal samples provided reference profiles for the stromal component in each step. We first downloaded stromal and immune scores from single-sample gene set enrichment analysis for all of our tumor samples and selected \(9\) tumor samples with low immune scores (\(< -2\)) and high stromal scores (\(>0\)), which suggested that these samples were likely low in immune infiltration. We then ran DeMixT under the two-component mode on these samples, generating the deconvolved expression profiles for the tumor and stromal components. We used these profiles as reference samples for running DeMixT under the three-component mode in the \(36\) \(HPV^+\) samples, generating deconvolved expression profiles for the immune component. In these two steps, we used deconvolved profiles that have smaller estimated standard variations as the reference profiles for the next step. We then ran DeMixT under the three-component mode on all \(269\) samples with reference profiles from normal samples and the deconvolved immune component. We calculated p-values (Benjamini-Hochberg corrected ) for the differential test of deconvolved expressions for the immune component versus the stromal component, and for the immune component versus the tumor component, respectively, on a set of 63 immune marker genes.
We performed gene selection in the GSCM approach (as described above), with a slightly larger threshold to account for the large sample size: sample standard deviation \(<0.5\) and the top \(500\) genes for three-component deconvolution to estimate the \(\pi\)’s.

Load data

options(scipen=999)
scale.norm <- function(x){
  nS <- ncol(x)
  nG <- nrow(x)
  nT <- apply(x, 2, sum)
  mT <- mean(nT)
  y <- matrix(0, nrow = nG, ncol = nS)
  for (i in 1:nS){
    y[,i] <- x[, i]/nT[i]*mT
  }
  return(y)
}
load("HNSC.RData") ##HPV information
load("tcga_expr.RData") ##Tumor expression profiles
load("tcgav2_N.RData")  ## Normal expression profiles
cname = colnames(tcga); mix = cbind(normal, tcga)
mix = scale.norm(mix)
normal = mix[,1:44]; tcga = mix[,45:317]
colnames(tcga) = cname
save(normal, file = "normal.RData")
save(tcga, file = "tcga.RData")

Phase I in our pipeline

library(DeMixT)
load("HNSC.RData")
load("tcga.RData")
load("normal.RData")
load("ImmuneScore.RData")
load('HNSC_normalized.RData')
load("outlier.RData")
filter1 = 250
filter2 = 500
filter3 = 500
Max.inter<-20
stage1.num=sum((Immune.score<(-2))&(Stromal.score > 0))
tumor_id = HNSC[(Immune.score<(-2))&(Stromal.score > 0), ]$ID
tumor <- tcga[, colnames(tcga) %in% tumor_id]
inputmat1 <- cbind(normal, tumor)
inputmat1 <- as.matrix(inputmat1)
cnvgroup<-c(rep(1, 44), rep(3, 9)); # input as 2-component
testr<-DeMixT.S1(inputmat1, cnvgroup, niter = Max.inter, ninteg = 50, filter.option = 1, 
          filter.criteria1 = c(0.5, 0.5), filter.criteria2 = c(filter1, filter1), filter.criteria3 = 0.25, if.filter = FALSE, tol = 1e-5)
save(testr, file=paste0("New_pipeline_Stage1_pi_",filter1,"_",filter2,"_",filter3,".RData"))
print("stage 1 over")
givenpi <- c(as.numeric(colSums(testr$pi)), rep(0, 9))
# define group 1 for normal; 2 for nomral 2; 3 for mixed
cnvgroup<-c(rep(1, 44), rep(3, 9)); # two component
# Run this function
testr <- DeMixT.S2(inputmat1, cnvgroup, givenpi, ninteg = 60)
save(testr, file=paste0("New_pipeline_Stage1_res_",filter1,"_",filter2,"_",filter3,".RData"))

Phase II in our pipeline

inputmat1<-cbind(normal, testr$decovExprT)
hpv.id<-which(HNSC$hnsc.hpv.status=="Positive")
inputmat1<-cbind(inputmat1, tcga[,hpv.id])
len.mix<-length(hpv.id)
cnvgroup<-c(rep(1, 44), rep(2, 9), rep(3, len.mix)); # three componentprint("stage 1 start")
# Run this function
testr<-DeMixT.S1(inputmat1, cnvgroup, niter = Max.inter, ninteg = 50, filter.option = 1, 
                 filter.criteria1 = c(0.6, 0.6), filter.criteria2 = c(filter2, filter2), filter.criteria3 = 0.25, if.filter = FALSE, tol = 1e-5)
save(testr, file=paste0("New_pipeline_Stage2_pi_",filter1,"_",filter2,"_",filter3,".RData"))
print("stage 1 over")
givenpi <- c(testr$pi[1,], testr$pi[2,])
testr <- DeMixT.S2(inputmat1, cnvgroup, givenpi, ninteg = 60)
save(testr, file=paste0("New_pipeline_Stage2_res_",filter1,"_",filter2,"_",filter3,".RData"))

Phase III in our pipeline

##a preprocessing step to remove genes with too large estimated variation, as considered inaccurate
id4<-which(testr$decovSigma<0.99)
save(id4, file="id4.RData")
decov = as.matrix(testr$decovExprT)
inputmat1<-cbind(normal, decov, tcga)
inputmat1<-inputmat1[id4,]
valid.id4<-which(apply(inputmat1, 1, FUN = function(x) sum(x<=0.1)==0))
inputmat1<-inputmat1[valid.id4,]
inputmat1<-log2(inputmat1)
#################################first stage####################################
inputn1m <- apply(inputmat1[,1:44], 1, mean)
inputn2m <- apply(inputmat1[,45:80], 1, mean)
inputn1s <- apply(inputmat1[,1:44], 1, sd)
inputn2s <- apply(inputmat1[,45:80], 1, sd)
id <- which((inputn1s < 0.9)&(inputn2s < 0.9))
length(id)
order.id<-order((abs(inputn1m[id] - inputn2m[id])))
inputmat2 <- inputmat1[id[order.id[1:(filter2+250)]],]
inputmixs <- apply(2^inputmat2[,81:353], 1, sd)
id1 <- order(inputmixs, decreasing = T)
id1 <- id1[1:filter2]
inputmat3 <- inputmat2[id1,] 
cnvgroup<-c(rep(1, 80), rep(3, 273)); # input as 2-component
print("stage 1 start")
# Run deconvolution function
inputmat3 <- 2^inputmat3
testr <- DeMixT.Kernel(inputdata=inputmat3, groupid=cnvgroup, nhavepi=0, givenpi=rep(0,273), givenpiT=rep(0,273), niter=Max.inter, ninteg=50, tol=1e-5)
print("stage 1 over")
pi1 <- testr$pi
pi1 <- as.numeric(pi1[1,])
pit <- 1 - pi1
save(pit, testr, file=paste0("New_pipeline_Stage3_pi_",filter1,"_",filter2,"_",filter3,".RData"))
id <- testr$id
inputn1m <- apply(inputmat1[,1:44], 1, mean)
id <- ((inputn1s < 0.6)&(inputn2s < 0.6))
sum(id)
inputmat2 <- inputmat1[id,]
input2n1m <- apply(inputmat2[,1:44], 1, mean)
input2n2m <- apply(inputmat2[,45:80], 1, mean)
input2nd <- abs(input2n1m/input2n2m)
id1 <- order(input2nd, decreasing = TRUE)
id1 <- id1[1:(filter3+250)]
inputmat3 <- inputmat2[id1,]
len.mix<-273
inputmixs <- apply(2^inputmat3[,81:(81+len.mix-1)], 1, sd)
id2 <- order(inputmixs, decreasing = TRUE)
id2 <- id2[1:filter3]
inputmat4 <- inputmat3[id2,]
inputmat4 <- 2^inputmat4
cnvgroup<-c(rep(1, 44), rep(2,36), rep(3, len.mix)); # input as 3-component
testr <- DeMixT.Kernel(inputdata=inputmat4, groupid=cnvgroup, nhavepi=2, givenpi=rep(0,len.mix), givenpiT=pit, niter=Max.inter, ninteg=50, tol=1e-5)
save(testr, file=paste0("New_pipeline_Stage3_res_pi_",filter1,"_",filter2,"_",filter3,".RData"))
inputmat1 <- 2^inputmat1
givenpi <- c(testr$pi[1,], testr$pi[2,])
testr <- DeMixT.S2(inputmat1, cnvgroup, givenpi, ninteg = 60)
save(testr, file=paste0("New_pipeline_Final_decovolved_res_",filter1,"_",filter2,"_",filter3,".RData"))
##### Immune signature genes deconvolution #####
Immune.gene.all<-read.table(file="Immune_Gene_list.txt", header = TRUE)
Immune.gene.all<-unique(Immune.gene.all)
immune.gene.id<-match(Immune.gene.all[,1], Gene.name)
immune.gene.name<-as.character(Immune.gene.all[,1])
immune.gene.name<-immune.gene.name[!is.na(immune.gene.id)]
immune.gene.id<-immune.gene.id[!is.na(immune.gene.id)]
valid.sample<-c(1:273)
valid.sample<-valid.sample[-outlier]
groupid<-c(rep(1,44), rep(2,36), rep(3,269))
gene.name<-immune.gene.name[id1]
load(paste0("New_pipeline_Stage3_res_pi_",filter1,"_",filter2,"_",filter3,".RData"))
bar_tum <- 1-colSums(testr$pi)
bar_immu <- testr$pi[2,]
bar_strom <- testr$pi[1,]
bar_tum<-bar_tum[valid.sample]
bar_immu<-bar_immu[valid.sample]
bar_strom<-bar_strom[valid.sample]
givenpi<-c(bar_strom, bar_immu)
load(paste0("New_pipeline_Stage2_res_",filter1,"_",filter2,"_",filter3,".RData"))
decov = as.matrix(testr$decovExprT)
nData<-cbind(normal, decov, tumor[, valid.sample])
nData<-nData[immune.gene.id,]
res.immune<-DeMixT.S2(nData, groupid, givenpi, ninteg = 60)
save(res.immune, gene.name, bar_tum, bar_immu, bar_strom, file=paste0("New.Pipeline.Immune.Gene.new.Res_",filter1,"_",filter2,"_",filter3,".RData"))

Results summary

Figure 4a

load('HNSC_normalized.RData')
load('HNSC.RData')
library('beeswarm')
library("hydroGOF")
library("ggplot2")
library("grid")
library("gtable")
load("New.Pipeline.Immune.Gene.SmallSigma.new.Res_250_500_500.RData")
load("outlier.RData")
valid.sample<-c(1:273)
valid.sample<-valid.sample[-outlier]
filter1<-250
filter2<-500
filter3<-500
labels.ind<-c(rep("Immune",269), rep("Stromal", 269), rep("Tumor", 269))
HNSC<-HNSC[valid.sample,]
hpv.id<-which(HNSC$hnsc.hpv.status=="Positive")
hpv <- HNSC$hnsc.hpv.status
DecovTumor<-res.immune$decovExpr
DecovImmune<-res.immune$decovExprN2
DecovStromal<-res.immune$decovExprN1
tern2cart <- function(coord){
  coord[1]->x
  coord[2]->y
  coord[3]->z
  x+y+z->tot
  x/tot -> x
  y/tot -> y
  z/tot -> z
  (2*y + z)/(2*(x+y+z)) -> x1
  sqrt(3)*z/(2*(x+y+z)) -> y1
  return(c(x1,y1))
}
a<-seq(0.96,0.04, length.out = 9)
b<-rep(0,9)
c<-seq(0.04,0.96,length.out = 9)
grid<-data.frame(x=c(a, b, c, a, c, b),y=c(b, c, a, c, b, a),z=c(c, a, b, b, a, c))
t(apply(grid,1,tern2cart)) -> grid.tern
grid.tern[,2]<-grid.tern[,2]-0.25
grid.tern[,1]<-grid.tern[,1]-0.5
triplot(x = bar_immu, y = bar_tum, z = bar_strom,
        col = ifelse(hpv == "Positive", "red", "black"), lab=NULL,
        pch = ifelse(hpv == "Positive", "+", "-"),
        grid = seq(0, 1, by = 0.1))
legend("topleft", c("HPV Postive", "HPV Negative"), col = c("red", "black"), pch=c("+","-"), cex = 0.9, bty="n")
paste(seq(10,90,by=10),"%")->lab
text(grid.tern[9:1,]+matrix(c(rep(-0.04,9),rep(0,9)), nrow = 9, ncol = 2),paste0(lab),col="black",cex=0.8, pos=2)
text(grid.tern[5,]+matrix(c(rep(-0.15,1),rep(0,1)), nrow = 1, ncol = 2),paste0("Immune"),col="black",cex=1.2, pos=2)
text(grid.tern[18:10,]+matrix(c(rep(0.18,9),rep(0,9)), nrow = 9, ncol = 2),paste0(lab),col="black",cex=0.8, pos=2)
text(grid.tern[14,]+matrix(c(rep(0.38,1),rep(0,1)), nrow = 1, ncol = 2),paste0("Tumor"),col="black",cex=1.2, pos=2)
text(grid.tern[27:19,]+matrix(c(rep(0.07,9),rep(-0.12,9)), nrow = 9, ncol = 2),paste0(lab),col="black",cex=0.8, pos=2)
text(grid.tern[23,]+matrix(c(rep(0.16,1),rep(-0.2,1)), nrow = 1, ncol = 2),paste0("Stromal"),col="black",cex=1.2, pos=2)

Figure 4b

pos_bar <- bar_immu[hpv == "Positive"]
neg_bar <- bar_immu[hpv == "Negative"]
pval = sprintf("%0.3g", t.test(pos_bar, neg_bar,alternative = "greater")$p.value)
KS.pvalue <-ks.test(pos_bar, neg_bar)$p.value
boxplot(bar_immu ~ hpv, ylab = 'Estimated proportion of immune component', xaxt = 'n', outline = T,
        col = c('white', 'red'),
        main = paste0("HNSCC", "(n = ", length(bar_immu),", IlluminaHiSeq RNAseqV2)"))
beeswarm(bar_immu ~ hpv, col = 4, pch = 16, add = T)
axis(1, at = 1:2, labels = c(expression(paste('HPV'^'-',' (n=233)',sep = '\n')),expression(paste('HPV'^'+', ' (n=36)', sep = '\n'))), line = 2, tick = F)
legend("topleft", paste('P = ',pval, sep = ''), bty = 'n', cex = 2, text.font = 3)
box(lwd=5)

Figure 4c

pval.Stromal<-rep(1,length(gene.name))
pval.Tumor<-rep(1,length(gene.name))
for(ii in 1:length(gene.name)){
  GeneName<-gene.name[ii]
  bar<-c(DecovImmune[ii,], DecovStromal[ii,], DecovTumor[ii,])
  pval.Stromal[ii] = t.test(log2(DecovImmune[ii,]), log2(DecovStromal[ii,]), alternative = "greater")$p.value
  pval.Tumor[ii] = t.test(log2(DecovImmune[ii,]), log2(DecovTumor[ii,]), alternative = "greater")$p.value
  
}
select.names<-c("CD4", "CD14", "HLA-DOB")
match.select<-match(select.names, gene.name)
for(ii in match.select){
  Gene.name<-paste0(gene.name[ii])
  bar<-c(DecovImmune[ii,], DecovStromal[ii,], DecovTumor[ii,])
  boxplot(log2(bar) ~ labels.ind, ylab="log2(Deconvoluted Expression)", range=3, outline=TRUE, main=Gene.name, cex=1, cex.lab=1, cex.axis=1, cex.main=2, 
          ylim=c(2,17), notch=TRUE, col=c("red","green","blue"), outcol=c("red","green","blue"), outpch=1, outcex=1 )
  legend("topright", paste0('P = ',sprintf("%0.3g", pval.Stromal[ii]),', ',sprintf("%0.3g", pval.Tumor[ii])), bty = 'n', cex = 2, text.font = 2)
  box(lwd=5)
}

Figure 4d

ids<-c(1:length(gene.name))
pval.Stromal<-p.adjust(pval.Stromal, method="BH", n=13400)
pval.Tumor<-p.adjust(pval.Tumor, method="BH", n=13400)
id.Stromal<-which(pval.Stromal<0.05)
id.Tumor<-which(pval.Tumor<0.05)
id.ST<-intersect(id.Stromal, id.Tumor)
id.union<-union(id.Stromal, id.Tumor)
id.nonsignficant<-setdiff(c(1:length(gene.name)), id.union)
plot(-log10(pval.Stromal), -log10(pval.Tumor), xlab="-log10(P.value), Immune VS Stromal", ylab="-log10(P.value), Immune VS Tumor")
points(-log10(pval.Stromal[id.ST]), -log10(pval.Tumor[id.ST]), col="red")
abline(h=-log10(0.05), col="green")
abline(v=-log10(0.05), col="green")
data.out<-data.frame(gene.name, sprintf("%0.3g",pval.Stromal), sprintf("%0.3g",pval.Tumor))
colnames(data.out)<-c("Gene name", "Adjusted P value(Immune VS Stromal)", "Adjusted P value(Immune VS Tumor)")
write.table(data.out, file="Gene_table_summary.txt", quote=FALSE, row.name=FALSE, sep = "\t")

Supplementary Figure 11

plot(density(bar_immu[hpv == "Positive"]), col = "red", main = paste0("HNSCC (n = 269, IlluminaHiSeq RNAseqV2)"),
     xlab = "Estimated proportion of immune component", ylim = c(0, 5.0), xlim = c(0,1.0))
lines(density(bar_immu[hpv == "Negative"]), col = "blue")
legend("topright", c(expression(paste('Red: ','HPV'^'+', sep ='')), expression(paste('Blue: ','HPV'^'-', sep =''))),
       lty=c(1,1), merge=FALSE, col=c("red","blue"), cex = 1)
legend("topleft", paste0("K-S test p.value = ",sprintf("%0.3g", KS.pvalue)), cex = 1)