| BootstrapClusterTest {ClassDiscovery} | R Documentation | 
Performs a nonparametric bootstrap (sampling with replacement) test to determine whether the clusters found by an unsupervised method appear to be robust in a given data set.
BootstrapClusterTest(data, FUN, subsetSize, nTimes = 100, verbose = TRUE, ...)
| data | A data matrix, numerical data frame, or ExpressionSet object. | 
| FUN | A function that, given a data matrix, returns a vector of cluster assignments. Examples of functions with this behavior are cutHclust, cutKmeans, cutPam, and cutRepeatedKmeans. | 
| ... | Additional arguments passed to the classifying function, FUN. | 
| subsetSize | An optional integer argument. If present, each iteration of the bootstrap selects subsetSize rows (sampled with replacement) from the original data matrix. If missing, each bootstrap sample contains the same number of rows as the original data matrix. | 
| nTimes | The number of bootstrap samples to collect. | 
| verbose | A logical flag; if TRUE, progress messages are printed while the bootstrap samples are collected. | 
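The argument descriptions imply a simple contract for FUN and a simple rule for subsetSize. The base-R sketch that follows illustrates one bootstrap iteration under those assumptions. kmeansColumns and drawBootstrapRows are hypothetical names, not part of ClassDiscovery, and the assumption that FUN clusters the columns (samples) is inferred from the description of the result slot.

```r
# Hypothetical clustering function obeying the assumed FUN contract:
# it receives a data matrix (rows = features, columns = samples) and
# returns one integer cluster label per column.
kmeansColumns <- function(data, k, ...) {
  # kmeans() clusters the rows of its input, so transpose first
  stats::kmeans(t(data), centers = k, ...)$cluster
}

# Hypothetical helper mirroring the subsetSize rule: draw row indices
# with replacement, defaulting to as many rows as the data has.
drawBootstrapRows <- function(data, subsetSize) {
  if (missing(subsetSize)) subsetSize <- nrow(data)
  sample(nrow(data), subsetSize, replace = TRUE)
}

set.seed(1)
toy <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)
rows <- drawBootstrapRows(toy, subsetSize = 50)  # 50 row indices, repeats allowed
labels <- kmeansColumns(toy[rows, , drop = FALSE], k = 2)
print(labels)  # one cluster label per column of toy
```

Because extra arguments are forwarded through `...`, a function like kmeansColumns should be usable as `BootstrapClusterTest(dd, kmeansColumns, k = 3)`; this is an untested assumption rather than documented package behavior.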
Objects should be created using the BootstrapClusterTest function, which performs the requested bootstrap on the clusters. Following the standard R paradigm, the resulting object can be summarized and plotted to determine the results of the test.
f: A function that, given a data matrix, returns a vector of cluster assignments. Examples of functions with this behavior are cutHclust, cutKmeans, cutPam, and cutRepeatedKmeans.
subsetSize: The number of rows included in each bootstrap sample.
nTimes: The number of bootstrap samples that were collected.
call: An object of class call, which records how the object was produced.
result: A matrix containing, for each pair of columns in the original data, the number of times they belonged to the same cluster of a bootstrap sample.
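To make the result slot concrete, here is a minimal base-R sketch of how such a co-clustering count matrix can be accumulated, assuming FUN clusters the columns of a row-resampled data matrix. bootstrapCoclusterCounts and hclustColumns are hypothetical names; this illustrates the idea and is not the package's implementation.

```r
# Sketch: count how often each pair of columns lands in the same cluster
# across nTimes bootstrap resamplings of the rows.
bootstrapCoclusterCounts <- function(data, FUN, subsetSize, nTimes = 100, ...) {
  if (missing(subsetSize)) subsetSize <- nrow(data)
  n <- ncol(data)
  counts <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(nTimes)) {
    rows <- sample(nrow(data), subsetSize, replace = TRUE)
    labels <- FUN(data[rows, , drop = FALSE], ...)
    # outer(..., "==") is TRUE wherever two columns share a cluster label
    counts <- counts + outer(labels, labels, "==")
  }
  counts
}

# Hypothetical FUN: hierarchical clustering of the columns, cut into k groups
hclustColumns <- function(data, k) {
  cutree(hclust(dist(t(data))), k = k)
}

set.seed(3)
toy <- cbind(matrix(rnorm(40 * 4), 40, 4),
             matrix(rnorm(40 * 4, mean = 3), 40, 4))
counts <- bootstrapCoclusterCounts(toy, hclustColumns, nTimes = 25, k = 2)
print(round(counts / 25, 2))  # fraction of bootstraps each pair co-clustered
```

Every column always shares a cluster with itself, so the diagonal of the count matrix equals nTimes; off-diagonal entries range from 0 (never together) to nTimes (always together).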
Extends class ClusterTest, directly. See that class for descriptions of the inherited methods image and hist.
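The inherited hist method is used to inspect the distribution of agreement scores; see ClusterTest for its exact definition. As a hedged illustration only, assuming result holds raw counts as described for the result slot, dividing by nTimes gives the fraction of bootstrap samples in which each pair of columns clustered together, and the upper triangle lists each pair once. agreementScores is a hypothetical helper, not a package function.

```r
# Hypothetical helper: convert a co-clustering count matrix into agreement
# fractions, keeping each unordered pair of columns exactly once.
agreementScores <- function(counts, nTimes) {
  fractions <- counts / nTimes
  fractions[upper.tri(fractions)]
}

# Hand-made counts for 3 columns after nTimes = 10 bootstrap samples
counts <- matrix(c(10,  9,  1,
                    9, 10,  2,
                    1,  2, 10), nrow = 3, byrow = TRUE)
scores <- agreementScores(counts, nTimes = 10)
print(scores)  # [1] 0.9 0.1 0.2
```

When clusters are robust, pairs tend to be either almost always or almost never together, so a histogram of such scores (for example `hist(scores, breaks = seq(0, 1, by = 0.1))`) should pile up near 0 and 1.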
summary signature(object = BootstrapClusterTest): Write out a summary of the object.
Author: Kevin R. Coombes <kcoombes@mdanderson.org>
Kerr MK, Churchill GA. Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. PNAS 2001; 98:8961-8965.
See also: ClusterTest, PerturbationClusterTest
library(ClassDiscovery)
# simulate data from two different groups
d1 <- matrix(rnorm(100*30, rnorm(100, 0.5)), nrow=100, ncol=30, byrow=FALSE)
d2 <- matrix(rnorm(100*20, rnorm(100, 0.5)), nrow=100, ncol=20, byrow=FALSE)
dd <- cbind(d1, d2)
cols <- rep(c('red', 'green'), times=c(30,20))
# perform a basic hierarchical clustering...
hc <- hclust(distanceMatrix(dd, 'pearson'), method='complete')
# bootstrap the clusters arising from hclust
bc <- BootstrapClusterTest(dd, cutHclust, nTimes=200, k=3, metric='pearson')
summary(bc)
# look at the distribution of agreement scores
hist(bc, breaks=101)
# let heatmap compute a new dendrogram from the agreement
image(bc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
# plot the agreement matrix with the original dendrogram
image(bc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
# bootstrap the results of PAM
pamc <- BootstrapClusterTest(dd, cutPam, nTimes=200, k=3)
image(pamc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
# contrast the behavior when all the data comes from the same group
xx <- matrix(rnorm(100*50, rnorm(100, 0.5)), nrow=100, ncol=50, byrow=FALSE)
hct <- hclust(distanceMatrix(xx, 'pearson'), method='complete')
bct <- BootstrapClusterTest(xx, cutHclust, nTimes=200, k=4, metric='pearson')
summary(bct)
image(bct, dendrogram=hct, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
# cleanup
rm(d1, d2, dd, cols, hc, bc, pamc, xx, hct, bct)