BootstrapClusterTest {ClassDiscovery} | R Documentation |
Performs a nonparametric bootstrap (sampling with replacement) test to determine whether the clusters found by an unsupervised method appear to be robust in a given data set.
BootstrapClusterTest(data, FUN, subsetSize, nTimes = 100, verbose = TRUE, ...)
data | A data matrix, numerical data frame, or ExpressionSet object. |
FUN | A function that, given a data matrix, returns a vector of cluster assignments. Examples of functions with this behavior are cutHclust, cutKmeans, cutPam, and cutRepeatedKmeans. |
... | Additional arguments passed to the classifying function, FUN. |
subsetSize | An optional integer argument. If present, each iteration of the bootstrap selects subsetSize rows from the original data matrix. If missing, each bootstrap sample contains the same number of rows as the original data matrix. |
nTimes | The number of bootstrap samples to collect. |
verbose | A logical flag. |
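As a rough illustration of the contract that FUN must satisfy, the following sketch defines a hypothetical function, cutAverageLinkage, using only base R. The name, the Euclidean distance, and the average linkage are assumptions made for this example and are not part of ClassDiscovery. The sketch also assumes that the columns of the data matrix are the objects being clustered, so that the function returns one label per column.

```r
# Hypothetical clustering function with the shape expected of FUN:
# it takes a data matrix and returns one cluster label per column.
cutAverageLinkage <- function(data, k = 2) {
  # dist() computes distances between rows, so transpose the matrix
  # in order to cluster the columns
  hc <- hclust(dist(t(data)), method = "average")
  cutree(hc, k = k)
}

# Toy data: 100 rows, two groups of 5 columns with different means
set.seed(1)
toy <- cbind(matrix(rnorm(100 * 5, mean = 0), nrow = 100),
             matrix(rnorm(100 * 5, mean = 3), nrow = 100))
labels <- cutAverageLinkage(toy, k = 2)
length(labels)   # one label per column of toy
```

A function of this shape could be supplied as FUN, with k passed along through the ... argument.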
Objects should be created using the BootstrapClusterTest
function, which performs the requested bootstrap on the
clusters. Following the standard R paradigm, the resulting object can be
summarized and plotted to determine the results of the test.
f: A function that, given a data matrix, returns a vector of cluster assignments. Examples of functions with this behavior are cutHclust, cutKmeans, cutPam, and cutRepeatedKmeans.
subsetSize: The number of rows selected from the original data matrix in each bootstrap sample.
nTimes: The number of bootstrap samples collected.
call: An object of class call, which records how the object was produced.
result: An object of class matrix containing, for each pair of columns in the original data, the number of times they belonged to the same cluster of a bootstrap sample.
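To make the content of the result matrix concrete, here is a minimal base R sketch of how such a co-clustering count could be accumulated. It is not the package's implementation: bootstrapAgreement and clusterColumns are hypothetical names, and the sketch assumes that rows are resampled with replacement while columns are clustered.

```r
# Hypothetical helper: cluster the columns of a matrix into k groups.
clusterColumns <- function(data, k) {
  cutree(hclust(dist(t(data)), method = "complete"), k = k)
}

# Sketch: count, for every pair of columns, how often they share a
# cluster across nTimes bootstrap samples of the rows.
bootstrapAgreement <- function(data, FUN, nTimes = 100,
                               subsetSize = nrow(data), ...) {
  counts <- matrix(0, ncol(data), ncol(data))
  for (i in seq_len(nTimes)) {
    # sample row indices with replacement
    rows <- sample(nrow(data), subsetSize, replace = TRUE)
    labels <- FUN(data[rows, , drop = FALSE], ...)
    # outer() marks the pairs of columns that received the same label
    counts <- counts + outer(labels, labels, "==")
  }
  counts
}

set.seed(42)
toy <- cbind(matrix(rnorm(100 * 4, mean = 0), nrow = 100),
             matrix(rnorm(100 * 4, mean = 3), nrow = 100))
agree <- bootstrapAgreement(toy, clusterColumns, nTimes = 50, k = 2)
dim(agree)    # 8 x 8: one row and one column per original column
diag(agree)   # each column always agrees with itself, so 50 throughout
```

Entries close to nTimes indicate pairs of columns that are clustered together in nearly every bootstrap sample; entries close to zero indicate pairs that are almost never clustered together.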
Class ClusterTest, directly. See that class for descriptions of the inherited methods image and hist.
summary signature(object = BootstrapClusterTest): Write out a summary of the object.
Kevin R. Coombes <kcoombes@mdanderson.org>
Kerr MK, Churchill GA. Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. PNAS 2001; 98:8961-8965.
ClusterTest, PerturbationClusterTest
# load the package that provides BootstrapClusterTest and its helpers
library(ClassDiscovery)

# simulate data from two different groups
d1 <- matrix(rnorm(100*30, rnorm(100, 0.5)), nrow=100, ncol=30, byrow=FALSE)
d2 <- matrix(rnorm(100*20, rnorm(100, 0.5)), nrow=100, ncol=20, byrow=FALSE)
dd <- cbind(d1, d2)
cols <- rep(c('red', 'green'), times=c(30,20))

# perform your basic hierarchical clustering...
hc <- hclust(distanceMatrix(dd, 'pearson'), method='complete')

# bootstrap the clusters arising from hclust
bc <- BootstrapClusterTest(dd, cutHclust, nTimes=200, k=3, metric='pearson')
summary(bc)

# look at the distribution of agreement scores
hist(bc, breaks=101)

# let heatmap compute a new dendrogram from the agreement
image(bc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# plot the agreement matrix with the original dendrogram
image(bc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# bootstrap the results of PAM
pamc <- BootstrapClusterTest(dd, cutPam, nTimes=200, k=3)
image(pamc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# contrast the behavior when all the data comes from the same group
xx <- matrix(rnorm(100*50, rnorm(100, 0.5)), nrow=100, ncol=50, byrow=FALSE)
hct <- hclust(distanceMatrix(xx, 'pearson'), method='complete')
bct <- BootstrapClusterTest(xx, cutHclust, nTimes=200, k=4, metric='pearson')
summary(bct)
image(bct, dendrogram=hct, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# cleanup
rm(d1, d2, dd, cols, hc, bc, pamc, xx, hct, bct)