R: The BootstrapClusterTest Class

BootstrapClusterTest {ClassDiscovery}

R Documentation

The BootstrapClusterTest Class

Description

Performs a nonparametric bootstrap (sampling with replacement) test to determine whether the clusters found by an unsupervised method appear to be robust in a given data set.

Usage

BootstrapClusterTest(data, FUN, subsetSize, nTimes = 100, verbose = TRUE, ...)

Arguments

`data`	A data matrix, numerical data frame, or `ExpressionSet` object.
`FUN`	A `function` that, given a data matrix, returns a vector of cluster assignments. Examples of functions with this behavior are `cutHclust`, `cutKmeans`, `cutPam`, and `cutRepeatedKmeans`.
`...`	Additional arguments passed to the classifying function, `FUN`.
`subsetSize`	An optional integer argument. If present, each iteration of the bootstrap selects `subsetSize` rows from the original data matrix. If missing, each bootstrap contains the same number of rows as the original data matrix.
`nTimes`	The number of bootstrap samples to collect.
`verbose`	A logical flag indicating whether progress information is printed while the bootstrap runs.
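
The contract for `FUN` can be illustrated with a minimal base-R sketch. The helper name `cutKmeansSketch` is hypothetical and is not part of ClassDiscovery. The sketch assumes that the columns of the matrix are the objects being clustered (as the description of the `result` slot implies) and that the number of clusters `k` reaches the function through `...`. The last lines mimic a single bootstrap draw of `subsetSize` rows, which is an assumption about how the resampling proceeds.

```r
# Hypothetical clustering function with the shape FUN is expected to have:
# it takes a data matrix and returns one integer cluster label per column.
cutKmeansSketch <- function(data, k) {
  # kmeans() clusters rows, so transpose to cluster the columns (samples)
  unname(kmeans(t(data), centers = k)$cluster)
}

set.seed(1)
dat <- matrix(rnorm(100 * 12), nrow = 100, ncol = 12)

# One bootstrap draw: sample rows with replacement, keeping every column
subsetSize <- 60
rows <- sample(nrow(dat), subsetSize, replace = TRUE)
labels <- cutKmeansSketch(dat[rows, ], k = 3)
```

Any function that returns a label vector of length `ncol(data)` in this way should be usable in place of the `cut*` functions listed above.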

Objects from the Class

Objects should be created using the BootstrapClusterTest function, which performs the requested bootstrap on the clusters. Following the standard R paradigm, the resulting object can be summarized and plotted to determine the results of the test.

Slots

f: A function that, given a data matrix, returns a vector of cluster assignments. Examples of functions with this behavior are cutHclust, cutKmeans, cutPam, and cutRepeatedKmeans.
subsetSize: The number of rows to be included in each bootstrap sample.
nTimes: An integer, the number of bootstrap samples that were collected.
call: An object of class call, which records how the object was produced.
result: Object of class matrix containing, for each pair of columns in the original data, the number of times they belonged to the same cluster of a bootstrap sample.
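
How an agreement matrix of this kind might be accumulated can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `agreementSketch` and `cutHclustSketch` are hypothetical names, and the sketch stores raw counts, following the slot description above.

```r
# Hypothetical stand-in for a FUN: average-linkage hclust on the columns,
# using 1 - Pearson correlation as the distance, cut into k groups.
cutHclustSketch <- function(data, k) {
  d <- as.dist(1 - cor(data))
  unname(cutree(hclust(d, method = "average"), k = k))
}

# Count, for every pair of columns, how often they share a cluster
agreementSketch <- function(data, FUN, nTimes = 100, ...) {
  n <- ncol(data)
  counts <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(nTimes)) {
    rows <- sample(nrow(data), nrow(data), replace = TRUE)  # bootstrap rows
    labels <- FUN(data[rows, , drop = FALSE], ...)
    counts <- counts + outer(labels, labels, "==")  # 1 where labels match
  }
  counts
}

set.seed(2)
dat <- cbind(matrix(rnorm(50 * 5, mean = rnorm(50)), nrow = 50),
             matrix(rnorm(50 * 5, mean = rnorm(50)), nrow = 50))
agree <- agreementSketch(dat, cutHclustSketch, nTimes = 20, k = 2)
```

Entries near `nTimes` mark pairs of columns that almost always cluster together, and entries near zero mark pairs that almost never do. Dividing the counts by `nTimes` would express the same information as proportions.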

Extends

Class ClusterTest, directly. See that class for descriptions of the inherited methods image and hist.

Methods

summary: signature(object = BootstrapClusterTest): Write out a summary of the object.

Author(s)

Kevin R. Coombes <kcoombes@mdanderson.org>

References

Kerr MK, Churchill GA. Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. PNAS 2001; 98:8961-8965.

Examples

# simulate data from two different groups
d1 <- matrix(rnorm(100*30, rnorm(100, 0.5)), nrow=100, ncol=30, byrow=FALSE)
d2 <- matrix(rnorm(100*20, rnorm(100, 0.5)), nrow=100, ncol=20, byrow=FALSE)
dd <- cbind(d1, d2)
cols <- rep(c('red', 'green'), times=c(30,20))
# perform your basic hierarchical clustering...
hc <- hclust(distanceMatrix(dd, 'pearson'), method='complete')

# bootstrap the clusters arising from hclust
bc <- BootstrapClusterTest(dd, cutHclust, nTimes=200, k=3, metric='pearson')
summary(bc)

# look at the distribution of agreement scores
hist(bc, breaks=101)

# let heatmap compute a new dendrogram from the agreement
image(bc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# plot the agreement matrix with the original dendrogram
image(bc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# bootstrap the results of PAM
pamc <- BootstrapClusterTest(dd, cutPam, nTimes=200, k=3)
image(pamc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# contrast the behavior when all the data comes from the same group
xx <- matrix(rnorm(100*50, rnorm(100, 0.5)), nrow=100, ncol=50, byrow=FALSE)
hct <- hclust(distanceMatrix(xx, 'pearson'), method='complete')
bct <- BootstrapClusterTest(xx, cutHclust, nTimes=200, k=4, metric='pearson')
summary(bct)
image(bct, dendrogram=hct, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)

# cleanup
rm(d1, d2, dd, cols, hc, bc, pamc, xx, hct, bct)

[Package ClassDiscovery version 1.3 Index]