justClusters {ClassDiscovery} | R Documentation |

Unsupervised clustering algorithms, such as partitioning around medoids
(`pam`), K-means (`kmeans`), or hierarchical clustering (`hclust`) after
cutting the tree, produce a list of class assignments along with other
structure. To simplify the interface for `BootstrapClusterTest` and
`PerturbationClusterTest`, we have written these routines, which simply
extract the cluster assignments.

    cutHclust(data, k, method = "average", metric = "pearson")
    cutPam(data, k)
    cutKmeans(data, k)
    cutRepeatedKmeans(data, k, nTimes)
    repeatedKmeans(data, k, nTimes)

`data` | A numerical data matrix |

`k` | The number of classes desired from the algorithm |

`method` | Any valid linkage method that can be passed to the `hclust` function |

`metric` | Any valid distance metric that can be passed to the `distanceMatrix` function |

`nTimes` | An integer; the number of times to repeat the K-means algorithm with a different random starting point |

Each of the clustering routines used here has a different
structure for storing cluster assignments. The `kmeans` function
stores the assignments in the ‘cluster’ component of its return value.
The `pam` function uses a ‘clustering’ component. For `hclust`, the
assignments are produced by a call to the `cutree` function.
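The three access patterns described above can be sketched with base R and the `cluster` package. This is a minimal illustration, not the package's implementation: the toy matrix `x` is made up, and the choice to cluster columns (transposing for `kmeans` and `pam`, and using one minus the Pearson correlation between columns for `hclust`) is an assumption suggested by the `cbind` call in the example at the end of this page.

```r
library(cluster)  # provides pam(); kmeans(), hclust(), cutree() are in stats

set.seed(1)
# hypothetical toy data: 50 features (rows) by 12 samples (columns)
x <- matrix(rnorm(50 * 12), nrow = 50, ncol = 12)
k <- 3

# kmeans() and pam() cluster rows, so transpose to cluster the columns
km_labels  <- kmeans(t(x), centers = k)$cluster
pam_labels <- pam(t(x), k = k)$clustering

# hclust() needs a dissimilarity; here 1 - Pearson correlation between columns
d <- as.dist(1 - cor(x))
hc_labels <- cutree(hclust(d, method = "average"), k = k)

# each result is an integer vector with one label per column of x
length(km_labels)
length(pam_labels)
length(hc_labels)
```

In each case the label vector is one step away from the fitted object, which is the step the `cut...` routines are described as hiding.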

The K-means algorithm can converge to different solutions depending on
the starting values of the group centers. We therefore also include a
routine (`repeatedKmeans`) that runs the K-means algorithm repeatedly,
using different randomly generated starting points each time, and keeps
the best result.
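The repeat-and-keep-the-best idea can be sketched in a few lines of base R. The helper `repeat_kmeans_sketch` below is hypothetical and written only for illustration; it is not the package's `repeatedKmeans`, and drawing the starting centers as random rows of the data is an assumption. The names of the returned components mirror the ones documented on this page.

```r
# Hypothetical helper for illustration; not the package's repeatedKmeans().
repeat_kmeans_sketch <- function(data, k, nTimes) {
  best_fit <- NULL
  best_start <- NULL
  for (i in seq_len(nTimes)) {
    # pick k distinct rows at random to serve as starting centers
    start <- data[sample(nrow(data), k), , drop = FALSE]
    fit <- kmeans(data, centers = start)
    # keep the run with the smallest total within-cluster sum of squares
    if (is.null(best_fit) || fit$tot.withinss < best_fit$tot.withinss) {
      best_fit <- fit
      best_start <- start
    }
  }
  list(kmeans = best_fit, centers = best_start,
       withinss = best_fit$tot.withinss)
}

set.seed(42)
# toy data: 30 observations (rows) in 5 dimensions
obs <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
res <- repeat_kmeans_sketch(obs, k = 3, nTimes = 10)
res$withinss               # fitness of the best run
table(res$kmeans$cluster)  # cluster sizes in the best run
```

Base R's `kmeans` also accepts an `nstart` argument that tries several random starts and returns the best fit, which may be sufficient when the starting centers themselves are not needed.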

Each of the `cut...` functions returns a vector of integer values
representing the cluster assignments found by the algorithm.
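Because the integer labels are arbitrary (cluster 1 from one algorithm need not correspond to cluster 1 from another), two assignment vectors are best compared by cross-tabulation rather than element by element. The sketch below uses only base R on made-up data with observations in rows; the label vectors `a` and `b` stand in for the output of any two of the `cut...` routines.

```r
set.seed(7)
# toy data: 40 observations (rows) drawn from two shifted groups
obs <- rbind(matrix(rnorm(20 * 4, mean = 0), ncol = 4),
             matrix(rnorm(20 * 4, mean = 5), ncol = 4))

# two integer label vectors obtained from different algorithms
a <- kmeans(obs, centers = 2)$cluster
b <- cutree(hclust(dist(obs), method = "average"), k = 2)

# cross-tabulate: rows are labels from a, columns are labels from b
table(kmeans = a, hclust = b)
```

Two partitions agree when each row of the table has a single nonzero cell, regardless of which label numbers line up.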

The `repeatedKmeans` function returns a list `x` with three
components. The component `x$kmeans` is the result of the call
to the `kmeans` function that produced the best fit to the
data. The component `x$centers` is a matrix containing the
group centers that were used in the best call to `kmeans`.
The component `x$withinss` contains the sum of the within-group
sums of squares, which is used as the measure of fitness.

Kevin R. Coombes <kcoombes@mdanderson.org>

    # simulate data from three different groups
    d1 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
    d2 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
    d3 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
    dd <- cbind(d1, d2, d3)

    cutKmeans(dd, k=3)
    cutKmeans(dd, k=4)

    cutHclust(dd, k=3)
    cutHclust(dd, k=4)

    cutPam(dd, k=3)
    cutPam(dd, k=4)

    cutRepeatedKmeans(dd, k=3, nTimes=10)
    cutRepeatedKmeans(dd, k=4, nTimes=10)

    # cleanup
    rm(d1, d2, d3, dd)

[Package *ClassDiscovery* version 2.5.0 Index]