Department of Bioinformatics and Computational Biology

Clustering and Column/Row Ordering

This vignette describes options for ordering rows and columns of a NG-CHM, and the additional options available when the desired ordering is hierarchical clustering.

To start, the needed packages are loaded into the R environment.

library(NGCHMDemoData)
library(NGCHMSupportFiles)
library(NGCHM)

On the “Create a NG-CHM” page, a NG-CHM was created with the default row and column ordering: hierarchical clustering, with correlation as the distance measure and Ward’s algorithm as the clustering method.

However, the chmNew() function accepts the arguments “rowOrder” and “colOrder”, each of which can be a vector, dendrogram, or function specifying the order of the rows/columns. For example, the following command produces a NG-CHM with rows sorted alphabetically and the default hierarchical clustering for the columns:

hm <- chmNew('tcga-gbm', TCGA.GBM.ExpressionData, rowOrder=sort(rownames(TCGA.GBM.ExpressionData)))

Similarly, the following command produces a NG-CHM with randomly ordered columns and the default hierarchical clustering for the rows:

hm <- chmNew('tcga-gbm', TCGA.GBM.ExpressionData, colOrder=colnames(TCGA.GBM.ExpressionData)[ sample.int(length(colnames(TCGA.GBM.ExpressionData))) ])
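Because sample.int() draws a fresh permutation on every call, the column order produced by the command above changes between runs. The following is a minimal sketch of fixing the random seed and checking an ordering vector before passing it to “rowOrder” or “colOrder”. The matrix `toy` and the helper `isPermutation` are hypothetical names introduced for illustration, and the check itself is a precaution assumed to be prudent rather than a documented requirement of the package:

```r
# Toy matrix standing in for TCGA.GBM.ExpressionData (hypothetical data).
toy <- matrix(1:12, nrow = 3,
              dimnames = list(c("geneB", "geneA", "geneC"),
                              c("s1", "s2", "s3", "s4")))

set.seed(42)                      # make the shuffle reproducible
colOrd <- sample(colnames(toy))   # random permutation of the column labels
rowOrd <- sort(rownames(toy))     # row labels in alphabetical order

# Hypothetical helper: TRUE when 'ord' contains every label exactly once.
isPermutation <- function(ord, labels) {
  length(ord) == length(labels) && setequal(ord, labels) && !anyDuplicated(ord)
}

stopifnot(isPermutation(colOrd, colnames(toy)),
          isPermutation(rowOrd, rownames(toy)))
```

With the real data, the same vectors would be supplied as rowOrder=rowOrd and colOrder=colOrd in the chmNew() call.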

Hierarchical Clustering Options

For hierarchical clustering, additional options are available to specify the distance measure and clustering method.

The distance measure is specified with the ‘rowDist’ and ‘colDist’ arguments. The available options include those of the ‘dist’ function in the R Stats Package: ‘euclidean’, ‘maximum’, ‘manhattan’, ‘canberra’, ‘binary’, and ‘minkowski’. There are two additional distance metric options: ‘cosine’ and ‘correlation’. The ‘correlation’ option, which computes the distance as 1 minus the Pearson correlation between rows/columns, is the default.
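To make the two additional metrics concrete, the sketch below computes them by hand with base R on a small made-up matrix (`toy` is hypothetical). The correlation distance follows the definition given above; the cosine distance uses the usual definition, 1 minus the cosine of the angle between two vectors, which is assumed here. This illustrates the definitions only and is not necessarily how the NG-CHM package computes them internally:

```r
# Toy matrix (hypothetical data): rows are features, columns are samples.
toy <- rbind(a = c(1, 2, 3, 4),
             b = c(2, 4, 6, 8),   # perfectly correlated with row a
             c = c(4, 3, 2, 1))   # perfectly anti-correlated with row a

# cor() correlates columns, so transpose to get row-by-row correlations.
corDist <- as.dist(1 - cor(t(toy), method = "pearson"))

# Cosine distance: scale each row to unit length, then take 1 - dot product.
unit <- toy / sqrt(rowSums(toy^2))
cosDist <- as.dist(1 - unit %*% t(unit))

print(round(as.matrix(corDist), 3))
print(round(as.matrix(cosDist), 3))
```

Under the correlation distance, rows a and b are at distance 0 and rows a and c are at the maximum distance of 2, so the metric groups rows by the shape of their profiles rather than by magnitude.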

The clustering method is specified with the ‘rowAgglom’ and ‘colAgglom’ arguments, and the available options are those of the ‘hclust’ function in the R Stats Package: ‘ward.D’, ‘ward.D2’, ‘single’, ‘complete’, ‘average’, ‘mcquitty’, ‘median’, and ‘centroid’. The default is ‘ward.D2’.

As an example, the command below specifies a NG-CHM with hierarchical clustering using the Euclidean distance metric and complete linkage clustering algorithm for both rows and columns.

hm <- chmNew('tcga-gbm', TCGA.GBM.ExpressionData, rowDist='euclidean', colDist='euclidean', rowAgglom='complete', colAgglom='complete')

Explicit Clustering

In the examples above, the NG-CHM package performs the clustering using the hclust function from the R Stats Package. However, users may perform the clustering explicitly and use the results in constructing a NG-CHM. For example, the following creates objects of class hclust from the distance matrices of the demo expression data, one for the rows and one for the columns, and uses those clustering results to construct a NG-CHM.

rowClust <- hclust(dist(TCGA.GBM.ExpressionData))
colClust <- hclust(dist(t(TCGA.GBM.ExpressionData)))
hm <- chmNew('tcga-gbm', TCGA.GBM.ExpressionData, rowOrder=rowClust, colOrder=colClust)
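Note that hclust(dist(x)) uses the base R defaults, Euclidean distance and complete linkage, which differ from the NG-CHM defaults described earlier. The sketch below shows one way to build hclust objects by hand with a correlation distance and the ‘ward.D2’ method. It uses a small random matrix (`toy` is hypothetical) and is meant as an approximation of the stated defaults, not a reproduction of the package's internal code:

```r
set.seed(1)
# Toy expression-like matrix (hypothetical data): 8 rows, 5 columns.
toy <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("g", 1:8), paste0("s", 1:5)))

# Distance = 1 - Pearson correlation; cor() works on columns, hence t() for rows.
rowClust <- hclust(as.dist(1 - cor(t(toy))), method = "ward.D2")
colClust <- hclust(as.dist(1 - cor(toy)), method = "ward.D2")

# Labels in leaf order, i.e. the order in which rows would be drawn.
rowLeafOrder <- rowClust$labels[rowClust$order]
print(rowLeafOrder)
```

The resulting objects can be supplied to chmNew() as rowOrder=rowClust and colOrder=colClust, in the same way as in the example above.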

Similarly, an hclust object can be transformed into a dendrogram and used for the row and/or column ordering. The example below uses the clustering results from above as dendrograms:

rowDendrogram <- as.dendrogram(rowClust)
colDendrogram <- as.dendrogram(colClust)
hm <- chmNew('tcga-gbm', TCGA.GBM.ExpressionData, rowOrder=rowDendrogram, colOrder=colDendrogram)
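One reason to work with dendrograms is that they can be adjusted before use. As a hedged illustration, the sketch below uses reorder() from the R Stats Package to rotate branches according to row means: the leaf order changes, but the tree itself (which leaves merge, and at what height) does not. The matrix `toy` is hypothetical, and whether such a reordering is useful depends on the data:

```r
set.seed(1)
# Toy matrix (hypothetical data): 8 rows, 5 columns.
toy <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("g", 1:8), paste0("s", 1:5)))

rowDend <- as.dendrogram(hclust(dist(toy)))

# Rotate branches using the mean of each row as the leaf weight.
# Only the left/right arrangement of branches changes, not the clustering.
rowDendSorted <- reorder(rowDend, wts = rowMeans(toy), agglo.FUN = mean)

print(labels(rowDend))         # leaf order before reordering
print(labels(rowDendSorted))   # leaf order after reordering
```

The reordered dendrogram would then be passed as rowOrder=rowDendSorted, exactly as rowDendrogram is above.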

Finally, the results can be exported to a .ngchm file (for use with the NGCHM Viewer) or to a stand-alone .html file:

chmExportToFile(hm,'tcga-gbm.ngchm',overwrite=TRUE)
chmExportToHTML(hm,'tcga-gbm.html',overwrite=TRUE)