Object Oriented Microarray Library: Mosaic Plots

The mosaic module provides the definition of the mosaic class. See the bottom of the page for an example of how the class can be used.

Class Name: mosaic

Attributes

data: A data frame or matrix; the actual data used to construct the object.
samples: A clustering object, typically the result of a call to hclust.
genes: A clustering object, typically the result of a call to hclust.
sample.metric: A string describing the distance metric used to cluster the samples (columns) in the data frame.
gene.metric: A string describing the distance metric used to cluster the genes (rows) in the data frame.
name: A character string describing the object as a whole.

Methods

mosaic(data, sample.metric, gene.metric, usecor, center, name): This is the constructor for mosaic objects. The only required argument is data, which can be a matrix, a data frame, or any object with an as.data.frame method. If the optional argument center is true, the rows of the data matrix are shifted so that the mean of each row is zero; by default, center is false. If the optional argument usecor is true, the rows of the data matrix are rescaled to have variance one; by default, usecor is false. The sample.metric and gene.metric arguments are strings naming the distance metrics used to cluster the samples and the genes, respectively. These arguments must be valid inputs to the distance.matrix function. By default, both are "euclid", which means that the clustering algorithms use Euclidean distance. The final optional argument, name, is a character string describing the object; it defaults to the empty string and is used to label plots.
plot(object, main, center, limits, sample.clust, show.labels, gene.classes): This method produces a plot of the mosaic object that is its first argument. All the remaining arguments are optional. The plot consists of two primary parts. The top part is a dendrogram of the clustered genes (rows) in the data of the object. The bottom part is a false color "mosaic" plot similar to those introduced by Mike Eisen. The columns in this image are sorted in the same order as the leaves in the dendrogram. By default, the rows in the image are ordered by the sample clustering that was computed when the mosaic object was constructed. You can override this ordering by supplying the result of a call to hclust as the optional sample.clust argument, in which case the alternate cluster order is used. The optional arguments center and limits control the display of the image. If center is true, each gene row of the data matrix is centered to have mean zero for display purposes. If limits is supplied, it should be a vector with two values, like c(-4, 4). These values are used to truncate the data values for display purposes; most images can be improved by choosing appropriate limits. The optional main argument is used as a title for the entire plot; it defaults to the name attribute of the mosaic object. The other two arguments, gene.classes and show.labels, add optional pieces to the plot. The gene.classes argument should be a positive integer. If it is supplied, colored bars are added between the dendrogram and the mosaic image to highlight that number of gene clusters. If show.labels is supplied as a positive integer, it has the same effect for the clustered samples, putting colored bars along the left edge of the image. You can also supply a vector as the show.labels argument. In this case, the length of the vector should equal the number of samples, and each entry in the vector should be an integer representing the color to use for that sample in the left edge color bar.
pltree(object, labels, colors, ...): Produce a dendrogram of the samples in the object, using the specified labels in the specified colors.
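
The centering, rescaling, and truncation options described above can be sketched in a few lines of R, whose syntax is close to S-Plus. This is only an illustration under stated assumptions: the helper names center.rows, scale.rows, and truncate.limits are hypothetical and are not part of the mosaic module, and the library's own implementation may differ.

```r
# Hypothetical helpers illustrating the documented options;
# these are NOT functions from the mosaic module.

# Shift each row (gene) so that its mean is zero, as described for 'center'.
center.rows <- function(x) sweep(x, 1, rowMeans(x))

# Rescale each row (gene) to have variance one, as described for 'usecor'.
scale.rows <- function(x) sweep(x, 1, apply(x, 1, sd), "/")

# Clip values to the interval given by 'limits', e.g. c(-4, 4),
# as described for the display truncation in plot.
truncate.limits <- function(x, limits) pmin(pmax(x, limits[1]), limits[2])

set.seed(1)
x <- matrix(rnorm(100 * 10, mean = 2, sd = 3), 100, 10)  # genes in rows
centered <- center.rows(x)
scaled <- scale.rows(x)
clipped <- truncate.limits(centered, c(-4, 4))
```

Truncation only affects what is displayed in this sketch; the clustering would still be computed from the untruncated values.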

Description

An object of the mosaic class represents the results of two-way clustering similar to that used in the plots introduced by Mike Eisen. We provide the ability to cluster genes and samples using independent metrics, and also provide a fairly flexible set of tools for producing the final plot. At present, we only combine one dendrogram (for the genes) with a false color image, primarily because we don't know how to rotate dendrograms in S-Plus.
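
As a rough illustration of two-way clustering with independent metrics, the following R sketch clusters samples and genes separately and reorders the data for display. It uses only base R functions (cor, as.dist, hclust, cutree). The average linkage method and all variable names are assumptions made for this example; the source does not state which linkage the mosaic constructor uses.

```r
set.seed(2)
x <- matrix(rnorm(100 * 10), 100, 10)   # genes in rows, samples in columns

# Samples (columns) clustered with a Pearson-based distance.
sample.dist <- as.dist((1 - cor(x)) / 2)
# Genes (rows) clustered with a Spearman-based distance.
gene.dist <- as.dist((1 - cor(t(x), method = "spearman")) / 2)

# Linkage method is an assumption; the documentation does not specify one.
samples <- hclust(sample.dist, method = "average")
genes <- hclust(gene.dist, method = "average")

# Reorder to follow the dendrogram leaves; transpose so that genes run
# across the columns of the image, matching the layout described for plot.
img <- t(x[genes$order, samples$order])

# Cut the gene tree into six groups, analogous to gene.classes=6.
classes <- cutree(genes, k = 6)
```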

Auxiliary Functions

distance.matrix(data, metric)

This function takes a data frame or matrix (called data) and a string naming a distance metric, and returns the distance matrix whose entries are the distances between the columns of the data. Valid values for the metric argument are:

euclid: Euclidean distance
pearson: (1 - Pearson.correlation)/2
pearson2: sqrt((1 - Pearson.correlation)/2)
absolute: 1 - |Pearson.correlation|
spearman: (1 - Spearman.rank.correlation)/2
weird: Euclidean distance between Pearson correlation vectors
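
The metric definitions above translate fairly directly into base R. The function below, distance.sketch, is a hypothetical re-implementation written for illustration only; it is not the library's distance.matrix, and details such as the return type (a dist object here) are assumptions.

```r
# Hypothetical sketch of the listed metrics; NOT the library's source code.
# Distances are computed between the columns of 'data'.
distance.sketch <- function(data, metric = "euclid") {
  data <- as.matrix(data)
  switch(metric,
    euclid   = dist(t(data)),                        # dist() works on rows
    pearson  = as.dist((1 - cor(data)) / 2),
    pearson2 = as.dist(sqrt((1 - cor(data)) / 2)),
    absolute = as.dist(1 - abs(cor(data))),
    spearman = as.dist((1 - cor(data, method = "spearman")) / 2),
    weird    = dist(cor(data)),   # Euclidean distance between correlation vectors
    stop("unknown metric: ", metric))
}

set.seed(3)
x <- matrix(rnorm(100 * 10), 100, 10)
d.pearson <- distance.sketch(x, "pearson")
d.weird <- distance.sketch(x, "weird")
```

With these definitions, the pearson and spearman distances lie between 0 and 1, where 0 corresponds to perfectly correlated columns and 1 to perfectly anticorrelated columns.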

Example

  graphsheet(num.image.colors=3, num.image.shades='48,48',
    image.color.table="0,255,0|0,0,0|255,0,0")
  n.sample <- 10
  n.gene <- 100
  faker <- matrix(rnorm(n.sample*n.gene), n.gene, n.sample)
  fake.mosaic <- mosaic(faker, 'pearson', 'spearman', name='pseudo')
  plot(fake.mosaic)
  plot(fake.mosaic, gene.classes=6, show.labels=7)