Object Oriented Microarray Library: Total Number of Misclassifications

The tnom module provides the definition of the tnom class. See the bottom of the page for an example of how the class can be used.

Class Name: tnom

Attributes

data: A vector with one element per gene, giving the minimum number of misclassifications achieved by the best split of the samples on that gene's expression values.
nc: A scalar, the number of columns or samples.
nr: A scalar, the number of rows or genes.
classifier: A logical vector that separates the samples into two distinct types.
source: A character string representing the name of the data.

Methods

tnom(zm, zclass): The constructor requires a data frame, zm, whose rows are genes and whose columns are samples, and a logical vector, zclass, to classify the samples into two types. The constructor performs a TNoM analysis (as defined by Yakhini and Ben-Dor) to determine, for each gene, the minimum number of misclassifications of the samples over every possible split along that gene's expression values. (Because this process can be time-consuming, the constructor prints brief progress reports as it goes.)
summary: Returns a vector whose size is one-half the number of samples, showing the number of genes at each of those misclassification levels. (Note that an optimal split cannot misclassify more than half of the samples: if it did, reversing the class labels assigned to the two sides of the split would classify more than half of them correctly.)
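
As a rough illustration of what such a summary contains, the sketch below tabulates made-up per-gene misclassification counts by level and then accumulates them, as the example at the bottom of the page does with cumsum. The per.gene values are invented, and the sketch assumes that the levels run from 0 up to half the number of samples; the vector returned by the actual summary method may be laid out differently.

```r
n.samples <- 10
# Made-up minimum misclassification counts for ten genes
per.gene <- c(2, 5, 3, 3, 4, 5, 1, 4, 4, 3)

# Assumed levels: 0, 1, ..., floor(n.samples/2)
lev <- 0:floor(n.samples/2)

# Number of genes at each misclassification level
counts <- sapply(lev, function(k) sum(per.gene == k))
names(counts) <- lev

# Number of genes with at most that many misclassifications
cumulative <- cumsum(counts)
print(counts)
print(cumulative)
```

The cumulative form is the one plotted in the example, since the question of interest is how many genes have no more than a given number of misclassifications.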

Functions

simulate.genes.tnom(object): This function takes a tnom object as its argument. On a gene-by-gene basis, it scrambles the classifying labels on the columns (samples) and recomputes the total number of misclassifications with these random labels. It returns the number of genes at each misclassification level, as returned by the summary method.
scramble.samples.tnom(data, classifier): This function takes a data frame and a logical vector as its arguments, just as in the tnom constructor. The function then scrambles the classifying labels on the columns (samples) and computes the total number of misclassifications with these random labels. It returns the number of genes at each misclassification level, as returned by the summary method.
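
The difference between the two scrambling schemes can be illustrated with base functions alone. The sketch below is one reading of the descriptions above, not the library's code: it assumes that sample-wise scrambling draws a single permutation of the labels that is shared by all genes, while gene-wise scrambling draws a fresh permutation for every gene. The names by.sample and by.gene are hypothetical.

```r
classifier <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
n.genes <- 4

# Sample-wise scrambling (assumed): one permutation shared by every gene
shared <- sample(classifier)
by.sample <- matrix(shared, nrow=n.genes, ncol=length(classifier), byrow=TRUE)

# Gene-wise scrambling (assumed): an independent permutation for each gene
by.gene <- t(sapply(1:n.genes, function(i) sample(classifier)))

# Either way, each row keeps the original class sizes
print(by.sample)
print(by.gene)
```

Because permuting the labels preserves the number of samples in each class, both schemes give a reference distribution for classes of the same sizes as the observed ones.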

Description

An object of the tnom class represents the result of a preliminary TNoM analysis. This method was introduced by Yakhini and Ben-Dor, and was used by Bittner et al. in their study of melanoma. The underlying idea is that, for each gene, we can order the samples in increasing order of expression. At the gap between each pair of adjacent expression levels, we can split the data and ask how many samples are misclassified by such a split. The smallest number of misclassifications for this gene is a quality measure that describes how well its expression levels (when converted to ranks) match the actual classification. A key additional step is to determine how many genes there are at each misclassification level. In particular, one would like to know that the number of genes with only a few misclassifications is greater than would be expected by random chance.
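
The per-gene computation described above can be sketched in a few lines. This is only an illustration under stated assumptions: minMisclass is a hypothetical helper, not a function of the library; it assumes there are no tied expression values; and the constructor's actual implementation may differ.

```r
# Hypothetical helper (not part of the library): minimum number of
# misclassifications for one gene, over all splits and both orientations.
minMisclass <- function(x, cls) {
  lab <- cls[order(x)]   # class labels in increasing order of expression
  n <- length(lab)
  # For a split with k samples on the low side (k = 0, ..., n) and the rule
  # "low side is FALSE, high side is TRUE", the errors are the TRUE labels
  # on the low side plus the FALSE labels on the high side.
  low.true <- c(0, cumsum(lab))
  high.false <- sum(!lab) - c(0, cumsum(!lab))
  err <- low.true + high.false
  # The opposite rule misclassifies exactly the remaining samples.
  min(err, n - err)
}

x <- c(0.4, 1.2, 2.2, 3.1, 4.1, 5.0)
cls <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)
print(minMisclass(x, cls))
```

Taking the minimum over both orientations is what keeps the result at or below half the number of samples.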

Example

  n.genes <- 200
  n.samples <- 10
  n.sim <- 10

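  # Simulated expression data: rows are genes, columns are samples
  bogus <- matrix(rnorm(n.samples*n.genes, 0, 3), ncol=n.samples)
  # Assign half of the samples, chosen at random, to the TRUE class
  splitter <- rep(FALSE, n.samples)
  splitter[sample(1:n.samples, trunc(n.samples/2))] <- TRUE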

  tn <- tnom(bogus, splitter)
  # One row per simulation, one column per misclassification level
  temp <- matrix(0, n.sim, length(summary(tn)))
  for (i in 1:n.sim) {
    print(i)
    temp[i,] <- simulate.genes.tnom(tn)
  }
  ct <- data.frame(t(apply(temp, 1, cumsum)))
  fakir <- apply(ct, 2, mean)
  dex <- 0:(length(fakir)-1)
  scram <- scramble.samples.tnom(bogus, splitter)
  obs <- cumsum(summary(tn))
  scr <- cumsum(scram)

  plot(dex, fakir, type='n',
    xlab='Maximum Number of Misclassifications', ylab='Number of Genes')
  points(dex, fakir, type='b', col=6, pch=1)
  points(dex, obs, type='b', col=8, pch=16)
  points(dex, scr, type='b', col=4, pch=17)
  # bogus is a plain matrix with no name component, so use a fixed title
  title('TNoM')
  # 'marks' is the S-PLUS argument name; R's legend() uses 'pch' instead
  legend(3, 50, c('observed', 'expected', 'scrambled'),
    col=c(8, 6, 4), marks=c(16, 1, 17))