R: The MultiWilcoxonTest Class

MultiWilcoxonTest-class {ClassComparison}

R Documentation

The MultiWilcoxonTest Class

Description

The MultiWilcoxonTest class is used to perform row-by-row Wilcoxon rank-sum tests on a data matrix. Significance cutoffs are determined by the empirical Bayes method of Efron and Tibshirani.

Usage

MultiWilcoxonTest(data, classes, histsize = NULL)
## S4 method for signature 'MultiWilcoxonTest':
summary(object, prior=1, significance=0.9, ...)
## S4 method for signature 'MultiWilcoxonTest':
hist(x, xlab='Rank Sum',
 ylab='Prob(Different | Y)', main='', ...)
## S4 method for signature 'MultiWilcoxonTest, missing':
plot(x, prior=1, significance=0.9,
 ylim=c(-0.5, 1), xlab='Rank Sum', ylab='Prob(Different | Y)', ...)
## S4 method for signature 'MultiWilcoxonTest':
cutoffSignificant(object, prior, significance, ...)
## S4 method for signature 'MultiWilcoxonTest':
selectSignificant(object, prior, significance, ...)
## S4 method for signature 'MultiWilcoxonTest':
countSignificant(object, prior, significance, ...)

Arguments

`data`	Either a data frame or matrix with numeric values or an `exprSet` as defined in the Bioconductor tools for analyzing microarray data.
`classes`	If `data` is a data frame or matrix, then `classes` must be either a logical vector or a factor. If `data` is an `exprSet`, then `classes` can be a character string that names one of the factor columns in the associated `phenoData` subobject.
`histsize`	An integer; the number of bins used for the histogram summarizing the Wilcoxon statistics. When `NULL`, each discrete rank-sum value gets its own bin.
`object`	an object of the `MultiWilcoxonTest` class.
`x`	an object of the `MultiWilcoxonTest` class.
`xlab`	Label for the x axis
`ylab`	Label for the y axis
`ylim`	Plotting limits on the y-axis
`main`	Graph title
`prior`	Prior probability that an arbitrary gene is not differentially expressed, or that an arbitrary row does not yield a significant Wilcoxon rank-sum statistic.
`significance`	Desired level of posterior probability
`...`	Additional graphical parameters.

Details

See the paper by Efron and Tibshirani.
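The core idea of that paper is that the posterior probability of a real difference at an observed statistic y is 1 - p0 * f0(y) / f(y), where p0 is the prior probability of no difference, f0 is the null density, and f is the observed (mixture) density. The following is a minimal base-R sketch of that calculation, not the package's own code. The simulated data, the variable names, and the choice of df = 5 for the B-spline Poisson smoother are all assumptions made for illustration.

```r
set.seed(42)
n1 <- 8; n2 <- 8; n_rows <- 2000
x <- matrix(rnorm(n_rows * (n1 + n2)), nrow = n_rows)
x[1:200, 1:n1] <- x[1:200, 1:n1] + 2   # rows 1..200 truly differ

# Rank-sum statistic of the first group in each row
w <- rowSums(t(apply(x, 1, rank))[, 1:n1])

# Every possible rank sum, and its null probability f0.
# dwilcox() is parameterized by the Mann-Whitney U, i.e. the rank sum
# minus its minimum value n1 * (n1 + 1) / 2.
offset <- n1 * (n1 + 1) / 2
xvals <- offset:(offset + n1 * n2)
f0 <- dwilcox(xvals - offset, n1, n2)

# Observed counts at each possible value, smoothed by Poisson regression
# on a B-spline basis (df = 5 is an arbitrary choice for this sketch)
counts <- as.numeric(table(factor(w, levels = xvals)))
fit <- glm(counts ~ splines::bs(xvals, df = 5), family = poisson)
f <- fitted(fit) / n_rows

# Posterior probability of a difference, given prior p0 of no difference
p0 <- 0.9
posterior <- 1 - p0 * f0 / f
```

Because f is only an estimate, the computed posterior can dip below zero near the center of the distribution, which may be why the plot method's default ylim starts at -0.5.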

Value

The standard methods summary, hist, and plot return what you would expect.
The cutoffSignificant method returns a list of two integers. Rank-sum values smaller than the first value or larger than the second value are statistically significant in the sense that their posterior probability exceeds the specified significance level given the assumptions about the prior probability of not being significant.
The selectSignificant method returns a vector of logical values identifying the significant test results, and countSignificant returns an integer counting the number of significant test results.
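To make the relationship between the three return values concrete, here is a small base-R sketch. It does not call the package: find_cutoffs is a hypothetical helper, the posterior values are made up, and splitting the tails at the median of the possible values is an assumption of this sketch rather than a documented behavior.

```r
# Made-up posterior probabilities over the possible rank sums 10..30
xvals <- 10:30
posterior <- c(0.99, 0.97, 0.93, 0.80, rep(0.1, 13), 0.85, 0.92, 0.96, 0.99)

# Hypothetical helper: values strictly below `low` or strictly above
# `high` have posterior probability exceeding `significance`.
find_cutoffs <- function(xvals, posterior, significance) {
  mid <- median(xvals)
  sig <- posterior > significance
  low_sig  <- xvals[sig & xvals < mid]
  high_sig <- xvals[sig & xvals > mid]
  list(low  = if (length(low_sig))  max(low_sig) + 1  else min(xvals),
       high = if (length(high_sig)) min(high_sig) - 1 else max(xvals))
}

cuts <- find_cutoffs(xvals, posterior, significance = 0.9)

# Analogue of selectSignificant: a logical vector over observed statistics
stats <- c(11, 20, 29, 13, 27)
selected <- stats < cuts$low | stats > cuts$high

# Analogue of countSignificant: the number of TRUE entries
n_selected <- sum(selected)
```

With these made-up numbers the cutoffs are 13 and 27, so only the statistics 11 and 29 are selected; values equal to a cutoff are not.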

Creating Objects

As usual, objects can be created by new, but better methods are available in the form of the MultiWilcoxonTest function. The inputs to this function are the same as those used for row-by-row statistical tests throughout the ClassComparison package; a detailed description can be found in the MultiTtest class.

The constructor computes row-by-row Wilcoxon rank-sum statistics on the input data, comparing the two groups defined by the classes argument. It also estimates the observed and theoretical (expected) density functions for the collection of rank-sum statistics.
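As a rough illustration of the first step, the rank-sum statistic for each row can be computed in base R as sketched below. The helper row_rank_sums is hypothetical and not part of ClassComparison; in particular, summing the ranks of the first factor level (rather than the second) is an assumption of this sketch.

```r
# Hypothetical helper: rank-sum statistic of the first group for every
# row of a numeric matrix with at least two rows.
row_rank_sums <- function(data, classes) {
  classes <- as.factor(classes)
  stopifnot(nlevels(classes) == 2, length(classes) == ncol(data))
  in_first <- classes == levels(classes)[1]
  # rank() is applied within each row; ties receive average ranks
  ranks <- t(apply(data, 1, rank))
  rowSums(ranks[, in_first, drop = FALSE])
}

set.seed(1)
toy <- matrix(rnorm(5 * 6), nrow = 5)
grp <- factor(rep(c("A", "B"), each = 3))
w <- row_rank_sums(toy, grp)

# Cross-check against wilcox.test(), which reports the rank sum of the
# first sample minus its minimum possible value n1 * (n1 + 1) / 2
u1 <- unname(wilcox.test(toy[1, 1:3], toy[1, 4:6])$statistic)
```

With three samples per group, every entry of w lies between 6 (ranks 1, 2, 3) and 15 (ranks 4, 5, 6), and w[1] - 6 equals u1.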

The additional input argument, histsize, is usually best left at its default value. In certain pathological cases, we have found it necessary to use fewer bins; one suspects that the underlying model does not adequately capture the complexity of those situations.

Slots

rank.sum.statistics: A numeric vector containing the computed rank-sum statistics.
xvals: A numeric vector, best thought of as the vector of possible rank-sum statistics given the sizes of the two groups.
theoretical.pdf: A numeric vector containing the theoretical density function evaluated at the points of xvals.
pdf: A numeric vector containing the empirical density function computed at the points of xvals.
unravel: A numeric vector containing a smoothed estimate (by Poisson regression using B-splines) of the empirical density function evaluated at xvals.
groups: A vector containing the names of the groups defined by classes.
call: An object of class call representing the function call that created the object.

Methods

summary(object, prior=1, significance=0.9, ...): Write out a summary of the object. For a given value of the prior probability of not being differentially expressed and a given significance cutoff on the posterior probability, reports the cutoffs and number of items in both tails of the distribution.
hist(x, xlab='Rank Sum', ylab='Prob(Different | Y)', main='', ...): Plot a histogram of the rank-sum statistics, with overlaid curves representing the expected and observed distributions. Colors of the curves are controlled by COLOR.EXPECTED and COLOR.OBSERVED.
plot(x, prior=1, significance=0.9, ylim=c(-0.5, 1), xlab='Rank Sum', ylab='Prob(Different | Y)', ...): Plots the posterior probability of being differentially expressed for given values of the prior. Horizontal lines are added at each specified significance level for the posterior probability.
cutoffSignificant(object, prior, significance, ...): Determine cutoffs on the rank-sum statistic at the desired significance level.
selectSignificant(object, prior, significance, ...): Compute a logical vector for selecting significant test results.
countSignificant(object, prior, significance, ...): Count the number of significant test results.

Author(s)

Kevin R. Coombes <kcoombes@mdanderson.org>

References

Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23:70-86.

Pounds S, Morris SW. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics. 2003;19(10):1236-42.

Examples

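library(ClassComparison)

# simulate ng genes measured on two classes of ns samples each
ng <- 10000
ns <- 15
nd <- 200
fake.class <- factor(rep(c('A', 'B'), each=ns))
fake.data <- matrix(rnorm(ng*ns*2), nrow=ng, ncol=2*ns)
# shift class A up for the first nd genes and down for the next nd genes
fake.data[1:nd, 1:ns] <- fake.data[1:nd, 1:ns] + 2
fake.data[(nd+1):(2*nd), 1:ns] <- fake.data[(nd+1):(2*nd), 1:ns] - 2

# run the row-by-row Wilcoxon tests and plot the results
a <- MultiWilcoxonTest(fake.data, fake.class)
hist(a)
plot(a)
plot(a, prior=0.85)
abline(h=0)

# cutoffs and counts, with the significance argument spelled out in full
cutoffSignificant(a, prior=0.85, significance=0.95)
countSignificant(a, prior=0.85, significance=0.95)

# cleanup
rm(ng, ns, nd, fake.class, fake.data, a)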

[Package ClassComparison version 1.1 Index]