MultiWilcoxonTest-class {ClassComparison} | R Documentation |
The MultiWilcoxonTest class is used to perform row-by-row Wilcoxon rank-sum tests on a data matrix. Significance cutoffs are determined by the empirical Bayes method of Efron and Tibshirani.
MultiWilcoxonTest(data, classes, histsize = NULL)
## S4 method for signature 'MultiWilcoxonTest':
summary(object, prior=1, significance=0.9, ...)
## S4 method for signature 'MultiWilcoxonTest':
hist(x, xlab='Rank Sum', ylab='Prob(Different | Y)', main='', ...)
## S4 method for signature 'MultiWilcoxonTest, missing':
plot(x, prior=1, significance=0.9, ylim=c(-0.5, 1), xlab='Rank Sum', ylab='Prob(Different | Y)', ...)
## S4 method for signature 'MultiWilcoxonTest':
cutoffSignificant(object, prior, significance, ...)
## S4 method for signature 'MultiWilcoxonTest':
selectSignificant(object, prior, significance, ...)
## S4 method for signature 'MultiWilcoxonTest':
countSignificant(object, prior, significance, ...)
data |
Either a data frame or matrix with numeric values or an
exprSet as defined in the BioConductor tools for
analyzing microarray data. |
classes |
If data is a data frame or matrix, then classes must be either a logical vector or a factor. If data is an exprSet, then classes can be a character string that names one of the factor columns in the associated phenoData subobject. |
histsize |
An integer; the number of bins used for the histogram summarizing the Wilcoxon statistics. When NULL, each discrete rank-sum value gets its own bin. |
object |
an object of the MultiWilcoxonTest class. |
x |
an object of the MultiWilcoxonTest class. |
xlab |
Label for the x axis |
ylab |
Label for the y axis |
ylim |
Plotting limits on the y-axis |
main |
Graph title |
prior |
Prior probability that an arbitrary gene is not differentially expressed, or that an arbitrary row does not yield a significant Wilcoxon rank-sum statistic. |
significance |
Desired level of posterior probability |
... |
Additional graphical parameters. |
For the empirical Bayes method used to set the cutoffs, see the paper by Efron and Tibshirani (2002).
The standard methods summary, hist, and plot return what you would expect.
The cutoffSignificant method returns a list of two integers. Rank-sum values smaller than the first value or larger than the second value are statistically significant, in the sense that their posterior probability exceeds the specified significance level given the assumed prior probability of not being significant.
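The rule above can be illustrated with a small base-R sketch. It assumes the Efron and Tibshirani form of the posterior, 1 - prior * f0(y) / f(y), where f0 is the theoretical null density and f is the observed density of the rank sums. The observed density is estimated here by raw proportions, which is an assumption of this sketch and not necessarily what the package does; the element names low and high are also hypothetical.

```r
# Base-R sketch only; this is not the ClassComparison implementation.
set.seed(2)
m <- 8; n <- 8                          # hypothetical group sizes
offset <- m * (m + 1) / 2               # smallest possible rank sum
xvals <- offset + 0:(m * n)             # all possible rank sums

# Simulated rows: mostly null, some shifted up or down in the first group.
shift <- c(rep(0, 4000), rep(2, 300), rep(-2, 300))
rank.sums <- sapply(shift, function(s) {
  sum(rank(c(rnorm(m, mean = s), rnorm(n)))[1:m])
})

# Theoretical null density and observed density on the same grid.
f0 <- dwilcox(xvals - offset, m, n)
f  <- as.vector(table(factor(rank.sums, levels = xvals))) / length(rank.sums)

prior <- 0.85
significance <- 0.95

# Posterior probability of being different; undefined where nothing was observed.
posterior <- ifelse(f > 0, 1 - prior * f0 / f, NA)

# Assumes the significant values form two tails around the centre.
sig <- !is.na(posterior) & posterior > significance
mid <- mean(range(xvals))
cutoffs <- list(low  = max(xvals[sig & xvals < mid]) + 1,
                high = min(xvals[sig & xvals > mid]) - 1)
```

Rank sums below cutoffs$low or above cutoffs$high are the ones this sketch would call significant. Because the posterior is 1 - prior * f0 / f, it can drop below zero when the observed density falls under prior times the null density.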
The selectSignificant method returns a vector of logical values identifying the significant test results, and countSignificant returns an integer counting the number of significant test results.
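The relationship between the cutoffs, the logical selection vector, and the count can be sketched in base R. The rank sums and cutoffs below are made-up values, and the names low and high are hypothetical; this only mirrors the behaviour described above.

```r
# Base-R sketch only; values are invented for illustration.
rank.sums <- c(36, 40, 55, 68, 70, 91, 99, 100)   # made-up rank sums
cutoffs <- list(low = 41, high = 95)              # made-up cutoffs

# A row is selected when its rank sum lies outside the two cutoffs.
selected <- rank.sums < cutoffs$low | rank.sums > cutoffs$high

# The count is the number of selected rows.
n.selected <- sum(selected)
```

Indexing the data with the logical vector, for example dat[selected, ], would then pull out the corresponding rows.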
As usual, objects can be created by new, but better methods are available in the form of the MultiWilcoxonTest function. The inputs to this function are the same as those used for row-by-row statistical tests throughout the ClassComparison package; a detailed description can be found in the MultiTtest class.
The constructor computes row-by-row Wilcoxon rank-sum statistics on the input data, comparing the two groups defined by the classes argument. It also estimates the observed and theoretical (expected) density functions for the collection of rank-sum statistics.
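As a rough picture of what the constructor computes, the following base-R sketch derives row-by-row rank sums and the two densities by hand. It uses only rank() and dwilcox() from base R; the variable names and the simulated data are hypothetical, and the package may estimate the observed density differently.

```r
# Base-R sketch only; this is not the ClassComparison implementation.
set.seed(1)
n.rows <- 500                       # hypothetical number of rows
n.per  <- 6                         # hypothetical samples per group
classes <- factor(rep(c("A", "B"), each = n.per))
dat <- matrix(rnorm(n.rows * 2 * n.per), nrow = n.rows)

# Rank each row across all samples, then sum the ranks of group "A".
in.a <- classes == "A"
rank.sums <- apply(dat, 1, function(row) sum(rank(row)[in.a]))

# Possible rank-sum values for a group of size m among m + n samples.
m <- sum(in.a)
n <- sum(!in.a)
offset <- m * (m + 1) / 2
xvals <- offset + 0:(m * n)

# Theoretical null density: dwilcox() is parameterised by the
# Mann-Whitney statistic, which is the rank sum minus m * (m + 1) / 2.
theoretical <- dwilcox(xvals - offset, m, n)

# Observed density: proportion of rows at each possible rank-sum value.
observed <- as.vector(table(factor(rank.sums, levels = xvals))) / n.rows
```

Comparing observed with theoretical over xvals shows where the data depart from the null distribution, which is the information the empirical Bayes step builds on.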
The additional input argument, histsize, is usually best left at its default value. In certain pathological cases, we have found it necessary to use fewer bins; one suspects that the underlying model does not adequately capture the complexity of those situations.
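To make the effect of using fewer bins concrete, here is a base-R sketch that tabulates the same rank sums once per discrete value and once in a smaller number of equal-width bins. How the package itself bins the statistics is not described on this page, so the break points below are an assumption made for illustration.

```r
# Base-R sketch only; the binning scheme is an assumption.
set.seed(3)
m <- 7; n <- 9                          # hypothetical group sizes
offset <- m * (m + 1) / 2               # smallest possible rank sum
rank.sums <- replicate(2000, sum(rank(rnorm(m + n))[1:m]))

# One bin per discrete rank-sum value (the behaviour described for NULL).
per.value <- table(factor(rank.sums, levels = offset + 0:(m * n)))

# Fewer bins: 16 equal-width bins, each covering 4 adjacent values.
histsize <- 16
breaks <- seq(offset - 0.5, offset + m * n + 0.5, length.out = histsize + 1)
binned <- hist(rank.sums, breaks = breaks, plot = FALSE)$counts
```

Coarser bins pool neighbouring rank-sum values, which gives a smoother but less detailed picture of the observed distribution.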
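Objects of this class contain the following slots:
rank.sum.statistics: the rank-sum statistics computed row by row.
xvals: the rank-sum values at which the density functions are evaluated.
theoretical.pdf: the theoretical density function, evaluated at xvals.
pdf: the observed (empirical) density function, evaluated at xvals.
unravel: an estimate of the observed density function, evaluated at xvals.
groups: the names of the two groups defined by classes.
call: an object of class call representing the function call that created the object.
The following methods are defined:
summary(object, prior=1, significance=0.9, ...): For a given prior probability of not being differentially expressed and a given significance cutoff on the posterior probability, reports the cutoffs and number of items in both tails of the distribution.
hist(x, xlab='Rank Sum', ylab='Prob(Different | Y)', main='', ...): Plots a histogram of the rank-sum statistics. The colors of the expected and observed curves are controlled by COLOR.EXPECTED and COLOR.OBSERVED.
plot(x, prior=1, significance=0.9, ylim=c(-0.5, 1), xlab='Rank Sum', ylab='Prob(Different | Y)', ...): Plots the posterior probability for the given values of prior. Horizontal lines are added at each specified significance level for the posterior probability.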
Kevin R. Coombes <kcoombes@mdanderson.org>
Efron B, Tibshirani R: Empirical bayes methods and false discovery rates for microarrays. Genet Epidemiol 2002, 23: 70-86.
Pounds S, Morris SW. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics. 2003 Jul 1;19(10):1236-42.
Implementation is handled in part by the functions dwil and rankSum. The empirical Bayes results for alternative tests (such as MultiTtest) can be obtained using the beta-uniform mixture model in the Bum class.
# simulate 10000 rows and two groups of 15 samples each
ng <- 10000
ns <- 15
nd <- 200
fake.class <- factor(rep(c('A', 'B'), each=ns))
fake.data <- matrix(rnorm(ng*ns*2), nrow=ng, ncol=2*ns)
# shift group A up in the first nd rows and down in the next nd rows
fake.data[1:nd, 1:ns] <- fake.data[1:nd, 1:ns] + 2
fake.data[(nd+1):(2*nd), 1:ns] <- fake.data[(nd+1):(2*nd), 1:ns] - 2
# run the row-by-row tests and inspect the results
a <- MultiWilcoxonTest(fake.data, fake.class)
hist(a)
plot(a)
plot(a, prior=0.85)
abline(h=0)
cutoffSignificant(a, prior=0.85, significance=0.95)
countSignificant(a, prior=0.85, significance=0.95)
# cleanup
rm(ng, ns, nd, fake.class, fake.data, a)