Bum-class {ClassComparison}                                    R Documentation
The Bum class is used to fit a beta-uniform mixture (BUM) model to a set of p-values.
Bum(pvals, ...)

## S4 method for signature 'Bum'
summary(object, tau=0.01, ...)

## S4 method for signature 'Bum'
hist(x, res=100, xlab='P Values', main='', ...)

## S4 method for signature 'Bum'
image(x, ...)

## S4 method for signature 'Bum'
cutoffSignificant(object, alpha, by='FDR', ...)

## S4 method for signature 'Bum'
selectSignificant(object, alpha, by='FDR', ...)

## S4 method for signature 'Bum'
countSignificant(object, alpha, by='FDR', ...)

likelihoodBum(object)
pvals: A numeric vector containing values between 0 and 1.

object: A Bum object.

tau: A real number between 0 and 1, representing a cutoff on the p-values.

x: A Bum object.

res: A positive integer; the resolution at which to plot the fitted distribution curve.

xlab: Label for the x axis.

main: Graph title.

alpha: Either the false discovery rate (if by = 'FDR') or the posterior probability (if by = 'EmpiricalBayes').

by: A character string denoting the method to use for determining cutoffs. The choices are 'FDR', 'FalseDiscovery', or 'EmpiricalBayes'. Since the test is implemented with match.arg, unique abbreviations also work.

...: All methods are defined to accept additional arguments in order to allow flexibility in designing derived classes. The usual graphical parameters can be supplied to hist and image.
The BUM method was introduced by Stan Pounds and Steve Morris, although it was simultaneously discovered by several other researchers. It is generally applicable to any analysis of microarray or proteomics data that performs a separate statistical hypothesis test for each gene or protein, where each test produces a p-value that would be valid if the analyst were only performing one statistical test. When performing thousands of statistical tests, however, those p-values no longer have the same interpretation as Type I error rates. The idea behind BUM is that, under the null hypothesis that none of the genes or proteins is interesting, the expected distribution of the set of p-values is uniform. By contrast, if some of the genes are interesting, then we should see an overabundance of small p-values (or a spike in the histogram near zero). We can model the alternative hypothesis with a beta distribution, and view the set of all p-values as a mixture distribution.
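To make the mixture explicit, the sketch below writes out the BUM density in the parameterization used by Pounds and Morris, f(p) = lambda + (1 - lambda) * a * p^(a - 1) with 0 < a < 1; the function name dbum and the parameter values are illustrative only and are not part of the package.

dbum <- function(p, a, lambda) {
  # mixture of a uniform (null) component and a Beta(a, 1) (alternative) component
  lambda * dunif(p) + (1 - lambda) * dbeta(p, shape1 = a, shape2 = 1)
}
# With 30% of the p-values drawn from the beta component, the density spikes near zero
curve(dbum(x, a = 0.3, lambda = 0.7), from = 0.001, to = 1,
      xlab = 'P Values', ylab = 'Density')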
Fitting the BUM model is straightforward: a nonlinear optimizer is used to compute the maximum likelihood estimates of the parameters. After the model has been fit, one can easily determine cutoffs on the p-values that correspond to desired false discovery rates. Alternatively, the original Pounds and Morris paper shows that their results can be reinterpreted to recover the empirical Bayes method introduced by Efron and Tibshirani. Thus, one can also determine cutoffs by specifying a desired posterior probability of significance.
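As a rough illustration of the fitting step (a sketch under assumed details, not the package's internal code), the maximum likelihood estimates can be found with a general-purpose optimizer such as optim; the helper name fitBumSketch and the logistic reparameterization that keeps both parameters in (0, 1) are assumptions made for this example.

fitBumSketch <- function(pvals) {
  nlogL <- function(theta) {
    a      <- plogis(theta[1])   # beta shape parameter, constrained to (0, 1)
    lambda <- plogis(theta[2])   # uniform mixing fraction, constrained to (0, 1)
    -sum(log(lambda + (1 - lambda) * a * pvals^(a - 1)))
  }
  fit <- optim(c(0, 0), nlogL)   # Nelder-Mead on the unconstrained scale
  list(ahat = plogis(fit$par[1]), lhat = plogis(fit$par[2]))
}
# fitBumSketch(c(runif(700), rbeta(300, 0.3, 1)))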
The graphical functions (hist and image) invisibly return the object on which they were invoked.
The cutoffSignificant method returns a real number between zero and one. P-values below this cutoff are considered statistically significant at either the specified false discovery rate or at the specified posterior probability.
The selectSignificant method returns a vector of logical values whose length is equal to the length of the vector of p-values that was used to construct the Bum object. True values in the return vector mark the statistically significant p-values.
The countSignificant method returns an integer, the number of statistically significant p-values.
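These three return values fit together as in the short check below; the variable names are illustrative, and the comparison assumes the documented rule that p-values below the cutoff are the ones selected.

pvals <- c(runif(700), rbeta(300, 0.3, 1))
fit   <- Bum(pvals)
cut   <- cutoffSignificant(fit, alpha = 0.10, by = 'FDR')
keep  <- selectSignificant(fit, alpha = 0.10, by = 'FDR')
all(keep == (pvals < cut))                                    # expected TRUE
countSignificant(fit, alpha = 0.10, by = 'FDR') == sum(keep)  # expected TRUE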
The summary method returns an object of class BumSummary.
Although objects can be created directly using new, the most common usage will be to pass a vector of p-values to the Bum function.
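For example (with illustrative variable names):

pvals <- c(runif(700), rbeta(300, 0.3, 1))  # simulated p-values
fit <- Bum(pvals)                           # preferred over calling new('Bum', ...) directly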
An object of the Bum class has the following slots:

pvals: The numeric vector of p-values used to construct the object.

ahat: The maximum likelihood estimate of the beta shape parameter a.

lhat: The maximum likelihood estimate of the mixing fraction lambda.

pihat: The resulting estimate of the fraction of p-values arising from the uniform (null) component; an upper bound on the proportion of uninteresting genes.

The following methods are defined for the Bum class:

summary(object, tau=0.01, ...): For each value of the p-value cutoff tau, computes estimates of the fraction of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN).

hist(x, res=100, xlab='P Values', main='', ...): Plots a histogram of the p-values along with the fitted BUM distribution; the colors of the curves are controlled by COLOR.EXPECTED and COLOR.OBSERVED.

image(x, ...): Produces a set of four plots: (1) the histogram produced by hist; (2) a plot of cutoffs against the desired false discovery rate; (3) a plot of cutoffs against the posterior probability of coming from the beta component; and (4) an ROC curve.

cutoffSignificant(object, alpha, by='FDR', ...): Determines the cutoff on the p-values that achieves significance, controlled either by the false discovery rate (when by = 'FDR' or by = 'FalseDiscovery') or by the posterior probability (when by = 'EmpiricalBayes').

selectSignificant(object, alpha, by='FDR', ...): Uses cutoffSignificant to determine a logical vector that indicates which of the p-values are significant.

countSignificant(object, alpha, by='FDR', ...): Uses selectSignificant to count the number of significant p-values.

Kevin R. Coombes <kcoombes@mdanderson.org>
Pounds S, Morris SW. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics. 2003 Jul 1;19(10):1236-42.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc B, 1995; 57: 289-300.
Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol 2002; 23: 70-86.
Two classes that produce lists of p-values that can (and often should) be analyzed using BUM are MultiTtest and MultiLinearModel. Also see BumSummary.
fake.data <- c(runif(700), rbeta(300, 0.3, 1))
a <- Bum(fake.data)
hist(a, res=200)

alpha <- (1:25)/100
plot(alpha, cutoffSignificant(a, alpha, by='FDR'),
     xlab='Desired False Discovery Rate', type='l',
     main='FDR Control', ylab='Significant P Value')

GAMMA <- 5*(10:19)/100
plot(GAMMA, cutoffSignificant(a, GAMMA, by='EmpiricalBayes'),
     ylab='Significant P Value', type='l',
     main='Empirical Bayes', xlab='Posterior Probability')

b <- summary(a, (0:100)/100)
be <- b@estimates
sens <- be$TP/(be$TP + be$FN)
spec <- be$TN/(be$TN + be$FP)
plot(1-spec, sens, type='l', xlim=c(0,1), ylim=c(0,1), main='ROC Curve')
points(1-spec, sens)
abline(0, 1)

image(a)

countSignificant(a, 0.05, by='FDR')
countSignificant(a, 0.99, by='Emp')

# cleanup
rm(a, b, be, alpha, GAMMA, sens, spec, fake.data)