>
>
>
This project is archived and no longer maintained.
WFMM
hidden row | for table layout |
Description | Implements Markov Chain Monte Carlo (MCMC) Bayesian wavelet-based functional mixed models procedue |
Language | C++ |
Current version | 1.0 |
Platforms | Windows |
Status | Inactive |
Last updated | 2005-04-15 |
News | As of July/August 2016, this highly cited paper received enough citations to place it in the top 1% of the academic field of Mathematics based on a highly cited threshold for the field and publication year. |
Citation |
Morris JS, & Carroll RJ. (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 179-199. http://dx.doi.org/10.1111/j.1467-9868.2006.00539.x
|
WFMM : Wavelet-based Functional Mixed Models
The code implements the Markov Chain Monte Carlo (MCMC) Bayesian wavelet-based functional mixed models procedure described in Section 5 of the Morris and Carroll (2004) paper. The program itself runs in the windows command prompt interface. It takes in a matlab file as an input, and outputs posterior samples for many model quantities.
Packages
WFMM.exe: The main executable.
Various DLL files: Required to run WFMM.exe
Download
You can download the package here .
System Requirements
Requires at least Windows XP to run.
Documentation
We apologize that this initial documentation is incomplete. More
extensive documentation will be provided in the near future, along with
scripts for performing some of the most commonly used Bayesian
estimation, inference, and prediction procedures from the MCMC output
from this program. We also plan to develop an R interface for this
program, which we will also post when it is ready.
Sample program call (from DOS window)
wfmm c:\work\input.mat output.mat > log_file.log
The program requires an input file (“input.mat” in the call) that is a
matlab file containing the following objects:
- Y: N –by- T matrix, each row containing one of the observed
functions on an equally-spaced grid of length T
- model: Matlab structure indicating details of model. The following
elements are required to be included in the structure:
- X: N-by-p matrix containing the desired covariates for the p
fixed effect functions in the model.
- Hstar: integer indicating the number of levels of random effect
functions
- Z{h}, h=1, …, H: N-by-mh matrices containing covariates for
each set of random effect functions.
- The following three variables should just be included as given
here
- covq=2
- covs=2
- covT=1
These are artifacts of some exploration of different ways of doing the modeling – very soon these will be removed from the code, but for now they are still there and must be included.
- wavespecs: Matlab structure describing wavelet settings to be used. The following elements must be included:
- wavelet: text string indicating the wavelet basis to use.
Abbreviations for wavelet bases in Matlab toolbox are used. List
of available bases will be provided in future documentation.
- nlevels: integer indicating the number of levels of
decomposition
- boundary: boundary correction method assumed (‘reflection’ is
default)
- MCMCspecs: Matlab structure describing details of MCMC. The
following elements must be included:
- B= number of MCMC samples to obtain
- burnin= burn-in length
- thin= thinning parameter; e.g. if 10, then keep every 10 samples
in MCMC
- propvarTheta= multiple of var(MLE) to use in proposal variance
for variance components in step 2 of the MCMC, =1.5 by default
- nj_nosmooth= number of lowest frequency wavelet levels for
which we want a vague prior (no smoothing)
The following parameters are simply for numerical stability:
- minp= minimum value for any pij, 10-14 by default
- minT= minimum value for Tij, 1 by default
- bigT= value to use for Tij when vague prior desired (no smoothing)
- maxO = maximum odds ratio, 1020 by default (prevents overflow)
- minVC = minimum value of variance component, 10-6 by default
(prevents instability of variance components wandering near zero)
- VC0_thresh = minimum size for important variance component (=minVC
by default)
- delta_theta = multiple for prior on theta: “number of datasets of
information” in prior (see discussion in Morris, et al. 2003 JASA),
10-4 by default.
- thetaMOM_maxiter = maximum number of iterations in finding MOM
starting values for variance components, default 100
- thetaMOM_convcrit= convergence criteria for iterative procedure for
finding MOM starting values for variance components, default 10-3
Output
The output of the program is a Matlab data file (“output.mat” in the
sample call), containing the following Matlab objects, as well as a log
file.
- Y, model, wavespecs, MCMCspecs: as input
- D: matrix containing wavelet coefficients for observed data
- PI: matrix containing the pij estimated by the Empirical Bayes
procedure described in Section 4.4 of the paper.
- Tau: matrix containing the Tij by the Empirical Bayes procedure
described in Section 4.5 of Morris and Carroll (2004).
- theta_MOM: matrix containing method of moments starting values for
the wavelet-space variance components qjk and sjk in model (3).
- theta_MLE: matrix containing profile maximum likelihood starting
values for the wavelet-space variance components.
- se_Theta: estimate of the variance of theta_MLE, to use in
automatic proposal variances in Metropolis-Hastings procedure
described in step 2 of Section 5 in Morris and Carroll (2004)
- betans: matrix containing non-shrunken estimate of wavelet
coefficients for fixed effects conditioning on starting values of
variance components, given by equation (6) in Morris and Carroll
(2004).
- Vbetans: Matrix containing variance of these wavelet-spaced
estimates, given by equation (7) in Morris and Carroll (2004).
- alpha: Matrix containing starting values for shrinkages for wavelet
coefficients for fixed effect functions, which are their posterior
probabilities of being “nonzero”. Condition on theta_MLE for
variance components.
- betahat: betans*alpha; shrinkage starting values for betas.
- prior_Theta_a, prior_Theta_b: matrices containing the prior
hyperparameters for the inverse gamma distributions on the
wavelet-space variance components
- Wv: structure containing, for each wavelet coefficient, the
following statistics, using starting values of the variance
components for Sjk
- XvX=X’(Sjk)-1X
- XvZ=X’(Sjk)-1Z
- XvD=X’(Sjk)-1D
- ZvZ=Z’(Sjk)-1Z
- ZvD=Z’(Sjk)-1D
- dvd=diag(D’(Sjk)-1D)
- L1=det(Sjk)
- L2= (djk-X Bjk)’ (Sjk)-1 (djk-X Bjk)
- MCMC_wbeta: matrix containing MCMC posterior samples for wavelet
coefficients for fixed effects. Each row contains all B*ijk , i=1,
…, p, j=1, .., J, k=1, …, Kj , for one iteration of MCMC.
- MCMC_beta: matrix containing MCMC posterior samples for data-space
fixed effect functions. Each row contains Bij , i=1, …, p, j=1,
…, T, for one iteration of MCMC.
- MCMC_theta: matrix containing MCMC posterior samples for variance
components in wavelet space. Each row contains {qhjk, h=1, …, H,
sjk}, in that order, for j=1, …, J, k=1, …, Kj
- Paccept_newtheta: vector containing Metropolis-Hastings acceptance
probabilities for the set of variance components for each wavelet
coefficient, indexed by scale j and location k.
- ghat: p-by-T matrix containing the posterior mean for each fixed
effect function.
- Q05_ghat: p-by-T matrix containing the pointwise .05 quantile for
each fixed effect function, which is the lower bound for the 90%
posterior credible interval.
- Q95_ghat: p-by-T matrix containing the pointwise .95 quantile for
each fixed effect function, which is the upper bound for the 90%
posterior credible interval.
- This code has been compiled and used on Microsoft Windows systems.
In the near future, we will also test it on Linux, as well.
- The current interface assumes you create the input files and want to
post-process the output files in Matlab. We will write an R wrapper
for this program in the near future, and post on our web site when
it is ready.
- The current version of the code assumes:
- You want to estimate the shrinkage hyperparameters using the
empirical Bayes method.
- You want vague proper priors for the variance components, centered
at the starting values with information equivalent to delta_Theta
observations.
- The random effect functions are independent and identically
distributed, so P=R=I
Future versions of the code will allow the users to specify shrinkage
hyperparameters and priors for the variance components, and will also
allow other covariance structures on the random effect functions.
- This code yields MCMC samples for the quantities in the
wavelet-space model, (3) in Morris and Carroll (2004), plus MCMC
samples for the fixed effect functions B in the data space model
(2).
- MCMC samples of Qh and Si matrices can be obtained by applying the
2-D IDWT to the corresponding diagonal wavelet-space matrices. They
are not generated by default because they are frequently not of
direct interest, and their large size will cause memory issues in
large data sets. We are currently writing code to compute and store
the diagonal elements of these matrices in a more efficient manner.
These diagonal elements are useful for certain purposes we discuss
in future publications.
- MCMC samples of the random effect functions and their wavelet
coefficients are not calculated by default, again to save memory and
computing resources. We have matlab scripts to compute these that we
are in the process of converting over to C, and will post once they
are complete.
- Our scripts for post-processing the posterior samples from the MCMC
to do various types of Bayesian inference are in Matlab. We are in
the process of converting some of them over to C, at which time they
will be posted along with their documentation.
- It is possible to parallelize the MCMC step if parallel processing
is available – a parallel processing version of the code will be
posted in due time.
- For extremely large data sets, one may need to compute the memory
requirements of fitting their data and compare it with the
capabilities of their machine. Again, future documentation will give
guidelines here.
- There are numerous extensions and improvements in the works for this
method – stay tuned for updated code.
Reference
Morris, JS and Carroll, RJ (2004).
Wavelet-based functional mixed models, Technical Report, University of
Texas MD Anderson Cancer Center Department of Biostatistics and Applied
Mathematics, UTMDABTR-006-04.