R: Fit Dilution Curves to Protein Lysate Series

RPPAFit {SuperCurve}

R Documentation

Fit Dilution Curves to Protein Lysate Series

Description

RPPAFit fits a four-parameter logistic model to the dilution series in a reverse-phase protein array experiment. Individual sample concentrations are estimated by matching individual sample dilution series to the overall logistic response for the slide.

Usage

RPPAFit(rppa, design, measure, xform=function(x) x,
    method = c("pure", "mixed", "quantiles", "rlm"),
    ci = FALSE, ignoreNegative = TRUE, bayesian = FALSE,
    trace = FALSE, verbose = FALSE, veryVerbose = FALSE,
    warnLevel = 0)

Arguments

`rppa`	An `RPPA` object containing the raw data to be fit.
`design`	An `RPPADesign` object describing the layout of the array.
`measure`	A character string identifying the column of the raw RPPA data that should be used to fit to the model.
`xform`	(Experimental) A function that takes a single input vector and returns a single output vector of the same length. The `measure` column is transformed using this function before fitting the model. NOT YET IMPLEMENTED.
`method`	Optional parameter specifying the method for fitting the parameters `alpha` and `beta`. The default method is `pure`, which uses the optimal fit based on nonlinear least squares. Setting `method` to `mixed` uses `nls` to fit the three general model parameters, but uses `rlm` to fit the sample-specific parameters. Setting `method` to `quantiles` uses the 5th and 95th percentiles of the raw data. Setting `method` to `rlm` tries to refit the values (after an appropriate transformation) with a robust linear model.
`ci`	A logical value: if TRUE, then compute 90% confidence intervals on the concentration estimates.
`ignoreNegative`	A logical value: if TRUE, then negative values are converted to NA before fitting the model.
`bayesian`	A logical value: if TRUE, we use Bayesian methods to estimate per-sample values of the lower bound alpha.
`trace`	Passed to `nls` in the Bayesian portion of the routine.
`verbose`	A logical value: if TRUE, the function prints updates while it is fitting the data.
`veryVerbose`	A logical value: if TRUE, the function prints voluminous updates as it fits each individual dilution series.
`warnLevel`	Used to set the `warn` option before calling `rlm`. Because the call is wrapped in `try`, a failure there does not stop the fit, but it gives us a chance to figure out which dilution series are failing. Setting `warnLevel` to 2 or greater makes R treat warnings as errors, so it may change the values returned by the function.

Details

The basic mathematical model is given by

Y = α + β*g(gamma*(X+delta_i)),

where Y is the observed intensity and X is the designed dilution step. The heart of the model is the logistic function g(x) = e^x/(1+e^x), which is the inverse of the logit function

f(p) = log(p/(1-p)).

By fitting a joint model, we assume that the parameters α, β, and gamma are the same for all dilution series on the array. The real point of the model, however, is to be able to draw inferences on the delta_i, which represent the (log) concentrations of the protein present in different dilution series.
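
As a minimal sketch of the model (not the package's own code), the two functions and the expected intensities can be written in base R. All parameter values below are made up for illustration, and the slope is named gam to avoid clashing with R's gamma function.

```r
# g(x) = e^x / (1 + e^x) and its inverse f(p) = log(p / (1 - p)),
# written out explicitly; base R's plogis() and qlogis() compute the same.
g <- function(x) exp(x) / (1 + exp(x))
f <- function(p) log(p / (1 - p))

# Hypothetical parameter values, chosen only for illustration.
alpha <- 100            # lower asymptote (background intensity)
beta  <- 5000           # height of the curve above the background
gam   <- 0.7            # common slope (gamma in the text)
delta <- c(-1, 0, 2)    # log concentrations of three dilution series
steps <- -4:0           # designed dilution steps X

# Expected intensity for every (step, series) pair:
# rows are dilution steps, columns are series.
expected <- sapply(delta, function(d) alpha + beta * g(gam * (steps + d)))
```

Each column of expected is the same sigmoid curve shifted along the X axis by its delta_i, which is why a single set of α, β, and gamma can be shared across the slide.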

As the first step in fitting the model, we get crude estimates of the parameters α and β by computing the minimum and maximum of the observed intensities Y. We then perform a logit transformation, working with the values W = f((Y - α)/β). We then compute an initial estimate of gamma as the median (over all dilution series) of the slope of a robust linear fit to W as a function of the dilution steps X. Initial estimates of the individual delta_i are also computed robustly, conditional on the previously estimated parameters.
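
The first pass can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the bounds are padded slightly so that the logit stays finite, Tukey's resistant line (line() from base R) stands in for the robust linear fit, and the median is used as the robust estimate of each delta_i.

```r
set.seed(1)
# Simulated slide (hypothetical values): 20 series, 5 dilution steps each.
delta.true <- rnorm(20, sd = 1.5)
dat <- expand.grid(step = -4:0, series = seq_along(delta.true))
dat$Y <- 100 + 5000 * plogis(0.7 * (dat$step + delta.true[dat$series])) +
  rnorm(nrow(dat), sd = 20)

# Crude bounds from the extremes of Y. The padding is an assumption of this
# sketch; without it the logit is infinite at the smallest and largest Y.
pad    <- 0.01 * diff(range(dat$Y))
alpha0 <- min(dat$Y) - pad
beta0  <- max(dat$Y) + pad - alpha0

# Transformed intensities W = f((Y - alpha) / beta).
dat$W <- qlogis((dat$Y - alpha0) / beta0)

# Initial gamma: median over series of the slope of a resistant line
# fitted to W as a function of the dilution step.
by.series <- split(dat, dat$series)
slopes <- sapply(by.series, function(d) coef(line(d$step, d$W))[2])
gam0   <- median(slopes)

# Initial delta_i: W is roughly gam0 * (X + delta_i), so the median of
# W / gam0 - X within a series is a robust estimate of delta_i.
delta0 <- sapply(by.series, function(d) median(d$W / gam0 - d$step))
```

Because the bounds are only crude, gam0 and delta0 are starting values rather than final estimates; they are refined in the next step.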

The next step depends on which method has been specified for model fitting. If method="pure" or method="mixed", then we use the non-linear least squares function nls. Conditional on the current estimates of the delta_i, we use nls to update the estimates of the other three parameters. Then, conditional on the updated values of α, β, and gamma, we update the estimates of the delta_i one dilution series at a time. The update uses nls when method="pure" and uses rlm when method="mixed".
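
The alternating update described for method="pure" can be sketched with nls on simulated data. This is a hedged illustration only: the data, the starting values, the fixed number of five sweeps, and the use of try around each fit are assumptions of the sketch, not details taken from the package.

```r
set.seed(2)
# Simulated slide (hypothetical values): 15 series, 5 dilution steps each.
delta.true <- rnorm(15, sd = 1.5)
dat <- expand.grid(step = -4:0, series = seq_along(delta.true))
dat$Y <- 100 + 5000 * plogis(0.7 * (dat$step + delta.true[dat$series])) +
  rnorm(nrow(dat), sd = 20)

# Rough starting values standing in for the first-pass estimates:
# padded min/max for alpha and beta, a guessed slope, and delta_i taken
# from the logit-transformed intensities.
pad   <- 0.01 * diff(range(dat$Y))
theta <- c(alpha = min(dat$Y) - pad,
           beta  = diff(range(dat$Y)) + 2 * pad,
           gam   = 0.5)
W     <- qlogis((dat$Y - theta[["alpha"]]) / theta[["beta"]])
delta <- as.vector(tapply(W / theta[["gam"]] - dat$step, dat$series, median))

sse <- numeric(0)
for (iter in 1:5) {
  # (1) Conditional on the delta_i, update alpha, beta, and gamma jointly.
  dat$d <- delta[dat$series]
  joint <- try(nls(Y ~ alpha + beta * plogis(gam * (step + d)),
                   data = dat, start = as.list(theta)), silent = TRUE)
  if (!inherits(joint, "try-error")) theta <- coef(joint)

  # (2) Conditional on alpha, beta, and gamma, update one delta_i at a time
  # with a one-parameter nls fit per dilution series.
  a <- theta[["alpha"]]; b <- theta[["beta"]]; k <- theta[["gam"]]
  for (i in seq_along(delta)) {
    one <- dat[dat$series == i, ]
    fit <- try(nls(Y ~ a + b * plogis(k * (step + d.i)),
                   data = one, start = list(d.i = delta[i])), silent = TRUE)
    if (!inherits(fit, "try-error")) delta[i] <- coef(fit)[["d.i"]]
  }

  # Residual sum of squares after this sweep.
  fitted.Y  <- a + b * plogis(k * (dat$step + delta[dat$series]))
  sse[iter] <- sum((dat$Y - fitted.Y)^2)
}
```

Under method="mixed", step (2) would use a robust fit instead of nls; the structure of the loop is otherwise the same.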

If method="quantiles", then we retain the quantile-based estimates of α and β. In this case, we first use nls to update the value of gamma and then, conditional on that estimate, update the delta_i.

If method="rlm", we first follow the procedure described for method="pure". We then try to refit the estimates of α and β using a robust linear model with the rlm function from the MASS package. This computation is performed conditionally on the estimates of gamma and delta_i, in which case the observed intensities Y are linear in the sigmoid-transformed dilution steps g(gamma*(X+delta_i)).
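
The reason a linear model suffices here can be sketched as follows: once gamma and the delta_i are held fixed, Z = g(gamma*(X+delta_i)) is a known covariate and Y = α + β*Z is linear in Z. The example below is an illustration on simulated data with two artificial outliers; the values of gam and delta are hypothetical stand-ins for the estimates from the earlier steps.

```r
library(MASS)  # provides rlm()

set.seed(3)
# Hypothetical values standing in for the estimates from the earlier steps.
gam   <- 0.7
delta <- rnorm(15, sd = 1.5)
dat   <- expand.grid(step = -4:0, series = seq_along(delta))
dat$Y <- 100 + 5000 * plogis(gam * (dat$step + delta[dat$series])) +
  rnorm(nrow(dat), sd = 20)
dat$Y[c(3, 40)] <- dat$Y[c(3, 40)] + 3000   # two gross outliers

# With gamma and delta_i fixed, the sigmoid-transformed steps are known.
dat$Z <- plogis(gam * (dat$step + delta[dat$series]))

robust <- rlm(Y ~ Z, data = dat)   # downweights the outlying spots
plain  <- lm(Y ~ Z, data = dat)    # ordinary least squares, for comparison

# The intercept estimates alpha and the slope on Z estimates beta.
coef(robust)
coef(plain)
```

Comparing the two sets of coefficients shows how much the outlying spots pull the least-squares estimates of α and β.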

The bayesian option alters the model by assuming that the baseline level α can be different for each dilution series. The globally estimated α is used as a strong prior, and the individual estimates of α are shrunk toward the global value. This idea is motivated by the possibility that background levels might differ across different parts of the reverse-phase protein array.

If the ci argument is set to TRUE, then the function also computes confidence intervals around the estimates of the log concentration. Since this step can be time-consuming, it is not performed by default. Moreover, confidence intervals can be computed after the main model is fit and evaluated, using the getConfidenceInterval function; see its documentation for details on how the intervals are estimated.

Value

This function constructs and returns an object of the RPPAFit class.

Author(s)

Kevin R. Coombes <kcoombes@mdanderson.org>

Examples

path <- system.file("rppaTumorData", package="SuperCurve")
erk2 <- RPPA("ERK2.txt", path=path)
design <- RPPADesign(erk2, grouping="blockSample",
                     controls=list("neg con", "pos con"))
fit.nls <- RPPAFit(erk2, design, "Mean.Net")
summary(fit.nls)
coef(fit.nls)

[Package SuperCurve version 0.931 Index]