SamplePCA {ClassDiscovery} | R Documentation |

Perform principal components analysis on the samples (columns) from a microarray or proteomics experiment.

    SamplePCA(data, splitter = 0, usecor = FALSE, center = TRUE)
    ## S4 method for signature 'SamplePCA,missing'
    plot(x, splitter = x@splitter, col, main = '', which = 1:2, ...)

| Argument | Description |
| --- | --- |
| `data` | Either a data frame or matrix with numeric values, or an `ExpressionSet` as defined in the Bioconductor tools for analyzing microarray data. |
| `splitter` | If `data` is a data frame or matrix, then `splitter` must be either a logical vector or a factor. If `data` is an `ExpressionSet`, then `splitter` can be a character string that names one of the factor columns in the associated `phenoData` subobject. |
| `center` | A logical value; should the rows of the data matrix be centered first? |
| `usecor` | A logical value; should the rows of the data matrix be scaled to have standard deviation 1? |
| `x` | A `SamplePCA` object. |
| `col` | A list of colors to represent each level of the `splitter` in the plot. If this parameter is missing, the function will select colors automatically. |
| `main` | A character string; the plot title. |
| `which` | A numeric vector of length two specifying which two principal components should be included in the plot. |
| `...` | Additional graphical parameters for `plot`. |


The main reason for developing the `SamplePCA` class is that the `princomp` function is very inefficient when the number of variables (in the microarray setting, genes) far exceeds the number of observations (in the microarray setting, biological samples). The `princomp` function begins by computing the full covariance matrix, which gets rather large in a study involving tens of thousands of genes. The `SamplePCA` class, by contrast, uses singular value decomposition (`svd`) on the original data matrix to compute the principal components.
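The idea of computing sample principal components from an SVD of the data matrix can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: the variable names (`pc_components`, `pc_scores`, `pc_var`) and the `N - 1` normalization of the variances are choices made for this example. The result is cross-checked against `prcomp`, which also works from an SVD.

```r
# Sketch: PCA of the samples (columns) via SVD, using base R only.
# All names and the variance normalization are assumptions for illustration.
set.seed(1)
P <- 500   # number of genes (rows)
N <- 12    # number of samples (columns)
x <- matrix(rnorm(P * N), nrow = P, ncol = N)

# Center each gene (row) across the samples
centered <- sweep(x, 1, rowMeans(x))

# SVD of the P x N matrix; no P x P covariance matrix is ever formed
s <- svd(centered)

pc_components <- s$u                    # P x N, one component per column
pc_scores <- t(centered) %*% s$u        # N x N, projections of the samples
pc_var <- s$d^2 / (N - 1)               # variance along each component

# Cross-check with prcomp, which expects observations in rows
ref <- prcomp(t(x), center = TRUE, scale. = FALSE)
k <- N - 1   # centering removes one dimension, so compare the first N - 1
isTRUE(all.equal(pc_var[1:k], ref$sdev[1:k]^2))
isTRUE(all.equal(abs(pc_scores[, 1:k]), abs(ref$x[, 1:k]),
                 check.attributes = FALSE))
```

The scores are compared in absolute value because the sign of each singular vector is arbitrary.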

The `SamplePCA` function constructs and returns an object of the `SamplePCA` class. We assume that the input data matrix has N columns (of biological samples) and P rows (of genes).

The predict method returns a matrix whose size is the number of
columns in the input by the number of principal components.
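To make the shape of that result concrete, here is a base R sketch of projecting new samples into a principal component space. It assumes the projection is "subtract the stored per-gene mean, then multiply by the component matrix"; the helper `project_samples` is hypothetical and is not the package's `predict` method.

```r
# Sketch: projecting new samples (columns) into PC space, base R only.
# `project_samples` is a hypothetical helper written for this example.
set.seed(2)
P <- 200   # genes (rows)
N <- 8     # training samples (columns)
x <- matrix(rnorm(P * N), nrow = P)

shift <- rowMeans(x)                  # per-gene mean, length P
s <- svd(sweep(x, 1, shift))
components <- s$u                     # P x N component matrix

project_samples <- function(newdata, shift, components) {
  # Center each gene with the training means, then project
  t(sweep(newdata, 1, shift)) %*% components
}

newdata <- matrix(rnorm(P * 3), nrow = P)   # three new samples
proj <- project_samples(newdata, shift, components)
dim(proj)   # 3 rows (new samples) by 8 columns (components)

# The training mean vector itself lands at the origin
origin <- project_samples(matrix(shift, ncol = 1), shift, components)
```

Each row of the result corresponds to one input column, and each column to one principal component.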

Objects should be created using the `SamplePCA` function. In the simplest case, you pass in a data matrix and a logical vector, `splitter`, assigning classes to the columns, and the constructor performs principal components analysis on the columns. The `splitter` is ignored by the constructor and is simply saved to be used by the plotting routines. If you omit the `splitter`, then no grouping structure is used in the plots.

If you pass `splitter` as a factor instead of a logical vector, then the plotting routine will distinguish all levels of the factor. The code is likely to fail, however, if one of the levels of the factor has zero representatives among the data columns.
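One common way to end up with an empty level is to subset the samples without updating the factor. The base R sketch below shows how such a level can be detected and removed with `droplevels` before the factor is used as a `splitter`; the variable names are made up for this example.

```r
# Sketch: removing factor levels with no representatives, base R only.
# Variable names are illustrative assumptions.
kind_all <- factor(rep(c("red", "green", "blue"), each = 10))

# Keep only the red and green samples
keep <- kind_all != "blue"
kind_sub <- kind_all[keep]

# The "blue" level is still declared, with a count of zero
table(kind_sub)

# Drop unused levels so that every level has at least one sample
kind_sub <- droplevels(kind_sub)
levels(kind_sub)
```

The same `keep` vector would be used to select the matching columns of the data matrix, so that the factor and the data stay aligned.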

As with the class comparison functions (see, for example, `MultiTtest`) that are part of OOMPA, we can also perform PCA on `ExpressionSet` objects from the Bioconductor libraries. In this case, we pass in an `ExpressionSet` object along with a character string containing the name of a factor to use for splitting the data.

- `scores`: A `matrix` of size NxN, where N is the number of columns in the input, representing the projections of the input columns onto the first N principal components.
- `variances`: A `numeric` vector of length N; the amount of the total variance explained by each principal component.
- `components`: A `matrix` of size PxN (the same size as the input matrix) containing each of the first N principal components as columns.
- `splitter`: A logical vector or factor of length N classifying the columns into known groups.
- `usecor`: A `logical` value; was the data standardized?
- `shift`: A `numeric` vector of length P; the mean vector of the input data, which is used for centering by the `predict` method.
- `scale`: A `numeric` vector of length P; the standard deviation of the input data, which is used for scaling by the `predict` method.
- `call`: An object of class `call` that records how the object was created.

- plot `signature(x = SamplePCA, y = missing)`: Plot the samples in a two-dimensional principal component space.
- predict `signature(object = SamplePCA)`: Project new data into the principal component space.
- screeplot `signature(x = SamplePCA)`: Produce a bar chart of the variances explained by each principal component.
- summary `signature(object = SamplePCA)`: Write out a summary of the object.

Kevin R. Coombes <kcoombes@mdanderson.org>

    # simulate data from three different groups
    d1 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
    d2 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
    d3 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
    dd <- cbind(d1, d2, d3)
    kind <- factor(rep(c('red', 'green', 'blue'), each=10))

    # perform PCA
    spc <- SamplePCA(dd, splitter=kind)

    # plot the results
    plot(spc, col=levels(kind))

    # mark the group centers
    x1 <- predict(spc, matrix(apply(d1, 1, mean), ncol=1))
    points(x1[1], x1[2], col='red', cex=2)
    x2 <- predict(spc, matrix(apply(d2, 1, mean), ncol=1))
    points(x2[1], x2[2], col='green', cex=2)
    x3 <- predict(spc, matrix(apply(d3, 1, mean), ncol=1))
    points(x3[1], x3[2], col='blue', cex=2)

    # check out the variances
    screeplot(spc)

    # cleanup
    rm(d1, d2, d3, dd, kind, spc, x1, x2, x3)

[Package *ClassDiscovery* version 2.5.0 Index]