Object Oriented Microarray Library: Standard Reports

Along with the tools to analyze microarray data, we have some recommendations on how to report the information back to the experimenters. We break the analysis down into a series of steps. In the process, we identify certain critical points at which tables summarizing the results should be generated. At the bottom of this page, we have an example illustrating how our library helps generate these reports.

Quantification

The underlying assumption of the library is that quantification was carried out as a separate step before the data was brought into S-PLUS. We obviously don't need to report this data back, since it was already available from a different source. Instead, we need to provide tools to load the data into S-PLUS. The routines that load the data are described as constructors in the discussion of the complete channel and complete slide modules. These methods include particular routines designed for Clontech ATLAS Human Cancer 1.2 arrays, Research Genetics GeneFilters (GF200 through GF205), and model CG4 arrays produced by the Cancer Genomics Core Laboratory at The UT M.D. Anderson Cancer Center.

Normalized Data

Before we can compare the results from different microarray experiments (or even from two channels in the same two-color fluorescence experiment), the data must undergo a certain amount of preprocessing. This processing includes background correction, some kind of normalization procedure, replacement of values below a threshold by a constant, and usually a log-transformation. In addition to receiving a description of this preprocessing, many investigators will want access to the set of normalized data in order to perform their own analyses.
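
The four preprocessing steps can be sketched in a few lines of R, using only base functions. This is an illustration of the idea, not the library's implementation: the function name preprocess.sketch, the use of median scaling to a target of 1000 as the normalization, and the base-2 logarithm are all assumptions made for the example. The threshold of 50 simply echoes the tp=50 setting used in the examples at the bottom of this page.

```r
# Hypothetical helper for illustration; not part of the library.
preprocess.sketch <- function(foreground, background, target = 1000, threshold = 50) {
  # 1. background correction
  corrected <- foreground - background
  # 2. normalization (assumed here: rescale so the median equals 'target')
  scaled <- corrected * target / median(corrected)
  # 3. replace values below the threshold by a constant
  scaled[scaled < threshold] <- threshold
  # 4. log transformation (assumed here: base 2)
  log(scaled, base = 2)
}

# made-up spot intensities
fg <- c(120, 560, 2300, 90, 1500)
bg <- c(100, 110, 105, 95, 100)
normalized <- preprocess.sketch(fg, bg)
```

Spots whose corrected, rescaled intensity falls below the threshold (the first and fourth in this made-up data) all end up at the same floor value, which keeps the log transformation defined and prevents noise near zero from producing wild log values.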

Isotope Experiments

In the case of an experiment involving multiple targets labeled with radioactive isotopes, the normalized data is typically collected together as part of a channel set object. Using the as.data.frame method on this object, we can extract the normalized data in a form that can easily be exported as a tab-delimited text file, an Excel spreadsheet, or an Access database table.
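
Once the normalized data are in a data frame, a tab-delimited text file can be written with the standard write.table function. The sketch below is written for R, where the argument names may differ from those in S-PLUS; the small data frame stands in for the result of as.data.frame, and its column names, values, and file name are placeholders.

```r
# made-up stand-in for the data frame extracted from a channel set
normalized <- data.frame(gene = c("geneA", "geneB", "geneC"),
                         N1 = c(5.6, 9.9, 12.3),
                         T1 = c(5.6, 10.4, 11.8))

# write a tab-delimited file with a header row and no quoting
out.file <- file.path(tempdir(), "normalizedData.txt")
write.table(normalized, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Keeping the gene identifiers in an explicit column, rather than in the row names, means the header row lines up with the data columns when the file is opened in a spreadsheet.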

Fluorescence Experiments

In the case of a single-slide fluorescence experiment, we typically carry out the preprocessing by extracting a slide object from a complete slide. Slide objects also have an as.data.frame method that can be used to export the normalized data.

Detection of Differentially Expressed Genes

In the basic experimental design we are using for these experiments, we are interested in comparing replicated values from two groups of experimental samples in order to find genes that are differentially expressed (and for which we can supply some statistical justification). Our standard method is to fit a smooth curve that expresses the standard deviation of (log) expression as a function of the mean (log) expression. We then use this curve to estimate the standard deviations as part of a traditional gene-by-gene t-test. Along the way, we typically produce test statistics that also indicate that certain genes are significantly more variable within one group than would be expected from this smooth curve, and hence we are less confident of the value of those t-statistics. All of these statistics should be reported back to the investigator.
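
To make the idea concrete, here is a minimal R sketch of a t-statistic whose denominator comes from a smooth curve of standard deviation against mean. It is not the library's code: the function name smooth.t.sketch, the choice of lowess as the smoother, the pooling of the two within-group variances, and the simple ratio used to flag unusually variable genes are all assumptions made for illustration, and the data are simulated.

```r
# Illustrative sketch only; not the library's two.group.stats or two.groups code.
smooth.t.sketch <- function(x, in.first.group) {
  a <- x[, in.first.group, drop = FALSE]
  b <- x[, !in.first.group, drop = FALSE]
  n.a <- ncol(a)
  n.b <- ncol(b)
  mean.a <- rowMeans(a)
  mean.b <- rowMeans(b)
  overall.mean <- rowMeans(x)

  # gene-by-gene pooled within-group standard deviation
  pooled.var <- ((n.a - 1) * apply(a, 1, var) + (n.b - 1) * apply(b, 1, var)) /
    (n.a + n.b - 2)
  pooled.sd <- sqrt(pooled.var)

  # smooth curve giving SD as a function of the mean (lowess is an assumed choice);
  # lowess returns its fit sorted by x, so map it back to the original gene order
  fit <- lowess(overall.mean, pooled.sd)
  smooth.sd <- numeric(length(pooled.sd))
  smooth.sd[order(overall.mean)] <- fit$y

  data.frame(mean.a, mean.b, pooled.sd, smooth.sd,
             t.stat = (mean.a - mean.b) / (smooth.sd * sqrt(1 / n.a + 1 / n.b)),
             sd.ratio = pooled.sd / smooth.sd)
}

# simulated log expression: 200 genes, 3 samples per group
set.seed(1)
x <- matrix(rnorm(200 * 6, mean = 8, sd = 0.3), ncol = 6)
x[1, 1:3] <- x[1, 1:3] + 1.5   # make gene 1 differ between the groups
x[1, 4:6] <- x[1, 4:6] - 1.5
in.first.group <- c(rep(TRUE, 3), rep(FALSE, 3))
result <- smooth.t.sketch(x, in.first.group)
```

In this sketch, a gene with a large sd.ratio is more variable than the curve predicts at its expression level, so its t.stat deserves less confidence.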

Isotope Experiments

The statistics are computed in two steps. First, the channel set is converted to a data frame, which is passed (along with a logical vector identifying the samples that belong to one of the two groups) to the constructor for the two.group.stats class. Second, the resulting statistical object is used to construct an object of the two.groups class; the t-statistics are computed during this construction. The results can be exported after converting the two.groups object into a data frame.

Fluorescence Experiments

Because the CG4 arrays are designed with each spot printed in duplicate, we have an extra level of structure to exploit. In addition to the estimates of variability for the intensity corresponding to each replicate pair in each channel, we can also assess the reproducibility of the ratios computed at the paired spots. To do this, we use the replicate.ratio class, which is derived from the two groups class used for repeated isotope experiments. You can construct replicate ratio objects directly from complete slide objects, or you can use the analyze method (which generates lots of pretty pictures along the way). Then the as.data.frame method can be used to export the data.
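
The idea of checking ratios at paired spots can be illustrated with a few lines of R. This is a sketch on made-up log intensities, not the replicate.ratio implementation; the variable names and the use of a simple difference between the two printed copies as the measure of disagreement are assumptions made for the example.

```r
# Made-up log2 intensities; rows are genes, columns are the two printed
# copies of each spot.
red   <- cbind(copy1 = c(10.1, 8.0, 12.2, 9.5),
               copy2 = c(10.3, 7.8, 12.1, 9.9))
green <- cbind(copy1 = c( 9.0, 8.1, 10.2, 9.4),
               copy2 = c( 9.1, 8.0, 10.0, 9.0))

# log ratio at each printed copy
log.ratio <- red - green

# combined estimate per gene, and how far the two copies disagree
mean.ratio <- rowMeans(log.ratio)
disagreement <- log.ratio[, "copy1"] - log.ratio[, "copy2"]

report <- data.frame(mean.ratio, disagreement)
```

In this made-up data the fourth gene shows by far the largest disagreement between its two copies, so its averaged ratio would be the least trustworthy of the four.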

Examples

Isotope Experiments

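  # load the data; is.normal flags the first three samples as the normal group
  is.normal <- c(rep(T, 3), rep(F, 3))
  filename <- c('N1.txt', 'N2.txt', 'N3.txt', 'T1.txt', 'T2.txt', 'T3.txt')
  # sep='' gives 'N1', 'N2', ...; the default separator would give 'N 1', 'N 2', ...
  varname <- c(paste('N', 1:3, sep=''), paste('T', 1:3, sep=''))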
  source <- data.frame(filename, varname)
  for (i in 1:6) { f.load.clontech(source[i,]) }

  # normalize, etc.
  rely <- reliable.spots(channel.set(varname))
  project <- extract(complete.channel.set(varname),
      standard.channel,
      ef=svol.extractor,
      nf=subset.normalize.transform, np=rely,
      tf=threshold.transform, tp=50,
      lf=log.transform)

  # report normalized data
  project.data <- as.data.frame(project)
  export.data('project.data', 'normalizedData.xls', 'EXCEL')

  # apply our standard test to find genes
  basic.stats <- two.group.stats(project.data, is.normal)
  test.stats  <- two.groups(basic.stats)

  # report the test statistics
  temp <- as.data.frame(test.stats)
  export.data('temp','testStatistics.xls', 'EXCEL')

Fluorescence Experiments

  # load the data
  N.vs.T <- f.load.cg4('quantfile.txt')

  # normalize, etc.
  experiment <- slide(N.vs.T, standard.channel,
      ef=svol.extractor,
      nf=subset.normalize.transform, np=faithful,
      tf=threshold.transform, tp=50,
      lf=log.transform)

  # report normalized data
  temp <- as.data.frame(experiment)
  export.data('temp', 'myReport.xls', 'EXCEL')

  # perform our standard analysis.
  result <- analyze(experiment, ef=standard.channel, tp=50)

  # report the test statistics
  temp <- as.data.frame(result)
  export.data('temp', 'testStatistics.xls', 'EXCEL')