The Object Oriented Microarray Library in S-PLUS is a suite of object-oriented
programming modules written in S-PLUS for the analysis of microarray
experiments. The entire library can be compiled by running the script
microarrays.ssc, which sources the remaining sections of code, including
all the built-in designs. There is a separate script,
load-all-designs.ssc, listing the design source files.
A design is a description of the genes
that have been spotted on a microarray. We store designs as a data
frame, in which each row corresponds to a spot on the array. The
number of columns varies depending on how much information we have
accumulated from the manufacturer about the material that was used to
produce the array.
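As a minimal sketch of this layout, the following S code (which should also run unchanged in R) builds a small design with one row per spot. The object name toy.design and every value in it are invented for illustration and are not part of the library. The column names are borrowed from the columns this document lists for the CG4 design.

```r
# Hypothetical toy design: one row per spot on the array.
# All values are invented for illustration only.
toy.design <- data.frame(
    Location  = c("A1a1", "A1a2", "A1a3"),
    Accession = c("X00001", NA, "X00003"),
    Name      = c("gene one", "blank", "gene three")
)

nrow(toy.design)    # number of spots described by the design
names(toy.design)   # descriptive columns available for each spot
```

A design with more manufacturer information would simply carry more columns; the number of rows stays tied to the number of spots.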
Because the manufacturers provide descriptive information about the genes under many different names, most design objects are given another class designation in addition to that of a data frame. This additional class structure allows us to implement methods that will provide a common interface to the descriptive information we need. At present, there are two such methods:
The "CG4 Named Gene" microarray produced by the M.D. Anderson Cancer Genomics Core Laboratory contains 4800 spots, consisting of 2304 distinct genes spotted in duplicate, 48 positive controls, 48 negative controls, and 96 blanks.
Location of the spot (in the form A1a1), the GenBank Accession number, and a character string giving a Name for the gene.
Location2 Location2.
Both of these objects are of class cg4list.
CG8 Pathways Array
The "CG8 Pathways" microarray produced by the M.D. Anderson Cancer Genomics Core Laboratory contains 3702 spots, consisting of 1152 distinct genes spotted in duplicate, 96 positive controls, 96 negative controls, and 192 regularly spaced blanks. There are additional blank spots, usually near the ends of the subgrids.
Location of the spot (in the form A1a1), the GenBank Accession number, a character string containing the standard Symbol for the gene, and a character string giving a Name for the gene.
Location2 Location2, and the gene symbol is inexplicably omitted. Note that only 1152 of the 1344 actually correspond to genes; the others can probably be detected by applying is.na to the Accession column.
Both of these objects are of class cg8list.
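The suggestion about is.na can be sketched as follows. The data frame toy.cg8 and its values are invented stand-ins rather than the real CG8 design, and the sketch assumes that spots which are not genes carry a missing Accession, which the text above describes only as probable.

```r
# Hypothetical stand-in for a CG8 design; all values are invented.
toy.cg8 <- data.frame(
    Location  = c("A1a1", "A1a2", "A1a3", "A1a4"),
    Accession = c("X00001", NA, "X00003", NA),
    Symbol    = c("G1", NA, "G3", NA),
    Name      = c("gene one", "positive control", "gene three", "blank")
)

# Assumption: spots that are not genes have a missing Accession.
not.gene <- is.na(toy.cg8$Accession)

n.genes <- sum(!not.gene)                                  # rows that look like genes
other.locations <- as.character(toy.cg8$Location[not.gene])  # remaining spots
```

On the real design, the count of rows with a non-missing Accession could be compared against the 1152 genes stated above as a sanity check.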
Clontech ATLAS Human Cancer 1.2 Microarray
The Human Cancer 1.2 is a commercial nylon microarray produced by Clontech. It contains 1185 spots, nine of which are housekeeping genes spotted below the main rectangular grid of spots.
Research Genetics produces a series of nylon microarrays containing different sets of genes. The typical configuration consists of two fields of eight grids each, where each grid contains 12 columns and 30 rows of spots. Thus, there are typically 5760 spots on the array. Within each grid, there are 12 control spots of total genomic DNA and 12 blank spots. Research Genetics provides a great deal of information (about 20 columns, but much of it is out of date) about the genes on the arrays. In July 2001, we updated that information for the GF200-GF205 arrays, and so the annotations contained here are probably better than those obtained directly from the company.
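The spot counts quoted for the typical Research Genetics configuration follow directly from the grid geometry. The short S sketch below (which should also run in R) reproduces the arithmetic; the variable names are chosen here for illustration and are not taken from the library.

```r
# Grid geometry of a typical array, as described in the text.
n.fields <- 2     # fields per array
n.grids  <- 8     # grids per field
n.cols   <- 12    # columns of spots per grid
n.rows   <- 30    # rows of spots per grid

spots.per.grid <- n.cols * n.rows                       # 360
total.spots    <- n.fields * n.grids * spots.per.grid   # 5760

# Each grid holds 12 total genomic DNA controls and 12 blanks.
other.per.grid <- spots.per.grid - 12 - 12              # 336
```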
These seven objects are all of class rglist.
The design information for the GeneFilters includes additional objects along with additional functions for processing them.
n is an index representing a spot on the array (or a row in the data produced by ArrayVision); the output is a list of the indices describing all spots in the same grid. The sz variable is optional; it defaults to the vector c(8, 2, 12, 30, 0, 2) describing the grid geometry.
extra argument (which defaults to 0.5). This percentile is used to compute a maximum allowable background on each grid. (The term "grid" is preferred by the company; it is synonymous in our usage with "patch".)
extra argument (which defaults to 0.5). Values on each grid are then rescaled to adjust the mean of the total genomic spots within the grid to equal 100.
The NCI60 array design refers to the microarrays used by Ross et al. to study the NCI60 cell lines; this data set is publicly available and is one of the data sets being used for the 2001 CAMDA competition. Most of the arrays have four grids of size 50 by 50, for a total of 10000 spots. A few of the early arrays contain four grids of size 49 by 51, for a total of 9996 spots. All of the actual genes are in the same order within a grid, when read left-to-right and top-to-bottom. The additional spots are blank.
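The two NCI60 layouts can be checked with a line of arithmetic each. This is only an illustrative S sketch (it should also run in R), and the variable names are made up here.

```r
# Spot counts for the two NCI60 layouts described in the text.
usual.layout <- 4 * 50 * 50   # four grids of 50 by 50: 10000 spots
early.layout <- 4 * 49 * 51   # four grids of 49 by 51: 9996 spots

# The two layouts differ by only four spots in total.
layout.difference <- usual.layout - early.layout
```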