Gene Shaving is a method for clustering groups of similarly behaving genes whose changes in expression are most tightly linked to observed biological changes. The basic method is similar to principal components analysis (singular value decomposition, maximum eigenvalue, etc.) with a sequential twist: a canonical “gene vector” is identified based on the eigenvectors, and the genes are ranked according to their agreement with this vector. The worst-fitting genes are then “shaved off”, and a new canonical vector is identified and fit.
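As a rough illustration of a single shaving pass, the Python sketch below ranks genes against a leading component and drops the worst-fitting ones. It is not the GeneClust implementation. It assumes that rows are genes and columns are samples, that "agreement" is measured by the absolute inner product with the leading principal component, and that the fraction shaved per pass is a free parameter. Power iteration stands in for a full singular value decomposition, and the names center_rows, leading_component, and shave_once are hypothetical.

```python
import math


def center_rows(matrix):
    """Subtract each gene's (row's) mean so that every row has mean zero."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered


def leading_component(rows, iterations=200):
    """Approximate the leading principal component (over samples) by power iteration."""
    n_samples = len(rows[0])
    # Non-constant start: a constant vector is orthogonal to every centered row.
    vec = [1.0 + 0.01 * j for j in range(n_samples)]
    for _ in range(iterations):
        # One step of v <- X^T (X v), followed by normalization.
        scores = [sum(r[j] * vec[j] for j in range(n_samples)) for r in rows]
        new = [sum(rows[i][j] * scores[i] for i in range(len(rows)))
               for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # degenerate data (no variation); keep the current vector
        vec = [x / norm for x in new]
    return vec


def shave_once(names, matrix, fraction=0.1):
    """Drop the given fraction of genes that agree least with the leading component.

    Assumption: agreement = absolute inner product with the component.
    Returns the surviving gene names (original order) and the component.
    """
    rows = center_rows(matrix)
    vec = leading_component(rows)
    scores = [abs(sum(a * b for a, b in zip(r, vec))) for r in rows]
    n_drop = max(1, int(len(rows) * fraction))
    n_drop = min(n_drop, len(rows) - 1)  # always keep at least one gene
    worst_first = sorted(range(len(rows)), key=lambda i: scores[i])
    dropped = set(worst_first[:n_drop])
    kept = [names[i] for i in range(len(names)) if i not in dropped]
    return kept, vec
```

Repeating shave_once on the surviving genes until the cluster is small enough would give a nested sequence of candidate clusters; how the final cluster size is chosen is outside the scope of this sketch.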
The GeneClust distribution is a de novo implementation of the Gene Shaving method. GeneClust consists of three components:
Before invoking the application, the user must create the data and output directories, and place the data file(s) to be analyzed in the data directory. (These directory names may be overridden by environment variables: see Environment Variables below.)
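The directory setup can be scripted. The following Python sketch creates the two directories and shows one way an environment-variable override could be resolved. The default names data and output come from this document; the helper names resolve_dir and prepare_dirs are hypothetical, and the environment variable name is deliberately left to the caller because the actual variable names are listed in the Environment Variables section, not here.

```python
import os


def resolve_dir(default_name, env_var=None, environ=None):
    """Return the directory name, preferring a non-empty environment override.

    env_var is supplied by the caller; no GeneClust variable name is assumed.
    """
    env = os.environ if environ is None else environ
    if env_var:
        override = env.get(env_var)
        if override:
            return override
    return default_name


def prepare_dirs(base, data_dir="data", output_dir="output"):
    """Create the data and output directories under base if they are missing."""
    paths = [os.path.join(base, data_dir), os.path.join(base, output_dir)]
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths
```

For example, prepare_dirs(".") creates ./data and ./output; the data file(s) still have to be copied into the data directory by hand.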
The data must be stored in a tab-separated (tsv) file. It may optionally contain an initial header line containing column names. The first column (of all rows) may optionally contain the row names.
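To make the file layout concrete, here is a minimal Python sketch of a reader for such a file. It is not part of GeneClust. It assumes that the caller states whether the header line and the row-name column are present, that the header line also carries a cell above the row-name column when row names are used, and that every remaining cell is numeric. The function name read_expression_tsv is hypothetical.

```python
import csv
import io


def read_expression_tsv(handle, has_header=False, has_row_names=False):
    """Read a tab-separated matrix; return (column_names, row_names, values).

    column_names and row_names are None when the file does not contain them.
    """
    reader = csv.reader(handle, delimiter="\t")
    rows = [row for row in reader if row]  # skip completely empty lines

    column_names = None
    if has_header and rows:
        column_names = rows.pop(0)
        if has_row_names:
            # Assumption: the header has a cell above the row-name column.
            column_names = column_names[1:]

    row_names = None
    if has_row_names:
        row_names = [row[0] for row in rows]
        rows = [row[1:] for row in rows]

    values = [[float(cell) for cell in row] for row in rows]
    return column_names, row_names, values


# Usage with an in-memory file; a real file would be opened with open(path, newline="").
example = io.StringIO("gene\ts1\ts2\ng1\t1.5\t-2\ng2\t0\t3.25\n")
columns, genes, matrix = read_expression_tsv(
    example, has_header=True, has_row_names=True)
```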
The application GUI consists of four panels:
The information input by the user is checked for validity. Invalid input will cause the offending field to be highlighted (in black). Tool tips for numeric range information are provided.
To process data stored in a tab-separated (tsv) file, select the File Data tab in the data input panel and complete the FileName field in that panel. Pressing the Select… button will pop up a file selection dialog to simplify choosing the correct filename.
Specify the desired Gene Shaving parameters using the fields in the Shaving Parameters panel. If Percent Supervision is zero, unsupervised shaving will be performed. Otherwise, the Filename field in this panel must contain a valid classification file for the data being analyzed. If Percent Supervision is 100, complete supervision will be performed, otherwise partial supervision.
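The relationship between Percent Supervision and the shaving mode can be summarized in a few lines of Python. This is only a sketch of the rule stated above, not GeneClust code: the function name supervision_mode is hypothetical, and the accepted range of 0 to 100 is an assumption based on the field being a percentage.

```python
def supervision_mode(percent, classification_file=None):
    """Map a Percent Supervision value to the kind of shaving performed."""
    if not 0 <= percent <= 100:
        # Assumed valid range for a percentage field.
        raise ValueError("Percent Supervision must be between 0 and 100")
    if percent == 0:
        return "unsupervised"
    if not classification_file:
        # Any nonzero value needs a classification file for the data.
        raise ValueError("a classification file is required for supervised shaving")
    return "complete supervision" if percent == 100 else "partial supervision"
```

Here supervision_mode(0) needs no classification file, while any nonzero percentage without one is rejected.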
After the appropriate parameters have been set, press the Shave button to import (or create) the data to be analyzed and obtain the specified number of clusters.
The frontend will create a backend R process, send it commands to perform the requested shaving, and open a process monitor that shows informative messages about the progress of the analysis. When the process monitor displays S interpreter processing complete, press the End Simulation button to destroy the backend process. If the backend process does not complete successfully, press the End Simulation button at any time to interrupt the analysis and destroy the backend process.
After the backend process completes successfully, the Gene Shaving results can be displayed or printed. Optionally, the original data matrix and a hierarchical clustering of that matrix (by genes or samples) may also be displayed or printed.
Note: In this version, these optional displays cannot be generated until the backend process has completed successfully.
To display the desired graphs, check the Geneshaving Clusters checkbox and any additional checkboxes in the Display Selection panel, then press the Display button. The frontend will invoke a backend R process to generate the desired displays. After viewing the displays, press the End Simulation button to destroy the graphs and the backend process. The graphs may be displayed as many times as desired by repeatedly pressing the Display button.
To print the desired graphs, check the appropriate checkboxes and press the Print button. The graphs will be written as Encapsulated PostScript figures to files with .eps extensions in the output directory.
File Pulldown Menu
The File pulldown menu contains all the generic file handling options.
Help Pulldown Menu
The Help pulldown menu contains all the options providing basic assistance in using the application.
By default, GeneClust expects its input to reside in the data subdirectory and will store all results in the output subdirectory.