GeneClust is a user-friendly implementation of the Geneshaving methodology for the exploratory analysis of gene expression microarray data. The development of GeneClust was motivated by a surge of interest in searching for interpretable biological structure in gene expression microarray data.
GeneClust includes two basic functions:
GeneClust JS is a JavaScript implementation of GeneClust that runs in your browser and is suitable for the analysis of relatively small data sets (say about 100 samples and a few thousand genes). A relatively modern browser is required (recent versions of Chrome and Firefox are known to work).
GeneClust is also available as a native executable for download and installation on your local computer. The most recent versions of the package are available in source code form for the R programming environment. Older packages for the S-plus platform are still available.
The latest version for S-plus is 1.0b11, available as of 28 March 2002. Here are some links to download various binary packages. The source package for the S-plus version is not publicly available at this time; contact Dr. Kim-Anh Do for private access to the source package. The following are the supported platforms.
The current R version is available only as a source code package. Instructions are provided for compiling the source code for your platform. The package was most recently installed and tested under 64-bit Gentoo Linux, Java version 1.6.0, and R version 2.10.1, but any recent version of R, Java, and Linux should work. Other platforms may work provided you first install the necessary software development components.
UNIX: S-plus 5.1 or 6.0, JRESE 1.3.1 or later
Win2000: S-plus 2000 or 6.0
In summary, the installation process will create a directory hierarchy rooted at the directory specified. The major components of that hierarchy are an executable for launching the package, a Java application for running the user interface, R/S-plus code for loading data and generating graphical output within R/S-Plus, helper applications and dynamically loaded libraries, documentation, and examples.
More specifically, the directory hierarchy consists of the following. Note that the hierarchy described is for a typical installation of the R version on a Linux/Unix platform and may be slightly different for the S-plus version and/or the Windows platform.
The software is launched by running this shell script. The script initializes the necessary environment variables that specify the locations of the other components of the package and invokes the Java application front end with the appropriate parameters.
This file contains default definitions for environment variables needed by the application. Individual users can override the defaults specified in this file by placing them in a ‘.geneclustrc’ file in either their home directory or the directory in which the application is invoked.
This directory contains a basic help page in the form of an HTML document (‘GeneClust.html’) and supporting images and style sheets. It also contains the files ‘COPYRIGHTS’ and ‘LICENSE’ which govern your rights to copy and use this software.
This directory contains data and classification files for several example data sets.
This is the front-end Java application that controls the user experience. It requires numerous environment variables to be set in order to work correctly and thus should not be invoked directly by the user.
This directory contains R/S-plus source code for reading data and generating graphical output.
This directory contains a helper application (‘pty’) for running the R application as a back-end process controlled by the front-end Java application, and a dynamically loaded library (‘S.so’) for performing the most computationally intensive parts of the geneshaving algorithm.
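The override behaviour described above for the defaults file and ‘.geneclustrc’ can be sketched in shell. This is only an illustration, not the launcher's actual code: it assumes the settings files use shell `NAME=value` syntax and that a file in the invocation directory takes precedence over one in the home directory, neither of which is stated above. The names `load_settings`, `defaults.conf`, and `EXAMPLE_SETTING` are hypothetical.

```shell
#!/bin/sh
# Sketch only: file syntax, precedence order, and all names are assumptions.

# load_settings DEFAULTS_FILE HOME_DIR INVOKE_DIR
# Sources the defaults first, then any '.geneclustrc' found in the home
# directory, then one in the invocation directory, so later files win.
load_settings() {
    defaults_file=$1
    home_dir=$2
    invoke_dir=$3
    for settings_file in "$defaults_file" \
                         "$home_dir/.geneclustrc" \
                         "$invoke_dir/.geneclustrc"; do
        if [ -r "$settings_file" ]; then
            . "$settings_file"
        fi
    done
}

# Demonstration in a scratch directory, using a hypothetical variable.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/home" "$demo_dir/work"
echo 'EXAMPLE_SETTING=default' > "$demo_dir/defaults.conf"
echo 'EXAMPLE_SETTING=from-home' > "$demo_dir/home/.geneclustrc"

load_settings "$demo_dir/defaults.conf" "$demo_dir/home" "$demo_dir/work"
echo "EXAMPLE_SETTING=$EXAMPLE_SETTING"
rm -rf "$demo_dir"
```

In this demonstration only the home-directory file exists, so the value it sets replaces the default. Sourcing the files in a fixed order keeps the rule simple: whichever file is read last wins.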
The following generic instructions describe how to install the R version of GeneClust. Where command lines are given, we have used Ubuntu 10.10 for specificity. For other platforms please use an equivalent command where required. For help, please see our support forum.
On Ubuntu 10.10 for example:
> sudo apt-get install libgsl0ldbl libgsl0-dev libatlas-dev libatlas3gf-base
You will also need to install R, a C compiler, and the Java runtime environment (JRE) for your platform.
> tar xvzf ./geneclust-1.0.1.tgz
> cd geneclust-1.0.1
> sudo mkdir “some location”
EITHER:
Set the environment variable INSTALLDIR to the location where you want to install GeneClust:
> export INSTALLDIR=some existing location
OR:
Edit line 38 of the INSTALL script and put that location there instead.
> vi ./INSTALL
> ./INSTALL
If writing to the installation directory requires root privileges, one way to accomplish this is:
> sudo ./INSTALL
To satisfy yourself that the script will not perform any unwanted actions, please read the INSTALL script extremely carefully before doing so. Safer options are to temporarily change the permissions on the directory for the duration of the install or to use a sandbox.
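Before running the INSTALL script, it may help to confirm that the prerequisites named above are on the PATH and to find out whether the target directory can be written without root privileges. The following is a hedged sketch rather than part of the package: the helper names `check_tool` and `install_mode` are hypothetical, `cc` is assumed to be the command name of the C compiler, and the fallback location `$HOME/geneclust` is only an illustration.

```shell
#!/bin/sh
# Sketch only: helper names and the fallback directory are assumptions.

# Print whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Print "writable" if the directory exists and the current user can write
# to it; otherwise print "not-writable" (it may need to be created first,
# or the install may need elevated privileges).
install_mode() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}

# Report on the three prerequisites: R, a C compiler, and Java.
for tool in R cc java; do
    check_tool "$tool"
done

# Use INSTALLDIR if it is already set, else an illustrative fallback.
INSTALLDIR=${INSTALLDIR:-$HOME/geneclust}
echo "INSTALLDIR=$INSTALLDIR ($(install_mode "$INSTALLDIR"))"
```

The script only reports; it never creates directories or invokes INSTALL, so it is safe to run repeatedly while setting up a machine.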
We no longer have access to the platforms that can run the S-plus version, so the best advice we can give is:
Good luck.
GeneClust for S-Plus User’s Guide
Taken from our laptop running RedHat 7.1 and S-plus 6.0.
MD Anderson Cancer Center Department of Biostatistics (“MDACC”) hereby grants you a non-exclusive, perpetual, irrevocable right to use the software package.
Title, ownership rights, and intellectual property rights to the software package and all copies thereof remain with MDACC. The software is copyrighted and is protected by United States copyright laws and international treaty provisions. Except as expressly provided herein, no express or implied rights are granted under any patents, copyrights, trademarks or trade secret information.
Except as specifically permitted in writing by MDACC, you shall not have the right to remove any copyright notice or any proprietary trade or service marks or notices of the owners from the software package or any related documentation.
Derivative works must retain MDACC copyright notices.
GeneClust is copyright by Kim-Anh Do, Rumiana Nikolova, Paul Roebuck, and Bradley Broom.
Other product and company names herein may be trademarks of their respective owners.
Other copyrights given in the files are: java/src/edu/stanford/ejalbert/BrowserLauncher.java
Copyright 1999 by Eric Albert (ejalbert@cs.stanford.edu)
Trevor Hastie, Robert Tibshirani, Michael Eisen, Patrick Brown, Doug Ross, Uwe Scherf, John Weinstein, Ash Alizadeh, Louis Staudt and David Botstein (2000). Gene Shaving: a New Class of Clustering Methods for Expression Arrays (HTML) Technical Report, Department of Statistics, Stanford University.
Robert Tibshirani, Trevor Hastie, Mike Eisen, Doug Ross, David Botstein and Patrick Brown (1999). Clustering Methods for the Analysis of DNA Microarray Data (PS) Technical Report, Department of Statistics, Stanford University.
Laura Lazzeroni and Art Owen (2000). Plaid Model for Gene Expression Data (PS, PDF) Technical Report, Department of Statistics, Stanford University.