Department of Bioinformatics and Computational Biology


From MD Anderson Bioinformatics


Description: GeneClust is a tool for exploratory analysis of gene expression microarray data. Available implementations include S-plus and R packages for installation on your machine, and a Javascript version for in-browser analysis of smaller datasets.
Development Information
Language: Javascript, or R/S-plus, C, and Java
Current Version: JS (Javascript), 1.0.1 (R), 1.0b11 (S-plus)
Platforms: Linux, Unix, Microsoft Windows
Status: Active
Last Updated: 2012-07-10
Citations: Do K-A, Broom B, Wen S. GeneClust. In: The analysis of gene expression data: Methods and software. Ed(s) Parmigiani G, Garrett E, Irizarry RA, Zeger SL. Springer: New York, 342-61, 2003. ISBN: 0-387-95577-1
Help and Support
Contact: Kim-Anh Do, PhD (for statistical queries), Bradley Broom, PhD (for software queries)

GeneClust is a user-friendly implementation of the gene shaving methodology for the exploratory analysis of gene expression microarray data. Its development was motivated by surging interest in finding interpretable biological structure in such data.

GeneClust includes two basic functions:

1. Implementation of clustering methods
  • Hierarchical clustering (R/S-plus version only)
  • Gene shaving
2. Simulation to assess performance of the above methods (R/S-plus version only).

A user is able to directly interact with the program, visualize the data and resulting clusters, and control the generation of numerous intermediate output results.


Browser Version

GeneClust JS is a Javascript implementation of GeneClust that runs in your browser and is suitable for the analysis of relatively small data sets (roughly 100 samples and a few thousand genes). A relatively modern browser is required (recent versions of Chrome and Firefox are known to work).

Downloadable Versions

GeneClust is also available as a native executable for download and installation on your local computer. The most recent versions of the package are available in source code form for the R programming environment. Older packages for the S-plus platform are still available.

R Version

S-plus Version

The latest version for S-plus is 1.0b11, available as of 28 March 2002. Here are some links to download various binary packages. The source package for the S-plus version is not publicly available at this time; contact Dr. Kim-Anh Do for private access to the source package. The following are the supported platforms.

Sun Solaris 2.8
Compaq Tru64 4.0f
RedHat Linux 7.1 (IA32)
Win2000 (R/.NET version)

System Requirements

R Version

The current R version is available only as a source code package. Instructions are provided for compiling the source code for your platform. The package was most recently installed and tested under 64-bit Gentoo Linux, Java version 1.6.0, and R version 2.10.1, but any recent version of R, Java, and Linux should work. Other platforms may work provided you first install the necessary software development components.

S-plus Version

  • UNIX
    • S-plus 5.1 or 6.0
    • JRESE 1.3.1 or later
  • Win2000
    • S-plus 2000 or 6.0

Installation Overview

In summary, the installation process will create a directory hierarchy rooted at the directory specified. The major components of that hierarchy are an executable for launching the package, a Java application for running the user interface, R/S-plus code for loading data and generating graphical output within R/S-Plus, helper applications and dynamically loaded libraries, documentation, and examples.

More specifically, the directory hierarchy consists of the following. Note that the hierarchy described is for a typical installation of the R version on a Linux/Unix platform and may be slightly different for the S-plus version and/or the Windows platform.

The software is launched by running this shell script. The script initializes the necessary environment variables that specify the locations of the other components of the package and invokes the Java application front end with the appropriate parameters.
This file contains default definitions for environment variables needed by the application. Individual users can override the defaults specified in this file by placing them in a '.geneclustrc' file in either their home directory or the directory in which the application is invoked.
This directory contains a basic help page in the form of an HTML document ('GeneClust.html') and supporting images and style sheets. It also contains the files 'COPYRIGHTS' and 'LICENSE' which govern your rights to copy and use this software.
This directory contains data and classification files for several example data sets.
This is the front-end Java application that controls the user experience. It requires numerous environment variables to be set to work correctly and thus should not be invoked directly by the user.
This directory contains R/S-plus source code for reading data and generating graphical output.
This directory contains a helper application ('pty') for running the R application as a back-end process controlled by the front-end Java application, and a dynamically loaded library ('') for performing the most computationally intensive parts of the gene shaving algorithm.
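
The override behavior described above for '.geneclustrc' can be pictured as a launcher that reads the installed defaults first and then any per-user files, so that later definitions win. The following is only a sketch of that idea, not the actual launch script: the function name load_geneclust_env, the variable GENECLUST_DEMO_VAR, and the order in which the two '.geneclustrc' locations are read (home directory first, then the invocation directory) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: load defaults, then let per-user files override them.
# load_geneclust_env and the read order are assumptions, not GeneClust's code.

load_geneclust_env() {
    defaults_file=$1    # path to the installed defaults; must contain a '/'
    # Read the installed defaults first, if present.
    if [ -r "$defaults_file" ]; then
        . "$defaults_file"
    fi
    # Later files win: home directory, then the invocation directory.
    for rc in "$HOME/.geneclustrc" "./.geneclustrc"; do
        if [ -r "$rc" ]; then
            . "$rc"
        fi
    done
    return 0
}

# Demonstration with throwaway files (GENECLUST_DEMO_VAR is a made-up name).
demo_dir=$(mktemp -d)
printf 'GENECLUST_DEMO_VAR=site-default\n' > "$demo_dir/defaults"
printf 'GENECLUST_DEMO_VAR=user-override\n' > "$demo_dir/.geneclustrc"
# Run in a subshell so HOME and the working directory are only changed here.
( cd "$demo_dir" && HOME=$demo_dir load_geneclust_env "$demo_dir/defaults" \
    && echo "$GENECLUST_DEMO_VAR" )    # prints: user-override
rm -rf "$demo_dir"
```

Because each file is simply sourced in turn, a user only needs to list the variables they want to change; anything not mentioned keeps its default value.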

Installation Instructions

R Version

The following generic instructions describe how to install the R version of GeneClust. Where command lines are given, we have used Ubuntu 10.10 for specificity. For other platforms please use an equivalent command where required. For help, please see our support forum.

1. Install the prerequisite software

The R version of the software requires the GNU Scientific Library. Please download and install this software and its dependencies for your platform.

On Ubuntu 10.10 for example:

> sudo apt-get install libgsl0ldbl libgsl0-dev libatlas-dev libatlas3gf-base

You will also need to install R, a C compiler, and the Java runtime environment (JRE) for your platform.

2. Download the package from the Download section of this page.
3. Extract the file
> tar xvzf ./geneclust-1.0.1.tgz
> cd geneclust-1.0.1
4. Create the installation directory

If the directory into which you will install the software does not yet exist, create it:

> sudo mkdir "some location"
5. Specify the installation directory

Tell the software the location of the installation directory in one of two ways.

Either set the environment variable INSTALLDIR to the location where you want to install GeneClust:

> export INSTALLDIR="some existing location"

Or edit line 38 of the INSTALL script and put that location there instead:

> vi ./INSTALL
6. Run the install script

If writing to the installation directory requires root privileges, one way to accomplish this is:

> sudo ./INSTALL

To satisfy yourself that the script will not perform any unwanted actions, please read the INSTALL script extremely carefully before doing so. Safer options are to temporarily change the permissions on the directory for the duration of the install or to use a sandbox.
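
The prerequisite and installation-directory steps can be checked before the install script is run. The sketch below is not part of the GeneClust distribution: the function names missing_commands and check_installdir are hypothetical, and the command names it probes (R, cc, java, gsl-config) are assumptions about how the prerequisites appear on a typical Linux system, so adjust them for your platform.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; not shipped with GeneClust.

# Print each named command that is not found on PATH; fail if any is missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# Succeed only if the given path is an existing, writable directory.
check_installdir() {
    if [ -z "$1" ]; then
        echo "INSTALLDIR is not set"
        return 1
    elif [ ! -d "$1" ]; then
        echo "not a directory: $1"
        return 1
    elif [ ! -w "$1" ]; then
        echo "not writable (root privileges may be needed): $1"
        return 1
    fi
    return 0
}

# Command names below are assumptions for a typical Linux setup.
if missing_commands R cc java gsl-config && check_installdir "${INSTALLDIR:-}"; then
    echo "pre-flight checks passed"
else
    echo "resolve the items above before running ./INSTALL"
fi
```

A "not writable" result is expected when the installation directory is owned by root; that is the case the sudo invocation above is meant for. Depending on your sudo configuration, an exported INSTALLDIR may not be passed through to the script, in which case setting the location inside the INSTALL script avoids the problem.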

S-plus Version

We no longer have access to the platforms that can run the S-plus version, so the best advice we can give is:

1. Find a machine running a suitably old version of the operating system, Java, and S-plus.
2. Download the package from the Download section of this page.
3. Extract the file using a suitable utility for your platform.
4. Do your best to install it.
5. Post in our forums and let us know how you did it and/or what problems you've encountered and need help on.

Good luck.


Javascript Version

GeneClust JS User's Guide

R Version

GeneClust for R User's Guide

S-plus Version

GeneClust for S-Plus User's Guide

Screenshots (S-plus version)

Taken from our laptop running RedHat 7.1 and S-plus 6.0. Multiple resolutions for all images are available:

Release History

  • GeneClust JS (July 2012). First Javascript release.
  • Version 1.0.1. (December 2009). Bug fix release.
  • Version 1.0.0. (August 2006). First open source release for R.
  • Version 1.0 beta 11. (March 2002). Last closed source S-plus release.


For Frequently Asked Questions, Bug Reports, and other concerns, please visit the forum at this link


MD Anderson Cancer Center Department of Biostatistics ("MDACC") hereby grants you a non-exclusive, perpetual, irrevocable right to use the software package.

Title, ownership rights, and intellectual property rights to the software package and all copies thereof remain with MDACC. The software is copyrighted and is protected by United States copyright laws and international treaty provisions. Except as expressly provided herein, no express or implied rights are granted under any patents, copyrights, trademarks or trade secret information.

Except as specifically permitted in writing by MDACC, you shall not have the right to remove any copyright notice or any proprietary trade or service marks or notices of the owners from the software package or any related documentation.

Derivative works must retain MDACC copyright notices.


GeneClust is copyright by Kim-Anh Do, Rumiana Nikolova, Paul Roebuck, and Bradley Broom.

Other product and company names herein may be trademarks of their respective owners.

Other copyrights given in the files are: java/src/edu/stanford/ejalbert/

Copyright 1999 by Eric Albert (



Trevor Hastie, Robert Tibshirani, Michael Eisen, Patrick Brown, Doug Ross, Uwe Scherf, John Weinstein, Ash Alizadeh, Louis Staudt and David Botstein (2000).
Gene Shaving: a New Class of Clustering Methods for Expression Arrays. Technical Report, Department of Statistics, Stanford University.

Robert Tibshirani, Trevor Hastie, Mike Eisen, Doug Ross, David Botstein and Patrick Brown (1999).
Clustering Methods for the Analysis of DNA Microarray Data. Technical Report, Department of Statistics, Stanford University.

Laura Lazzeroni and Art Owen (2000).
Plaid Model for Gene Expression Data. Technical Report, Department of Statistics, Stanford University.