Department of Bioinformatics and Computational Biology


From MD Anderson Bioinformatics


Description: PathwaysWeb: An Open-Use Integrated API of Pathways, Genes, Directional Gene Interactions, and the Gene Ontology with Data Versioning for Provenance

Development Information
Language: Java, XSL, HTML, JavaScript
Current Version: 2.0
License: Not required
Last Updated: 2014-07-22

Help and Support
Contact: James M. Melott


A number of databases and application programming interfaces (APIs) exist for retrieving genomic and related data for use in biological research. However, existing systems sometimes provide incomplete data, use different data identifiers, and often do not keep their databases in sync with one another. These incompatibilities can make the data difficult to use for research. Furthermore, the APIs and data formats may change often and without warning. Many systems have no API at all, so automated access requires either downloading an entire database or parsing web pages.

To help avoid some of these issues, we have developed PathwaysWeb, a resource-based, well-documented web system. The system provides publicly available information on genes, biological pathways, Gene Ontology terms, gene-gene interaction networks with interaction directionality, and links to related PubMed documents. To harmonize the data, it is based on gene symbols approved by the HUGO Gene Nomenclature Committee (HGNC). The system retrieves data from multiple sources, standardizes them so that gene symbols in pathways are consistent with HGNC names, and presents the combined data via an integrated web API. Each set of data loaded from the various sources is archived as a distinct version, so that the provenance of the data can be tracked.
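The versioning described above can be pictured with a small in-memory sketch. It assumes that each load of a source produces an immutable, numbered snapshot and that older snapshots remain retrievable. The class `VersionedArchive`, its methods, and the sample symbols and dates are hypothetical illustrations, not the PathwaysWeb implementation, which stores its data in a database. The sketch uses records, so it needs Java 16 or later.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory archive: each load of a source creates a new immutable version. */
public class VersionedArchive {

    /** One archived snapshot of a source's records, with the date it was loaded. */
    public record Version(int number, LocalDate loadedOn, List<String> records) {}

    // Versions kept per source name, in load order (version 1 is the oldest).
    private final Map<String, List<Version>> bySource = new TreeMap<>();

    /** Archives a new snapshot; earlier versions are never modified. */
    public Version load(String source, LocalDate loadedOn, List<String> records) {
        List<Version> versions = bySource.computeIfAbsent(source, k -> new ArrayList<>());
        Version v = new Version(versions.size() + 1, loadedOn, List.copyOf(records));
        versions.add(v);
        return v;
    }

    /** Returns a specific archived version, so earlier results can be reproduced. */
    public Version get(String source, int number) {
        return bySource.getOrDefault(source, Collections.emptyList()).get(number - 1);
    }

    /** Returns the most recently loaded version of a source. */
    public Version latest(String source) {
        List<Version> versions = bySource.get(source);
        return versions.get(versions.size() - 1);
    }

    public static void main(String[] args) {
        VersionedArchive archive = new VersionedArchive();
        // Made-up example loads of the same source on two dates.
        archive.load("HGNC", LocalDate.of(2014, 6, 1), List.of("TP53", "BRCA1"));
        archive.load("HGNC", LocalDate.of(2014, 7, 22), List.of("TP53", "BRCA1", "EGFR"));
        System.out.println(archive.latest("HGNC").number());   // 2
        System.out.println(archive.get("HGNC", 1).records());  // [TP53, BRCA1]
    }
}
```

Because `load` never overwrites an earlier snapshot, a query that names a version number returns the same records no matter how many newer loads follow, which is the property that makes provenance tracking possible.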


Resources and Their Relationships

The PathwaysWeb system contains multiple resources.

The relationships between those resources are shown in the diagram below.

[Figure: PathwaysWeb Resource Relationships]

Two Services: XML/JSON or HTML

Users may access the data via one of two services: a web service that returns XML or JSON, or an HTML interface.

Sources of Data

PathwaysWeb combines and standardizes data from the following sources:

  • HGNC Dataset
  • Reactome Curated Pathways and Genes
  • NCI-Nature Curated Pathways and Genes
  • NCI pathways and subpathways (scraped from web page)
  • NCBI GeneRIFs (Gene Reference Into Function)
  • Pathway Commons: NCI-Nature SIF (Simple Interaction Format)
  • Predictive Networks: CSV
  • Interaction Types (manually created)
  • NCBI Gene to PubMed
  • The Gene Ontology
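As an illustration of how an interaction source such as the Pathway Commons SIF file might be read, the sketch below parses tab-separated SIF lines (participant A, interaction type, participant B) into interaction records. Which interaction types count as directional is an assumption here: `DIRECTED_TYPES` is a small example subset of Pathway Commons type names, not the mapping PathwaysWeb uses, and the header check assumes a `PARTICIPANT_A` column header that some SIF exports omit. The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Minimal sketch: parse tab-separated SIF lines into interactions, flagging directed ones. */
public class SifParser {

    /** One interaction: source gene, interaction type, target gene, and whether it is directional. */
    public record Interaction(String source, String type, String target, boolean directed) {}

    // Example subset of SIF interaction types treated as directional (an assumption, not exhaustive).
    private static final Set<String> DIRECTED_TYPES = Set.of(
            "controls-state-change-of", "controls-expression-of", "controls-transport-of");

    /** Parses SIF lines; blank lines, the column header, and lines with fewer than three fields are skipped. */
    public static List<Interaction> parse(List<String> lines) {
        List<Interaction> result = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\t");
            if (f.length < 3 || f[0].equals("PARTICIPANT_A")) {
                continue; // malformed line, blank line, or header row
            }
            String type = f[1].trim();
            result.add(new Interaction(f[0].trim(), type, f[2].trim(), DIRECTED_TYPES.contains(type)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative input lines, not taken from an actual PathwaysWeb download.
        List<Interaction> parsed = parse(List.of(
                "PARTICIPANT_A\tINTERACTION_TYPE\tPARTICIPANT_B",
                "MDM2\tcontrols-state-change-of\tTP53",
                "BRCA1\tin-complex-with\tBARD1"));
        parsed.forEach(System.out::println);
    }
}
```

Keeping the directionality flag on each record, rather than in a separate table, lets undirected pairs such as complex memberships be stored alongside directed regulatory edges in one common format.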

Data Processing Steps

Downloading the data from the various sources and processing it involves multiple steps:

  • Downloading data from the various sources
  • Extracting the desired subset of the data from those files
  • Combining multiple sources of interactions into a common format
  • Assigning custom interaction types to interaction resources
  • Synchronizing the data to use HGNC Gene names where possible for pathways and interactions
  • Extending gene to Gene Ontology term mappings beyond those provided by The Gene Ontology
  • Loading the data into the database
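The synchronization step ("use HGNC gene names where possible") can be sketched as a lookup from previous symbols and aliases to approved symbols. This sketch assumes case-insensitive matching and declines to map a synonym that more than one approved symbol claims, which is one reading of "where possible". `HgncSynchronizer` and the sample entries are illustrative only and not taken from PathwaysWeb or the HGNC dataset.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Sketch: map gene symbols found in pathway/interaction sources to approved HGNC symbols. */
public class HgncSynchronizer {

    // Upper-cased key -> approved symbol exactly as written (preserves lowercase like "orf").
    private final Map<String, String> approvedByKey = new HashMap<>();
    // Upper-cased previous symbol or alias -> every approved symbol that claims it.
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    /** Registers one entry: its approved symbol plus its previous symbols and aliases. */
    public void addEntry(String approvedSymbol, Set<String> previousAndAliases) {
        approvedByKey.put(approvedSymbol.toUpperCase(Locale.ROOT), approvedSymbol);
        for (String s : previousAndAliases) {
            synonyms.computeIfAbsent(s.toUpperCase(Locale.ROOT), k -> new HashSet<>()).add(approvedSymbol);
        }
    }

    /** Returns the approved symbol, or empty when the symbol is unknown or ambiguous. */
    public Optional<String> resolve(String symbol) {
        String key = symbol.trim().toUpperCase(Locale.ROOT);
        String direct = approvedByKey.get(key);
        if (direct != null) {
            return Optional.of(direct); // already an approved symbol
        }
        Set<String> candidates = synonyms.getOrDefault(key, Set.of());
        return candidates.size() == 1 ? Optional.of(candidates.iterator().next()) : Optional.empty();
    }

    public static void main(String[] args) {
        HgncSynchronizer sync = new HgncSynchronizer();
        // Illustrative entries only.
        sync.addEntry("TP53", Set.of("P53", "LFS1"));
        sync.addEntry("CDKN2A", Set.of("P16", "MTS1"));
        System.out.println(sync.resolve("p53"));   // Optional[TP53]
        System.out.println(sync.resolve("FOO1"));  // Optional.empty
    }
}
```

Returning an empty result rather than guessing keeps ambiguous or unknown symbols out of the harmonized data, leaving them for review instead of silently attaching them to the wrong gene.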

A detailed diagram of this process is given in the separate documentation page "Documentation of data downloading, extraction, and data post-processing".