|Description||MetaBatch Web - Standardized Data for Metabolomics Workbench|
|Language||D3 (JS), Java|
|News||Standardized Data access to Metabolomics Workbench Data|
|Help and Support|
The MetaBatch Standardized Data for Metabolomics Workbench app accepts input files from the world’s largest metabolomics data repository, Metabolomics Workbench (NIH) (MW). MW currently contains 464 MSI-compliant datasets representing numerous MS and NMR instrument platforms. The figure below shows the proposed architecture of the MetaBatch resource and how it will connect to public or private metabolomic databases.
Below is a link to sample Metabolomics Workbench data processed by the in-development MetaBatch Web Analysis tool, provided as an early proof of concept for the pipeline workflow described below.
MetaBatch architecture overview with three main components: (i) the database adapter that connects to public or private metabolomic databases; (ii) the computation API that performs assessment, detection and correction of batch effects; (iii) the visualization API for visualizing results using a repertoire of different dynamic plots.
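To make the division of labor among the three components concrete, here is a minimal Java sketch. Every name in it (`DatabaseAdapter`, `ComputationApi`, `VisualizationApi`, `Measurement`, `MeanCenteringComputation`) is hypothetical and is not taken from the MetaBatch code base. The correction step is per-batch mean-centering, used only as a simple stand-in; it is not one of the MBatch algorithms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ArchitectureSketch {
    /** One measured value tagged with its sample and batch (hypothetical type). */
    static final class Measurement {
        final String sampleId;
        final String batchId;
        final double value;

        Measurement(String sampleId, String batchId, double value) {
            this.sampleId = sampleId;
            this.batchId = batchId;
            this.value = value;
        }
    }

    /** (i) Connects to a public or private metabolomic database. */
    interface DatabaseAdapter {
        List<Measurement> fetch(String datasetId);
    }

    /** (ii) Assesses, detects, and corrects batch effects. */
    interface ComputationApi {
        List<Measurement> correct(List<Measurement> data);
    }

    /** (iii) Turns results into something a front end can display. */
    interface VisualizationApi {
        String render(List<Measurement> data);
    }

    /** Stand-in correction: subtract each batch's mean from its values. */
    static final class MeanCenteringComputation implements ComputationApi {
        @Override
        public List<Measurement> correct(List<Measurement> data) {
            // batchId -> {running sum, count}
            Map<String, double[]> stats = new LinkedHashMap<>();
            for (Measurement m : data) {
                double[] s = stats.computeIfAbsent(m.batchId, k -> new double[2]);
                s[0] += m.value;
                s[1] += 1;
            }
            List<Measurement> corrected = new ArrayList<>();
            for (Measurement m : data) {
                double[] s = stats.get(m.batchId);
                double batchMean = s[0] / s[1];
                corrected.add(new Measurement(m.sampleId, m.batchId, m.value - batchMean));
            }
            return corrected;
        }
    }

    /** Wires the three components together: fetch, correct, render. */
    static String run(DatabaseAdapter db, ComputationApi comp,
                      VisualizationApi viz, String datasetId) {
        return viz.render(comp.correct(db.fetch(datasetId)));
    }
}
```

Keeping each component behind its own interface means a different repository adapter or a different correction method could be swapped in without touching the other two parts.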
See also the MBatch R package.
The Pipeline consists of Java code running within Docker Linux containers that use the Metabolomics Workbench HTTP/RESTful API. The Pipeline checks for updated or new datasets and downloads a list of samples and files associated with the changes.
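The update check described above can be sketched in Java as a comparison between two snapshots of the repository. This is an illustration under stated assumptions, not the Pipeline's actual code: the class name, the method names, and the idea of a per-study "version marker" string are hypothetical, and the URL layout (`/rest/study/study_id/<ID>/summary`) is assumed here and should be checked against the Metabolomics Workbench REST documentation. No network call is made; the snapshots are passed in as maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DatasetChangeDetector {
    // Assumed base URL of the Metabolomics Workbench REST service.
    static final String REST_BASE = "https://www.metabolomicsworkbench.org/rest";

    /** Builds the (assumed) summary URL for one study ID, e.g. "ST000001". */
    static String studySummaryUrl(String studyId) {
        return REST_BASE + "/study/study_id/" + studyId + "/summary";
    }

    /**
     * Returns the IDs of studies that are new or whose version marker differs
     * from the previously stored snapshot, sorted by study ID.
     *
     * @param previous study ID -> marker recorded on the last run
     * @param current  study ID -> marker seen on this run
     */
    static List<String> findChanged(Map<String, String> previous,
                                    Map<String, String> current) {
        List<String> changed = new ArrayList<>();
        // TreeMap gives a deterministic, sorted iteration order.
        for (Map.Entry<String, String> e : new TreeMap<>(current).entrySet()) {
            String oldMarker = previous.get(e.getKey());
            if (oldMarker == null || !oldMarker.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> lastRun = Map.of("ST000001", "v1", "ST000002", "v1");
        Map<String, String> thisRun =
                Map.of("ST000001", "v1", "ST000002", "v2", "ST000003", "v1");
        // Print the URL that would be requested for each changed study.
        for (String id : findChanged(lastRun, thisRun)) {
            System.out.println(studySummaryUrl(id));
        }
    }
}
```

In a real run, the `current` map would be filled from the repository's responses, and the list of samples and files would then be downloaded only for the IDs that `findChanged` returns.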
As in the approach we employed for TCGA, we will automate MetaBatch processing to enable the analysis of quantitative metabolomic data generated from multiple sources, including MS and NMR. We will also work with the curators of Metabolomics Workbench to develop a direct interface that allows MetaBatch to be run automatically after a data set is deposited in the Workbench repository. For other sites (e.g., core facilities), MetaBatch will be configurable for automatic or on-request operation as needed. The figure below shows a typical workflow diagram for the most common use cases.
We will introduce MetaBatch in the form of a computational resource (tool) and also a web portal. The system will allow users with no programming experience to analyze their own data in an interactive graphical environment that enables the application of novel, quantitative measures of data quality and batch effects. Data entry and selection of analysis parameters will be achieved through an intuitive, aesthetically appealing interface. The visualization component will display dynamic, interactive plots with flexible user control. Corrected and uncorrected data will be available for download or for upload to the cloud. We will develop a user-friendly, “ready-to-run” version of the MetaBatch system that non-informatics users can easily install, configure, and use without IT services. It will be “run-anywhere” to avoid the need to support multiple operating systems and user environments, and it will be based on Linux container technology using Docker. We were early adopters of Docker in bioinformatics and have implemented it successfully for other projects (e.g., MBatch and NG-CHMs). For users who prefer to download and install MetaBatch on their own machines, we will create an install wizard to enable Macintosh and Windows users to run local instances of the MetaBatch Docker image without expert assistance. In accord with the RFA and Aim 5, we will obtain feedback on the usability and workflow compatibility of MetaBatch through serial testing and by monitoring user forums.
The MBatch R package is available on GitHub, either as a direct install into R or as a pre-built DockerHub container; the Dockerfile and Docker Compose files are also provided. https://github.com/MD-Anderson-Bioinformatics/BatchEffectsPackage
We will provide MetaBatch as an R package that programming-savvy users can download to analyze their own data. The R package will allow users to customize analyses of their own datasets, but its use will require R programming expertise well beyond what most biologists and clinical researchers have at their disposal. The data will be processed by the computational back-end of MetaBatch (using R), and the results of the analysis will be sent back to the visualization front-end. The user will then be able to download the corrected data and/or save the results of the analysis on their own machine (or in the cloud). The MetaBatch system will consist of many components that would be complex and tedious to install on end-user machines because of the many libraries and other versioned components required. Therefore, we propose to develop a user-friendly, “ready-to-run” version that non-informatics users can easily install, configure, and use without IT services. Our generic goals:
We will provide technical documentation for bioinformaticians and others whose primary interest is assessing and correcting batch effects. The technical documentation will describe the system architecture and object formats for users who want to develop inter-operable components and/or contribute to the system’s open-source development.