A Powerful Bioinformatics Software Tool for Refining Next-Generation Sequencing (NGS) Read Mapping
BM-Map is a powerful NGS genomic loci mapping refiner. It improves the mapping of multireads (reads mapped to more than one genomic location with similar fidelities) as a refinement step after general read alignment is completed. It is a multi-platform software tool built on the Bayesian mapping of multireads (BM-Map) algorithm, which computes the posterior probability of mapping each multiread to each of its candidate genomic locations.
BM-Map is NOT a general mapping tool. Rather, it refines the results
produced by prevailing mapping tools or aligners (Bowtie, etc.).
Therefore, you will need to use the output of an aligner as the input of
BM-Map. See the following figure for the current niche of BM-Map in
the NGS pipeline.
Currently, the standard practice is to discard multireads in
subsequent analyses such as gene expression quantification. This
practice introduces a large bias in estimating the expression levels
of duplicated genes. As an initial attempt, Mortazavi et al. (2008)
proposed a proportional alignment method in which unique reads are
first mapped, and then multireads are aligned to equally similar
loci in proportion to the number of corresponding mapped unique
reads. The key idea of the proportional method is that the
individual numbers of unique reads are used to infer the
probabilities of mapping the multireads. While the proportional
method provides a simple and valuable solution to the mapping of the
multireads, it fails to account for the mismatch profiles between
the unique reads and the genomic locations.
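The proportional idea can be illustrated with a minimal Python sketch. It is a simplification for illustration only, not the exact procedure of Mortazavi et al. (2008): unique reads are counted per locus (no length normalization or neighborhood window), and each multiread is split across its candidate loci in proportion to those counts. The even split used when none of a read's loci has unique reads is an assumption of this sketch, and the function name `proportional_allocation` and the (read, locus) input pairs are hypothetical. Note that mismatches play no role anywhere in the calculation, which is exactly the limitation described above.

```python
from collections import Counter, defaultdict


def proportional_allocation(alignments):
    """Split each multiread across its loci in proportion to unique-read counts.

    `alignments` is a hypothetical list of (read_name, locus) pairs, e.g. taken
    from the QNAME and RNAME/POS fields of an aligner's SAM output.
    """
    loci_by_read = defaultdict(set)
    for read, locus in alignments:
        loci_by_read[read].add(locus)

    # Step 1: count unique reads (reads aligned to exactly one locus) per locus.
    unique_counts = Counter()
    for read, loci in loci_by_read.items():
        if len(loci) == 1:
            unique_counts[next(iter(loci))] += 1

    # Step 2: give each multiread fractional weights proportional to those counts.
    weights = {}
    for read, loci in loci_by_read.items():
        if len(loci) < 2:
            continue
        ordered = sorted(loci)
        total = sum(unique_counts[locus] for locus in ordered)
        if total == 0:
            # Assumption of this sketch: fall back to an even split.
            weights[read] = {locus: 1.0 / len(ordered) for locus in ordered}
        else:
            weights[read] = {locus: unique_counts[locus] / total for locus in ordered}
    return weights


alignments = [
    ("r1", "geneA"), ("r2", "geneA"), ("r3", "geneA"),  # unique reads at geneA
    ("r4", "geneB"),                                    # unique read at geneB
    ("m1", "geneA"), ("m1", "geneB"),                   # one multiread
]
print(proportional_allocation(alignments))
# {'m1': {'geneA': 0.75, 'geneB': 0.25}}
```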
Unlike the proportional method, which only considers the equally best-aligned
genomic locations, BM-Map also evaluates genomic locations with unequal numbers of
mismatches to a multiread. More importantly,
BM-Map utilizes three sources of information when mapping the
multireads: the sequencing error profiles, the likelihood of hidden
nucleotide variations, and the expression levels of competing
genomic locations. In contrast, the proportional method only uses
the last source of information. The key idea of BM-Map is to use
the base-level error rates and the observed mismatch profiles from
unique reads to estimate the error rate due to hidden nucleotide
variations in a hierarchical model. In the end, the BM-Map method assigns
multireads to competing genomic locations based on posterior
probabilities.
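To make the idea of a posterior probability concrete, the Python sketch below scores each candidate locus by a prior proportional to its expression level times the likelihood of the read's mismatch pattern at that locus, then normalizes. It is NOT the BM-Map hierarchical model: here the hidden-variation rate of each locus is supplied directly rather than estimated from unique reads, and a base is assumed to disagree with the reference whenever a sequencing error or a hidden variant occurs. Per-base error rates come from Phred qualities via e = 10^(-Q/10). All function names and example values are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred base quality to an error probability: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def locus_log_likelihood(mismatches, quals, variation_rate):
    """Log-likelihood of a read's mismatch pattern at one candidate locus.

    Simplifying assumption: P(match) = (1 - e) * (1 - v), where e is the
    sequencing error rate of the base and v is the locus' hidden-variation rate.
    """
    ll = 0.0
    for is_mismatch, q in zip(mismatches, quals):
        p_match = (1 - phred_to_error(q)) * (1 - variation_rate)
        ll += math.log(1 - p_match) if is_mismatch else math.log(p_match)
    return ll


def posterior(candidates, quals):
    """Posterior over candidate loci: prior (expression) x likelihood, normalized.

    candidates: dict locus -> (mismatch pattern, variation rate, expression > 0)
    """
    log_scores = {
        locus: math.log(expr) + locus_log_likelihood(mm, quals, v)
        for locus, (mm, v, expr) in candidates.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {locus: math.exp(s - top) for locus, s in log_scores.items()}
    z = sum(unnorm.values())
    return {locus: u / z for locus, u in unnorm.items()}


quals = [30, 30, 30, 30, 30]
candidates = {
    "locusA": ([False] * 5, 0.001, 10.0),                       # perfect match
    "locusB": ([False, False, False, False, True], 0.01, 30.0),  # one mismatch
}
print(posterior(candidates, quals))
```

In this toy example locusA receives most of the probability (roughly 0.97) despite its lower expression, because locusB requires an extra mismatch; a purely proportional allocation would instead favor locusB 3:1.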
Because of the extra information BM-Map incorporates during the
calculation, it handles multiread allocation particularly well,
especially for organisms with a relatively high polymorphism rate.
Compared with other mapping methods, gene expression estimates based on BM-Map
show a better correlation with an experimental approach, qRT-PCR
measurement (see the following figure). The results demonstrate the
feasibility of BM-Map and highlight the importance of accurately
allocating multireads when quantifying the expression of young human
duplicates based on next-generation sequencing, an essential step for
studying the expression and evolution of young duplicated genes in the
human genome.
Open a terminal, cd to the "BM_Map" directory, and type:
make
This will generate the BM_Map executable in the same directory.
The build process above was tested on Ubuntu 10 with gcc 4.6.3 and on
Mac OS X 10.6 with gcc 4.2.1.
For Windows users, the 64-bit Windows binary executable BM-Map.exe is already built and ready for use.
For Mac OS X 10.6 (Snow Leopard) users, the binary executable BM-Map is already built and ready for use.
For Ubuntu 10.04 users, the binary executable BM-Map is already built and ready for use.
The "InputFile.txt" file provides a user-friendly and straightforward way to configure the parameters of BM-Map. Default values are provided; however, most parameters should be customized to the user's needs. Updating the values in "InputFile.txt" appropriately is an important step before running BM-Map.
Starting from version 2.0.1, BM-Map also comes with a graphical user interface (GUI) whose parameters are equivalent to, and in exactly the same order as, those in "InputFile.txt".
The BM-Map package comes with an example input file (test.sam) that is ready to run. We recommend trying the test file first to get an overall picture of how the BM-Map software operates, such as what output to expect with different parameter values and how long the program takes to run. With the default settings, the test file typically runs in about 2 seconds.
The output format of BM-Map 2.0.1 is a SAM-like format named
sam+. It retains all the required fields and information of the
original input SAM file, and the program appends the newly calculated
probabilities to the end of each line. Note that if a read is
unmappable or does not meet the standards defined by the program
parameters, 'NA' is given as the probability value(s). The probabilities
for unique reads are always ONE, because their mapping is certain.
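For downstream use, a sam+ file can be read with ordinary text processing. The Python sketch below assumes, since the exact column layout is not spelled out here, that each alignment line carries a single probability in its last tab-separated field, with 'NA' for reads that were unmappable or filtered out. The function names, the 0.8 threshold, and the example records are hypothetical.

```python
def parse_sam_plus_line(line):
    """Split one sam+ line into selected SAM fields and the appended probability.

    Assumption: the probability is the last tab-separated field, and 'NA'
    marks reads that were unmappable or did not pass the parameter filters.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:  # 11 mandatory SAM fields + at least one probability
        raise ValueError("expected 11 mandatory SAM fields plus a probability")
    raw = fields[-1]
    return {
        "qname": fields[0],
        "rname": fields[2],
        "pos": int(fields[3]),
        "prob": None if raw == "NA" else float(raw),
    }


def confident_alignments(lines, threshold=0.8):
    """Yield parsed alignments whose probability is at least `threshold`."""
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        rec = parse_sam_plus_line(line)
        if rec["prob"] is not None and rec["prob"] >= threshold:
            yield rec


# Hypothetical sam+ records: a header, one unique read, one multiread with
# two candidate loci, and one unmapped read.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\t1",
    "m1\t0\tchr2\t500\t255\t4M\t*\t0\t0\tACGT\tIIII\t0.83",
    "m1\t256\tchr7\t900\t255\t4M\t*\t0\t0\tACGT\tIIII\t0.17",
    "x1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tNA",
]
for rec in confident_alignments(example):
    print(rec["qname"], rec["rname"], rec["pos"], rec["prob"])
# r1 chr1 100 1.0
# m1 chr2 500 0.83
```

Keeping only high-probability alignments is one simple way to use the probabilities; summing them per locus to obtain fractional read counts is another.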
In addition to the sam+ file, BM-Map also produces a log file (the default name is "BM_Map.log"), which records the output shown in the command-line window so that users can review useful information later.
Webpage created by Yuan Yuan, updated: 02/13/2012