MuSE

Overview
Description	Somatic point mutation caller for tumor-normal paired samples in next-generation sequencing data.
Development Information
GitHub	wwylab/MuSE
Language	C/C++
Current version	2.0
Platforms	Platform independent
License	GNU GPL Version 2
Status	Active
References
Citation	Fan, Y., Xi, L., Hughes, D. S. T., Zhang, J., Zhang, J., Futreal, P. A., Wheeler, D. A., and Wang, W. Accounting for inter-tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling for sequencing data. Genome Biology. 2016. https://doi.org/10.1186/s13059-016-1029-6
Help and Support
Contact	Wenyi Wang
Discussion	On GitHub

MuSE

The detection of somatic point mutations is a key component of cancer genomic research, which has been rapidly developing since next-generation sequencing (NGS) technology revealed its potential for describing genetic alterations in cancer. We previously launched MuSE 1.0 ¹, a statistical approach for mutation calling based on a Markov substitution model for molecular evolution. It has been used as a major contributing caller in a consensus calling strategy by the TCGA PanCanAtlas project ² and the ICGC Pan-Cancer Analysis of Whole Genomes (PCAWG) initiative ³.

We have now released MuSE 2.0, which uses multi-core parallel computing and more efficient memory allocation. MuSE 2.0 takes the same input files and produces the same results as MuSE 1.0. It achieves a 50-60x speedup over MuSE 1.0: with 80 cores, it processes one tumor-normal pair of whole-exome sequencing (WES) data in 4-5 minutes and one tumor-normal pair of whole-genome sequencing (WGS) data in 40-50 minutes, thus removing somatic mutation calling as a time-consuming obstacle for cancer genomic studies.

Download

Source File: https://github.com/wwylab/MuSE

Installation

The latest version, MuSE 2.0, supports only Linux and requires gcc 7.0 or later and git 2.0 or later. MuSE 1.0 supports both Linux and macOS. Run the following commands in order in a terminal to build the MuSE 2.0 executable (MuSE):

git clone https://github.com/wwylab/MuSE.git
cd MuSE
./install_muse.sh
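
Before building, the gcc and git requirements stated above can be checked with a short script. This is a minimal sketch, not part of MuSE: the function names (major_version, meets_minimum, report_tool) are hypothetical, and it compares only major version numbers, which is sufficient for the 7.0 and 2.0 minimums.

```shell
#!/usr/bin/env bash
# Prints the major component of a version string, e.g. "9.4.0" -> "9".
major_version() {
    printf '%s\n' "${1%%.*}"
}

# Succeeds when the major version of $1 is at least the integer $2.
# Comparing major versions is enough here because both minimums (7.0 and 2.0)
# are ".0" releases.
meets_minimum() {
    [ "$(major_version "$1")" -ge "$2" ] 2>/dev/null
}

# Reports whether an installed tool satisfies its minimum major version.
# $1 = tool name, $2 = detected version string, $3 = minimum major version
report_tool() {
    if meets_minimum "$2" "$3"; then
        echo "$1 $2: OK"
    else
        echo "$1 $2: requires version $3.0 or later"
    fi
}

# Query only the tools that are actually on the PATH.
if command -v gcc >/dev/null 2>&1; then
    report_tool gcc "$(gcc -dumpversion)" 7
fi
if command -v git >/dev/null 2>&1; then
    report_tool git "$(git --version | awk '{print $3}')" 2
fi
```

If either tool is reported as too old, upgrade it before running install_muse.sh.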

Input Data

Like MuSE 1.0, MuSE 2.0 requires the following input files:

(1) the indexed reference genome FASTA file,
(2) the sequence data from the paired tumor and normal DNA samples in binary sequence alignment/map (BAM) format, and
(3) the dbSNP file in variant call format (VCF), which must be bgzip-compressed, tabix-indexed and based on the same reference genome as (1).

The first step, MuSE call, takes (1) and (2) as input. All sequence reads in the BAM files must be aligned to the reference genome with the Burrows-Wheeler alignment tool (BWA), using either the backtrack or the maximal exact matches (MEM) algorithm ². In addition, the BAM files must be processed following the Genome Analysis Toolkit (GATK) Best Practices ³,⁴,⁵, which include marking duplicates, jointly realigning the paired tumor-normal BAMs and recalibrating base quality scores.

The second step, MuSE sump, takes as input the output file from MuSE call and (3). We provide two options for building the sample-specific error model. One is applicable to WES data (option -E), and the other to WGS data (option -G).
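
The requirements for inputs (1) and (3) can be verified before a run with a sketch like the one below. It rests on assumptions not stated above: that the FASTA index is a samtools-style .fai file and the dbSNP index a tabix-style .tbi file, each stored next to the file it indexes. The function names require_file and check_muse_inputs are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when the file exists and is non-empty; otherwise prints a message.
require_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Checks the reference FASTA (1) and the dbSNP VCF (3) together with their indexes.
# Assumption: indexes are named <fasta>.fai and <vcf.gz>.tbi.
# $1 = reference FASTA, $2 = bgzip-compressed dbSNP VCF
check_muse_inputs() {
    local fasta="$1" dbsnp="$2" status=0
    require_file "$fasta"     || status=1
    require_file "$fasta.fai" || status=1   # can be created with: samtools faidx "$fasta"
    require_file "$dbsnp"     || status=1   # can be created with: bgzip dbsnp.vcf
    require_file "$dbsnp.tbi" || status=1   # can be created with: tabix -p vcf "$dbsnp"
    case "$dbsnp" in
        *.gz) ;;
        *) echo "dbSNP VCF does not look bgzip-compressed: $dbsnp" >&2; status=1 ;;
    esac
    return "$status"
}
```

For example, check_muse_inputs Reference.Genome dbsnp.vcf.gz returns a non-zero status and names each missing file on standard error.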

Example Commands

The following commands briefly illustrate how to use MuSE. For the preparation of BAM files, please refer to the first part, Data Processing Steps, of the GDC DNA-Seq analysis pipeline (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).

./MuSE call -f Reference.Genome -O Output.Prefix -n 20 Tumor.bam Matched.Normal.bam

For WES,

./MuSE sump -I Output.Prefix.MuSE.txt -O Output.Prefix.vcf -E -n 20 -D dbsnp.vcf.gz

or for WGS,

./MuSE sump -I Output.Prefix.MuSE.txt -O Output.Prefix.vcf -G -n 20 -D dbsnp.vcf.gz

The -n option sets the number of cores to use; the default is 1.
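
The two steps above can be chained in a small wrapper that selects -E or -G from the data type. This is only a sketch: the function names (sump_model_flag, run_muse) and the MUSE_BIN variable are hypothetical, while the MuSE options are the ones shown in the commands above.

```shell
#!/usr/bin/env bash
# Maps a data type to the MuSE sump error-model option (-E for WES, -G for WGS).
sump_model_flag() {
    case "$1" in
        WES|wes) echo "-E" ;;
        WGS|wgs) echo "-G" ;;
        *) echo "unknown data type: $1 (expected WES or WGS)" >&2; return 1 ;;
    esac
}

# Runs MuSE call and then MuSE sump for one tumor-normal pair.
# Arguments: reference FASTA, tumor BAM, normal BAM, dbSNP VCF, output prefix,
#            data type (WES or WGS), number of cores (optional, defaults to 1).
# MUSE_BIN (hypothetical) overrides the path to the executable.
run_muse() {
    local ref="$1" tumor="$2" normal="$3" dbsnp="$4" prefix="$5" dtype="$6" cores="${7:-1}"
    local flag
    flag="$(sump_model_flag "$dtype")" || return 1
    # Step 1: MuSE call writes <prefix>.MuSE.txt.
    "${MUSE_BIN:-./MuSE}" call -f "$ref" -O "$prefix" -n "$cores" "$tumor" "$normal" || return 1
    # Step 2: MuSE sump turns the call output into the final VCF.
    "${MUSE_BIN:-./MuSE}" sump -I "$prefix.MuSE.txt" -O "$prefix.vcf" "$flag" -n "$cores" -D "$dbsnp"
}
```

A call such as run_muse Reference.Genome Tumor.bam Matched.Normal.bam dbsnp.vcf.gz Output.Prefix WES 20 reproduces the WES example above and stops before the sump step if the call step fails.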

Output

The final output of MuSE is a VCF file that lists the identified somatic variants.
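
As a quick sanity check, the records in the output VCF can be counted and grouped by the FILTER column. The sketch below relies only on general VCF conventions (header lines start with "#", fields are tab-separated, FILTER is the seventh column); the function names are hypothetical, and the FILTER values that MuSE writes are not assumed.

```shell
#!/usr/bin/env bash
# Prints the number of variant records (non-header, non-empty lines) in a VCF file.
count_variants() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Prints each distinct FILTER value (column 7) with its record count, sorted by value.
filter_summary() {
    awk -F'\t' '!/^#/ && NF { c[$7]++ } END { for (f in c) print f "\t" c[f] }' "$1" | sort
}
```

For example, count_variants Output.Prefix.vcf prints a single integer, and filter_summary Output.Prefix.vcf prints one line per FILTER value.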

Acknowledgement

We thank Mehrzad Samadi and his team from Nvidia Corporation, including Tong Zhu, Timothy Harkins and Ankit Sethia, for their contributions in implementing acceleration techniques in the MuSE call step.

References

1. Fan, Y. et al. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biology 17, 178 (2016).
2. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
3. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303 (2010).
4. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43, 491–498 (2011).
5. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics 11, 11.10.1–11.10.33 (2013).

Department of Bioinformatics and Computational Biology
