Department of Bioinformatics and Computational Biology


MuSE

Overview
Description: Somatic point mutation caller for tumor-normal paired samples in next-generation sequencing data.
Development Information
GitHub: wwylab/MuSE
Language: C/C++
Current version: 2.0
Platforms: Linux (MuSE 2.0); Linux and macOS (MuSE 1.0)
License: GNU GPL Version 2
Status: Active
References
Citation: Fan, Y., Xi, L., Hughes, D. S. T., Zhang, J., Zhang, J., Futreal, P. A., Wheeler, D. A., and Wang, W. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biology 17, 178 (2016). https://dx.doi.org/10.1186/s13059-016-1029-6
Help and Support
Contact: Wenyi Wang
Discussion: On GitHub

MuSE

The detection of somatic point mutations is a key component of cancer genomic research, which has developed rapidly since next-generation sequencing (NGS) technology revealed its potential for describing genetic alterations in cancer. We previously released MuSE 1.0 [1], a statistical approach to mutation calling based on a Markov substitution model for molecular evolution. It has been used as one of the major contributing callers in the consensus calling strategies of the TCGA PanCanAtlas project and the ICGC Pan-Cancer Analysis of Whole Genomes (PCAWG) initiative.

We have now released MuSE 2.0, which uses parallel computing to take advantage of the multiple cores available on a machine, together with more efficient memory allocation. MuSE 2.0 takes the same input files and produces the same results as MuSE 1.0. It achieves a 50-60x speedup over MuSE 1.0 and, with 80 cores, can process one pair of tumor-normal WES data in 4-5 minutes and one pair of tumor-normal WGS data in 40-50 minutes, thus removing somatic mutation calling as a time-consuming bottleneck in cancer genomic studies.

Download

Source code: https://github.com/wwylab/MuSE

Installation

The latest version, MuSE 2.0, supports only Linux systems with gcc 7.0 and git 2.0 or above. MuSE 1.0 supports both Linux and macOS. To build the MuSE 2.0 executable (MuSE), run the following commands in a terminal:

git clone https://github.com/wwylab/MuSE.git
cd MuSE
./install_muse.sh
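Because MuSE 2.0 requires gcc 7.0 and git 2.0 or above, it can help to check the installed versions before running install_muse.sh. The Python sketch below is a hypothetical helper, not part of the MuSE distribution; it assumes gcc and git are on the PATH and simply parses the first dotted version number printed by `gcc -dumpversion` and `git --version`.

```python
import re
import shutil
import subprocess


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number (e.g. '2.34.1') as a 3-tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    # Pad to three fields so that '7' compares equal to '7.0.0' instead of less than it.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(text: str, minimum: str) -> bool:
    """True if the version found in `text` is at least `minimum`."""
    return parse_version(text) >= parse_version(minimum)


def check_tool(cmd: list, minimum: str) -> bool:
    """Run a version command (hypothetical helper) and compare its output with `minimum`."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]}: not found on PATH")
        return False
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    try:
        ok = meets_minimum(out, minimum)
    except ValueError:
        print(f"{cmd[0]}: could not parse version from {out!r}")
        return False
    print(f"{cmd[0]}: {out.strip()} -> {'OK' if ok else 'too old'}")
    return ok
```

For example, `check_tool(["gcc", "-dumpversion"], "7.0")` and `check_tool(["git", "--version"], "2.0")` report whether each requirement is met. Versions are padded to three fields because, on many systems, gcc 7 and later print only the major version (e.g. `7`) for `-dumpversion`.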

Input Data

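As with MuSE 1.0, MuSE 2.0 requires the following input files:

(1) an indexed reference genome FASTA file;
(2) BAM files of the sequence data from the paired tumor and normal DNA samples;
(3) a dbSNP VCF file, bgzip-compressed and tabix-indexed, based on the same reference genome as (1).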

The first step, MuSE call, takes (1) and (2) as input. The BAM files must be generated by aligning all sequence reads to the reference genome with the Burrows-Wheeler Alignment tool (BWA), using either the backtrack or the maximal exact matches (MEM) algorithm [2]. In addition, the BAM files need to be processed following the Genome Analysis Toolkit (GATK) Best Practices [3-5], which include marking duplicates, jointly realigning the paired tumor-normal BAMs, and recalibrating base quality scores.

The second step, MuSE sump, takes as input the output file from MuSE call and (3). We provide two options for building the sample-specific error model. One is applicable to WES data (option -E), and the other to WGS data (option -G).
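Since MuSE call uses inputs (1) and (2) and MuSE sump uses input (3), a quick pre-flight check can catch missing files before a long run. The Python sketch below is a hypothetical helper, not part of MuSE. The index naming it looks for (`.fai` next to the FASTA, `.bai` next to each BAM, `.tbi` next to the compressed dbSNP VCF) is an assumption based on common samtools/tabix conventions, not something MuSE documents here.

```python
from pathlib import Path


def has_index(path: Path, suffix: str) -> bool:
    """True if an index file sits next to `path` (assumed naming conventions)."""
    # e.g. ref.fa.fai, tumor.bam.bai, dbsnp.vcf.gz.tbi
    candidates = [Path(str(path) + suffix)]
    if path.suffix == ".bam":
        # Some tools write tumor.bai instead of tumor.bam.bai.
        candidates.append(path.with_suffix(".bai"))
    return any(c.is_file() for c in candidates)


def missing_inputs(reference, tumor_bam, normal_bam, dbsnp_vcf_gz):
    """Return a list of problems; an empty list means all files and indexes were found."""
    problems = []
    checks = [
        (Path(reference), ".fai"),     # input (1): reference genome FASTA
        (Path(tumor_bam), ".bai"),     # input (2): tumor BAM
        (Path(normal_bam), ".bai"),    # input (2): matched normal BAM
        (Path(dbsnp_vcf_gz), ".tbi"),  # input (3): dbSNP VCF used by MuSE sump
    ]
    for path, suffix in checks:
        if not path.is_file():
            problems.append(f"missing file: {path}")
        elif not has_index(path, suffix):
            problems.append(f"missing {suffix} index for: {path}")
    if not str(dbsnp_vcf_gz).endswith(".gz"):
        problems.append(f"dbSNP VCF is not .gz-compressed: {dbsnp_vcf_gz}")
    return problems
```

The function only reports problems and never raises, so it can be called at the top of a batch script and the run skipped for any sample pair whose list is non-empty.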

Example Commands

The following commands briefly illustrate how to use MuSE. For preparing the BAM files, please refer to the first part (Data Processing Steps) of the GDC DNA-Seq analysis pipeline (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).

./MuSE call -f Reference.Genome -O Output.Prefix -n 20 Tumor.bam Matched.Normal.bam

For WES,

./MuSE sump -I Output.Prefix.MuSE.txt -O Output.Prefix.vcf -E -n 20 -D dbsnp.vcf.gz

or for WGS,

./MuSE sump -I Output.Prefix.MuSE.txt -O Output.Prefix.vcf -G -n 20 -D dbsnp.vcf.gz

-n specifies the number of cores to use (default: 1).
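The two steps can also be scripted. The Python sketch below is a hypothetical wrapper, not part of MuSE: it only assembles the command lines shown above, using the same flags (-f, -O, -n, -I, -E or -G, -D), and it assumes, as in those examples, that MuSE call writes its results to `<prefix>.MuSE.txt`. The function names and the default `./MuSE` path are illustrative.

```python
import subprocess


def muse_commands(reference, tumor_bam, normal_bam, dbsnp, prefix, data_type,
                  cores=1, muse="./MuSE"):
    """Build the MuSE call and MuSE sump command lines for one tumor-normal pair."""
    if data_type not in ("WES", "WGS"):
        raise ValueError("data_type must be 'WES' or 'WGS'")
    # Step 1: MuSE call on the tumor BAM followed by the matched normal BAM.
    call = [muse, "call", "-f", reference, "-O", prefix, "-n", str(cores),
            tumor_bam, normal_bam]
    # Step 2: MuSE sump; -E builds the error model for WES data, -G for WGS data.
    model_flag = "-E" if data_type == "WES" else "-G"
    sump = [muse, "sump", "-I", f"{prefix}.MuSE.txt", "-O", f"{prefix}.vcf",
            model_flag, "-n", str(cores), "-D", dbsnp]
    return call, sump


def run_muse(commands, dry_run=True):
    """Print each command; execute only when dry_run is False, stopping on the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run_muse(muse_commands("Reference.Genome", "Tumor.bam", "Matched.Normal.bam", "dbsnp.vcf.gz", "Output.Prefix", "WES", cores=20))` prints the WES commands from the examples without running them. Passing `dry_run=False` executes them in order, and `check=True` stops the script if MuSE call fails, so MuSE sump is never run on a missing or partial `.MuSE.txt` file.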

Output

The final output of MuSE is a VCF file that lists the identified somatic variants.
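A quick way to inspect the final VCF is to count records by the value of the FILTER column. The Python sketch below is a hypothetical helper; it assumes a plain-text, uncompressed VCF such as `Output.Prefix.vcf` from the example commands. It makes no assumption about which FILTER labels MuSE writes and simply tallies whatever appears in the seventh column.

```python
from collections import Counter


def filter_counts(vcf_path):
    """Count variant records in a VCF by the value of the FILTER column (7th field)."""
    counts = Counter()
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip meta-information lines, the #CHROM header and blank lines
            fields = line.rstrip("\n").split("\t")
            counts[fields[6]] += 1  # FILTER is the 7th tab-separated field
    return counts
```

For example, `filter_counts("Output.Prefix.vcf")["PASS"]` gives the number of records whose FILTER value is PASS, and `.most_common()` lists every label with its count.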

Acknowledgement

We thank Mehrzad Samadi and his team from Nvidia Corporation, including Tong Zhu, Timothy Harkins and Ankit Sethia, for their contributions to implementing acceleration techniques in the MuSE call step.

References


  1. Fan, Y. et al. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biology 17, 178 (2016).
  2. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25, 1754–1760 (2009).
  3. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303 (2010).
  4. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43, 491–498 (2011).
  5. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics 43, 11.10.1–11.10.33 (2013).