MuSE
Overview | |
Description | Somatic point mutation caller for tumor-normal paired samples in next-generation sequencing data. |
Development Information | |
GitHub | wwylab/MuSE |
Language | C/C++ |
Current version | 2.0 |
Platforms | Platform independent |
License | GNU GPL Version 2 |
Status | Active |
References | |
Citation | Fan, Y., Xi, L., Hughes, D. S. T., Zhang, J., Zhang, J., Futreal, P. A., Wheeler, D. A., and Wang, W. Accounting for inter-tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling for sequencing data. Genome Biology. 2016. https://dx.doi.org/10.1186%2Fs13059-016-1029-6 |
Help and Support | |
Contact | Wenyi Wang |
Discussion | On GitHub |
The detection of somatic point mutations is a key component of cancer genomic research, which has been rapidly developing since next-generation sequencing (NGS) technology revealed its potential for describing genetic alterations in cancer. We previously launched MuSE 1.0 [1], a statistical approach for mutation calling based on a Markov substitution model for molecular evolution. It has been used as a major contributing caller in a consensus calling strategy by the TCGA PanCanAtlas project [2] and the ICGC Pan-Cancer Analysis of Whole Genomes (PCAWG) initiative [3].
We have now released MuSE 2.0, which uses parallel computing to take advantage of the multiple cores on a machine, together with more efficient memory allocation. MuSE 2.0 takes the same input files and produces the same results as MuSE 1.0. It achieves a 50-60x speedup over MuSE 1.0: with 80 cores, it processes one pair of tumor-normal WES data in 4-5 minutes and one pair of tumor-normal WGS data in 40-50 minutes, thus removing somatic mutation calling as a time-consuming obstacle for cancer genomic studies.
Source File: https://github.com/wwylab/MuSE
The latest version, MuSE 2.0, supports only Linux systems with gcc 7.0 and git 2.0 or above. MuSE 1.0 supports both Linux and macOS.
Run the following commands in sequence in the terminal to generate the MuSE 2.0 executable file (MuSE):
git clone https://github.com/wwylab/MuSE.git
cd MuSE
./install_muse.sh
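The toolchain requirement stated above can be checked before building. The following is a minimal sketch and not part of MuSE: the helper names `parse_version` and `meets_minimum` are hypothetical, and it assumes that `gcc --version` and `git --version` print a dotted version number somewhere in their first line of output.

```python
import re
import subprocess

# Minimum versions stated for MuSE 2.0 (gcc 7.0 and git 2.0 or above).
MIN_VERSIONS = {"gcc": (7, 0), "git": (2, 0)}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(tool, version_output):
    """Compare a tool's reported version against the stated minimum."""
    return parse_version(version_output) >= MIN_VERSIONS[tool]


def installed_tool_ok(tool):
    """Run `<tool> --version` and check it (assumes the tool is on PATH)."""
    result = subprocess.run([tool, "--version"], capture_output=True,
                            text=True, check=True)
    return meets_minimum(tool, result.stdout.splitlines()[0])
```

Calling `installed_tool_ok("gcc")` and `installed_tool_ok("git")` before running the install script gives an early warning if the compiler or git is too old.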
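Like MuSE 1.0, MuSE 2.0 requires the following files as input:
(1) The indexed reference genome FASTA file.
(2) The BAM files of the paired tumor and normal DNA samples.
(3) The dbSNP VCF file, bgzip-compressed and based on the same reference genome as (1).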
The first step, MuSE call, takes as input (1) and (2). The BAM files must be produced by aligning all sequence reads against the reference genome using the Burrows-Wheeler Aligner (BWA), with either the backtrack or the maximal exact matches (MEM) algorithm [2]. In addition, the BAM files need to be processed following the Genome Analysis Toolkit (GATK) Best Practices [3,4,5], which include marking duplicates, realigning the paired tumor-normal BAMs jointly, and recalibrating base quality scores.
The second step, MuSE sump, takes as input the output file from MuSE call and (3). We provide two options for building the sample-specific error model: one is applicable to WES data (option -E), and the other to WGS data (option -G).
The following commands briefly illustrate how to use MuSE. For the preparation of BAM files, please refer to the first part, Data Processing Steps, of the GDC DNA-Seq analysis pipeline (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).
./MuSE call -f Reference.Genome -O Output.Prefix -n 20 Tumor.bam Matched.Normal.bam
For WES,
./MuSE sump -I Output.Prefix.MuSE.txt -O Output.Prefix.vcf -E -n 20 -D dbsnp.vcf.gz
or for WGS,
./MuSE sump -I Output.Prefix.MuSE.txt -O Output.Prefix.vcf -G -n 20 -D dbsnp.vcf.gz
The -n option specifies the number of cores to use. The default is 1.
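The two steps above can be chained from a script. The sketch below only assembles the argument lists from the flags shown in the example commands (-f, -O, -n, -I, -E, -G, -D); the function names and the `data_type` parameter are hypothetical, and it assumes, as in the examples, that MuSE call writes its output to the prefix followed by `.MuSE.txt`.

```python
import shlex


def build_call_command(reference, tumor_bam, normal_bam, prefix,
                       cores=1, muse="./MuSE"):
    """Argument list for the first step (MuSE call)."""
    return [muse, "call", "-f", reference, "-O", prefix,
            "-n", str(cores), tumor_bam, normal_bam]


def build_sump_command(prefix, dbsnp_vcf, data_type, cores=1, muse="./MuSE"):
    """Argument list for the second step (MuSE sump).

    data_type selects the error-model option: "WES" -> -E, "WGS" -> -G.
    """
    model_flags = {"WES": "-E", "WGS": "-G"}
    if data_type not in model_flags:
        raise ValueError("data_type must be 'WES' or 'WGS'")
    return [muse, "sump",
            "-I", f"{prefix}.MuSE.txt",   # output of MuSE call (assumed name)
            "-O", f"{prefix}.vcf",
            model_flags[data_type],
            "-n", str(cores),
            "-D", dbsnp_vcf]


# Print the two commands for a WES pair without running them.
call_cmd = build_call_command("Reference.Genome", "Tumor.bam",
                              "Matched.Normal.bam", "Output.Prefix", cores=20)
sump_cmd = build_sump_command("Output.Prefix", "dbsnp.vcf.gz", "WES", cores=20)
print(shlex.join(call_cmd))
print(shlex.join(sump_cmd))
```

To execute the steps rather than print them, each list can be passed to `subprocess.run(cmd, check=True)`, so that a failure in MuSE call stops the script before MuSE sump starts.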
The final output of MuSE is a VCF file that lists the identified somatic variants.
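As a quick sanity check on the output, the records in the VCF file can be tallied by their FILTER column. This is a minimal sketch relying only on the standard VCF layout (tab-separated columns, header lines starting with `#`, FILTER as the seventh column); the function name `count_filters` is hypothetical, and the records and filter labels in the example are made up for illustration, not real MuSE output.

```python
from collections import Counter


def count_filters(vcf_lines):
    """Count variant records per FILTER value (7th VCF column)."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            raise ValueError(f"malformed VCF record: {line!r}")
        counts[fields[6]] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.\n",
    "chr2\t67890\t.\tC\tT\t.\tPASS\t.\n",
    "chr3\t13579\t.\tG\tA\t.\tLowQual\t.\n",
]
filter_counts = count_filters(example)
```

For a real result, the same function can be applied to an open file handle, for example `count_filters(open("Output.Prefix.vcf"))`.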
We thank Mehrzad Samadi and his team from Nvidia Corporation, including Tong Zhu, Timothy Harkins and Ankit Sethia, for their contributions in implementing acceleration techniques in the MuSE call step.