|Description||Somatic point mutation caller for tumor-normal paired samples in next-generation sequencing data.|
|License||GNU GPL Version 2|
|Citations||Fan, Y., Xi, L., Hughes, D. S. T., Zhang, J., Zhang, J., Futreal, P. A., Wheeler, D. A., and Wang, W. Accounting for inter-tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling for sequencing data. Genome Biology. 2016. 17:178. DOI: 10.1186/s13059-016-1029-6.|
|Help and Support|
The detection of somatic point mutations is a key component of cancer genomic research, which has been rapidly developing since next-generation sequencing (NGS) technology revealed its potential for describing genetic alterations in cancer. We present MuSE, a novel approach to mutation calling based on the F81 Markov substitution model for molecular evolution, which models the evolution of the reference allele to the allelic composition of the matched tumor and normal tissue at each genomic locus. To improve overall accuracy, we further adopt a sample-specific error model to identify cutoffs, reflecting the variation in tumor heterogeneity among samples.
Source File: https://github.com/danielfan/MuSE
After downloading the source file, for Unix-like operating systems please type the following commands sequentially in the command line to generate the executable:
unzip MuSEv1.0rc.zip
cd MuSEv1.0rc
make
For Windows, please install Cygwin (http://www.cygwin.com) first, which provides functionality similar to a Linux distribution on Windows. The remaining steps are the same as above.
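MuSE consists of two steps, which together require (1) a reference genome FASTA file, (2) BAM files from the paired tumor and normal samples, and (3) a dbSNP VCF file.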
The first step, ‘MuSE call’, takes as input (1) and (2). The BAM files must be produced by aligning all the sequence reads against the reference genome using the Burrows-Wheeler alignment tool (BWA), with either the backtrack or the maximal exact matches (MEM) algorithm. In addition, the BAM files need to be processed following the Genome Analysis Toolkit (GATK) Best Practices [3-5], which include marking duplicates, realigning the paired tumor-normal BAMs jointly, and recalibrating base quality scores.
To speed up ‘MuSE call’, we recommend splitting the WGS data into small blocks (<50 Mb) using either the ‘-r’ or the ‘-l’ option, and then concatenating all the output files with the Linux command cat.
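The block-wise approach can be sketched as follows. This is a minimal illustration, not part of MuSE itself: the helper names make_blocks and call_blocks are hypothetical, the block size of 49000000 is just one value under the 50 Mb limit, and the sketch assumes that a samtools faidx index (Reference.Genome.fai, with sequence name and length in its first two columns) exists and that ‘-r’ accepts a region written as chr:start-end.

```shell
#!/bin/sh
# Hypothetical helper: print chr:start-end blocks of at most $2 bases
# for every sequence listed in a faidx index file ($1).
make_blocks() {
    awk -v size="$2" '{
        # field 1 = sequence name, field 2 = sequence length
        for (start = 1; start <= $2; start += size) {
            end = start + size - 1
            if (end > $2) end = $2
            printf "%s:%d-%d\n", $1, start, end
        }
    }' "$1"
}

# Hypothetical helper: read regions from standard input, run MuSE call
# on each block, then concatenate the per-block outputs with cat.
# $1 = reference genome, $2 = tumor BAM, $3 = matched normal BAM
call_blocks() {
    i=0
    while read -r region; do
        i=$((i + 1))
        # zero-padded prefixes keep the blocks in order for cat
        prefix=$(printf 'block.%04d' "$i")
        ./MuSE call -O "$prefix" -f "$1" -r "$region" "$2" "$3" </dev/null
    done
    cat block.*.MuSE.txt > Output.Prefix.MuSE.txt
}
```

Under these assumptions, a run would look like: make_blocks Reference.Genome.fai 49000000 | call_blocks Reference.Genome Tumor.bam Matched.Normal.bam. The blocks are independent, so the calls inside the loop could also be dispatched to separate jobs.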
The second step, ‘MuSE sump’, takes as input the output file from ‘MuSE call’ and (3). We provide two options for building the sample-specific error model. One is applicable to WES data (option ‘-E’), and the other to WGS data (option ‘-G’).
The following commands briefly illustrate how to use MuSE. As to the preparation of BAM files, please refer to the first part, PRE-PROCESSING, of the Genome Analysis Toolkit (GATK) Best Practices (http://www.broadinstitute.org/gatk/guide/best-practices).
./MuSE call -O Output.Prefix -f Reference.Genome Tumor.bam Matched.Normal.bam
./MuSE sump -I Output.Prefix.MuSE.txt -G -O Output.Prefix.vcf -D dbsnp.vcf.gz
The final output of MuSE is a VCF file that lists the identified somatic variants.
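The example above builds the error model for WGS data with ‘-G’; for WES data the second step takes ‘-E’ instead. The following sketch shows one way to select the option from the data type. The function name sump_command is hypothetical, and it only prints the command (a dry run) rather than executing MuSE.

```shell
#!/bin/sh
# Hypothetical wrapper: choose the error-model option from the data type
# and print the 'MuSE sump' command instead of running it.
# $1 = WES or WGS, $2 = output of 'MuSE call', $3 = output VCF, $4 = dbSNP VCF
sump_command() {
    case "$1" in
        WES) model=-E ;;
        WGS) model=-G ;;
        *)   echo "data type must be WES or WGS" >&2; return 1 ;;
    esac
    echo "./MuSE sump -I $2 $model -O $3 -D $4"
}
```

For example, sump_command WES Output.Prefix.MuSE.txt Output.Prefix.vcf dbsnp.vcf.gz prints the WES form of the second command, which can be inspected before it is run.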