FamSeq
Overview | |
Description | FamSeq is a computational tool for calculating probability of variants in family-based sequencing data |
Development Information | |
GitHub | wwylab/FamSeq |
Language | C++ |
Current version | V1.0.2 |
Platforms | Platform independent |
License | GPL v3 |
Status | Active |
Last updated | 07/01/2014 |
News | Version 1.0.2 includes an option for GPU-based computing. |
References | |
Citation | Peng G., Fan Y., Palculict T.B., Shen P., Ruteshouser E.C., Chi A., Davis R.W., Huff V., Scharfe C., Wang W. Rare variant detection using family-based sequencing analysis. PNAS 110, p3985 (2013). https://doi.org/10.1073/pnas.1222158110 |
Help and Support | |
Contact | Wenyi Wang |
Calling rare variants remains challenging. In family-based sequencing studies, information from all family members should be used to identify new germline mutations more accurately. FamSeq serves this purpose by providing the probability that an individual carries a variant, given the raw measurements of his or her entire family. FamSeq accommodates de novo mutations and can call variants on chromosome X.
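To make the idea of a family-informed probability concrete, here is a minimal Python sketch for a single trio (mother, father, child). It is only an illustration and not FamSeq's implementation: it enumerates all genotype combinations by brute force, assumes a Hardy-Weinberg prior for the parents with a made-up alternate allele frequency of 0.01, and borrows the 1e-7 mutation rate from the `-mRate` value shown in the synopsis. All function names are hypothetical.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def founder_prior(gt, alt_freq):
    """Hardy-Weinberg prior for a founder (an assumption of this sketch)."""
    q = alt_freq
    return ((1 - q) ** 2, 2 * q * (1 - q), q * q)[gt]


def transmit_alt(parent_gt, mu):
    """Probability that a parent passes on the alternate allele,
    allowing the transmitted allele to mutate with probability mu."""
    p = parent_gt / 2.0
    return p * (1 - mu) + (1 - p) * mu


def child_prior(child_gt, mother_gt, father_gt, mu):
    """Mendelian probability of the child's genotype given both parents."""
    a = transmit_alt(mother_gt, mu)
    b = transmit_alt(father_gt, mu)
    return ((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)[child_gt]


def trio_posteriors(lik_mother, lik_father, lik_child, alt_freq=0.01, mu=1e-7):
    """Marginal genotype posteriors for each trio member.

    Each lik_* argument lists P(data | genotype) for genotypes 0, 1, 2.
    """
    post = {"mother": [0.0] * 3, "father": [0.0] * 3, "child": [0.0] * 3}
    total = 0.0
    for gm, gf, gc in product(GENOTYPES, repeat=3):
        weight = (founder_prior(gm, alt_freq) * founder_prior(gf, alt_freq)
                  * child_prior(gc, gm, gf, mu)
                  * lik_mother[gm] * lik_father[gf] * lik_child[gc])
        total += weight
        post["mother"][gm] += weight
        post["father"][gf] += weight
        post["child"][gc] += weight
    return {name: [x / total for x in probs] for name, probs in post.items()}


# The child's own data cannot tell reference from heterozygous.
child_lik = [0.5, 0.5, 0.0]
with_het_mother = trio_posteriors([0, 1, 0], [1, 0, 0], child_lik)
with_ref_parents = trio_posteriors([1, 0, 0], [1, 0, 0], child_lik)
print(with_het_mother["child"], with_ref_parents["child"])
```

With identical data for the child, the posterior for the heterozygous genotype depends strongly on what the parents carry, which is the kind of information a single-individual caller cannot use.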
FamSeq accepts both likelihood-only format files and the widely used VCF files as input.
Updated in v1.0.3: FamSeq now accepts VCF files produced by FreeBayes.
1. Extract files from the compressed file
tar xvf FamSeq1.0.2.tar.gz
For Windows users, please check here to extract the file.
2. Build FamSeq
CPU version:
cd FamSeq/src/
make
GPU version:
cd FamSeq/src/
make -f makefile.gpu
If you use Mac OS X 10.8 with Xcode 5, or if an error like “unsupported option ‘-dumpspecs’” occurs when compiling the GPU version, use the following command to compile instead:
make -f makefile.gpu.clang
3. Run the test data
CPU version:
With a VCF file as input:
./FamSeq vcf -vcfFile ../TestData/test.vcf -pedFile ../TestData/fam01.ped -output test.FamSeq.vcf -v
With a likelihood-only format file as input:
./FamSeq LK -lkFile ../TestData/loftest.txt -pedFile ../TestData/fam01.ped -output test.FamSeq.txt
GPU version:
With a VCF file as input:
./FamSeqCuda vcf -vcfFile ../TestData/test.vcf -pedFile ../TestData/fam01.ped -output test.FamSeq.vcf -v
With a likelihood-only format file as input:
./FamSeqCuda LK -lkFile ../TestData/loftest.txt -pedFile ../TestData/fam01.ped -output test.FamSeq.txt
Synopsis
FamSeq vcf -vcfFile input.vcf -pedFile input.ped -output output.vcf
FamSeq LK -lkFile lk.txt -pedFile input.ped -output output.txt
Commands and Options
First, specify the command according to the input file type. If the input file is a VCF file, the command is vcf. If it is a likelihood-only format file, the command is LK.
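The choice between the two commands can be scripted. The following Python sketch builds the argument list from the synopsis above; the helper name `build_famseq_command` is hypothetical, and treating a `.vcf` file extension as the marker for VCF input is an assumption of this sketch rather than something FamSeq itself checks.

```python
from pathlib import Path


def build_famseq_command(input_file, ped_file, output_file, executable="./FamSeq"):
    """Return the FamSeq argument list for one input file."""
    # Assumption of this sketch: a ".vcf" suffix marks a VCF input and
    # anything else is treated as a likelihood-only format file.
    if Path(input_file).suffix.lower() == ".vcf":
        command, input_flag = "vcf", "-vcfFile"
    else:
        command, input_flag = "LK", "-lkFile"
    return [executable, command, input_flag, str(input_file),
            "-pedFile", str(ped_file), "-output", str(output_file)]


vcf_cmd = build_famseq_command(
    "../TestData/test.vcf", "../TestData/fam01.ped", "test.FamSeq.vcf")
lk_cmd = build_famseq_command(
    "../TestData/loftest.txt", "../TestData/fam01.ped", "test.FamSeq.txt")
print(" ".join(vcf_cmd))
print(" ".join(lk_cmd))
```

Once FamSeq has been built, either list can be passed to `subprocess.run(cmd, check=True)` from the `src` directory.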
vcf
FamSeq vcf [-method 1] [-mRate 1e-7] [-v] [-a] [-l] [-vcfFile ] [-pedFile ] [-output] [-LRC] [-genoProbN] [-genoProbK] [-genoProbXN] [-genoProbXK] [numBurnIn] [numRep]
Options:
-pedFile: The name of the ped file that stores the pedigree information. The pedigree must be a full family, meaning that everyone in the family except the founders has two parents.
The ped file has five columns. The first column is the individual ID, which must be larger than 0. The second and third columns are the mother's ID and the father's ID. If the individual is a founder of the family, set both the mother's and the father's ID to 0. The fourth column is gender: 1 for male and 2 for female. An incorrect gender will cause errors on the X chromosome. The last column is the individual's name in the vcf/likelihood-only format file. If that file contains no information about an individual, set the individual name to NA in the ped file.
In the example pedigree, there are 6 individuals in the family, and all of them except the grandfather were sequenced. The vcf file or likelihood-only format file therefore contains data only for the five sequenced individuals.
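The ped-file rules described above can be checked before running FamSeq. The Python sketch below is one way to do so. The parser name, the whitespace-separated layout, and the six sample rows (IDs and individual names) are all hypothetical; they only mirror the description of a six-person family whose grandfather was not sequenced.

```python
def parse_ped(text):
    """Parse ped rows and check the rules described in the text.

    Assumes whitespace-separated columns in the order:
    individual ID, mother's ID, father's ID, gender, individual name.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        ind_id, mother_id, father_id, gender = (int(x) for x in fields[:4])
        if ind_id <= 0:
            raise ValueError(f"line {line_no}: individual ID must be larger than 0")
        if gender not in (1, 2):
            raise ValueError(f"line {line_no}: gender must be 1 (male) or 2 (female)")
        # Full family: a non-founder needs both parents, a founder has neither.
        if (mother_id == 0) != (father_id == 0):
            raise ValueError(f"line {line_no}: give both parents or set both to 0")
        rows.append({"id": ind_id, "mother": mother_id, "father": father_id,
                     "gender": gender, "name": fields[4]})
    known_ids = {row["id"] for row in rows}
    for row in rows:
        for parent in (row["mother"], row["father"]):
            if parent != 0 and parent not in known_ids:
                raise ValueError(f"individual {row['id']}: parent {parent} is not in the file")
    return rows


# Hypothetical six-person family; the grandfather (ID 1) was not sequenced.
sample_ped = """\
1 0 0 1 NA
2 0 0 2 grandmother
3 2 1 1 father
4 0 0 2 mother
5 4 3 1 son
6 4 3 2 daughter
"""

rows = parse_ped(sample_ped)
unsequenced = [row["id"] for row in rows if row["name"] == "NA"]
print(len(rows), "individuals; not sequenced:", unsequenced)
```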
Next, construct the corresponding ped file. Make sure each individual name in the ped file is the same as in the vcf file. The grandfather must be included in the ped file with the individual name NA, even though the vcf/likelihood-only format file contains no information about him. The file is shown below:
LK
FamSeq LK [-method 1] [-mRate 1e-7] [-lkType n] [-v] [-a] [-l] [-lkFile ] [-pedFile ] [-output] [-LRC] [-genoProbN] [-genoProbK] [-genoProbXN] [-genoProbXK]
Options:
Output
FamSeq writes the output file by adding three columns to the original input file: GPP, FPP, and FGT. GPP is the posterior probability calculated by the single-individual-based method, and FPP is the posterior probability calculated by FamSeq. These probabilities are all Phred-scaled. FGT is the genotype called by FamSeq.
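For readers unfamiliar with Phred scaling, the sketch below converts between a probability p and its Phred-scaled value Q using the standard definition Q = -10 * log10(p). Exactly which probability FamSeq scales for each genotype in GPP and FPP is not stated here, so treat the helper names and the interpretation as assumptions.

```python
import math


def prob_to_phred(p):
    """Phred-scale a probability: Q = -10 * log10(p)."""
    if p <= 0.0:
        return float("inf")  # a probability of zero has no finite Phred value
    return -10.0 * math.log10(p)


def phred_to_prob(q):
    """Invert the Phred scale: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for q in (10, 20, 30):
    print(q, phred_to_prob(q))
```

On this scale, every increase of 10 corresponds to a tenfold decrease in the underlying probability.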