CliP
Overview | |
Description | Clonal structure identification through penalizing pairwise differences |
Development Information | |
GitHub | wwylab/CliP |
Language | python (>3.5.1) |
Current version | 0.1 |
License | Click for license |
Status | Active |
Last updated | 2018/05/01 |
References | |
Citation | Kaixian Yu, Seng Jung Shin, Hongtu Zhu, Wenyi Wang, and on behalf of the PCAWG Evolution and Heterogeneity Working Group and the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network. CliP: fast subclonal architecture reconstruction from whole-genome sequencing data. (2018) |
Help and Support | |
Contact | Wenyi Wang |
CliP is a subclonal identification tool designed for next-generation sequencing of bulk tumor samples. It is one of the 11 participating methods in the Pan-Cancer Analysis of Whole Genomes (PCAWG) Evolution and Heterogeneity working group of the International Cancer Genome Consortium (ICGC). The method is described in the manuscript (see Citation).
CliP requires R (>3.3.1) and Python (>3.5.1); the scripts do not support Python 2.
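A quick way to confirm the environment before running anything is sketched below. It is not part of CliP; the helper names `python_is_supported` and `rscript_path` are made up for illustration, and the sketch only checks that `Rscript` is on the PATH, not which R version it is.

```python
import shutil
import sys

# Minimum Python version quoted in the requirement above (strictly newer is needed).
MIN_PYTHON = (3, 5, 1)


def python_is_supported(version_info=sys.version_info):
    """Return True when the given interpreter version is newer than 3.5.1."""
    return tuple(version_info[:3]) > MIN_PYTHON


def rscript_path():
    """Return the path to Rscript if it is on PATH, else None.

    Assumption: R is invoked through Rscript. The R version is not checked here.
    """
    return shutil.which("Rscript")


print("Python supported:", python_is_supported())
print("Rscript location:", rscript_path())
```

If `Rscript location` prints `None`, R is either not installed or not on the PATH.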
There is no need to install CliP.
Input data format: CliP needs 3 input files (please see Sample data for a more visual example of the input data):
SNV file: a tab-separated file containing 4 columns: the first column denotes the chromosome, the second is the position of the SNV, the third records the alt read count, and the last records the ref read count.
CNV file: a tab-separated file containing 4 or 5 columns: the first column denotes the chromosome, the second is the start position of the CNV segment, and the third is the end position of the CNV segment. The remaining column(s) give the actual copy number of each segment; CliP accepts either total-only or allele-specific copy number.
Purity file: a file containing the purity, either an estimate from CNA or an initial guess that CliP will attempt to correct. For details on the purity issue, please see our manuscript.
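The three formats above can be sanity-checked before running CliP with a small reader like the sketch below. This is not CliP's own parser. It assumes the files have no header row, that read counts and positions are integers, that the copy-number columns are numeric, and that the purity file holds a single number between 0 and 1. None of these details is stated above, so adjust them to match the sample data. The function names are hypothetical.

```python
import csv
import io


def read_tsv(handle, allowed_widths):
    """Read tab-separated rows and check each row's column count."""
    rows = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) not in allowed_widths:
            raise ValueError(
                "line {}: expected {} columns, got {}".format(
                    line_no, sorted(allowed_widths), len(row)))
        rows.append(row)
    return rows


def read_snv(handle):
    """SNV rows: chromosome, position, alt read count, ref read count."""
    return [(chrom, int(pos), int(alt), int(ref))
            for chrom, pos, alt, ref in read_tsv(handle, {4})]


def read_cnv(handle):
    """CNV rows: chromosome, start, end, then one or two copy-number columns.

    Assumption: 4 columns means total-only and 5 means allele-specific.
    """
    return [(row[0], int(row[1]), int(row[2]), tuple(float(x) for x in row[3:]))
            for row in read_tsv(handle, {4, 5})]


def read_purity(handle):
    """Purity file: assumed to contain one number in the interval (0, 1]."""
    purity = float(handle.read().strip())
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1], got {}".format(purity))
    return purity


# Illustrative, made-up values; real files would be opened with open(path).
snvs = read_snv(io.StringIO("1\t12345\t12\t30\n"))
cnvs = read_cnv(io.StringIO("1\t10000\t20000\t2\t1\n"))
purity = read_purity(io.StringIO("0.8\n"))
```

A malformed row raises a `ValueError` that names the offending line, which is usually easier to act on than a failure deep inside the analysis.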
CSR was originally created for the Pan-Cancer Analysis of Whole Genomes (PCAWG) Evolution and Heterogeneity working group of the International Cancer Genome Consortium (ICGC) during the Heterogeneity project. It was used to build a consensus subclonal architecture from the results of the 11 participating methods. Please see (Heterogeneity citation) for details.
Part of the calculation depends on SPAMS (http://spams-devel.gforge.inria.fr/downloads.html). Please follow the instructions on their website to install the Python 3 version of SPAMS (the Anaconda distribution is preferred).
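To confirm that SPAMS is visible to the Python 3 interpreter you plan to use, a check along these lines may help. It assumes the package is importable under the module name `spams`; the helper `module_available` is made up for this example.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# Assumption: the SPAMS Python bindings install a module named "spams".
if module_available("spams"):
    print("spams is available")
else:
    print("spams not found; follow the SPAMS installation instructions")
```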
The preprocessing script needs the R package “dummies”. To install it, type the following in R:
install.packages('dummies')
There is no need to install CSR; it runs like a regular Python script.