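Keywords
Heuristics
Integer programming
Computer science
Construct (Python library)
Integer (computer science)
Linear programming
Sequence (biology)
Haplotype
Algorithm
Mathematical optimization
Theoretical computer science
Artificial intelligence
Mathematics
Programming language
Biochemistry
Chemistry
Biology
Genotype
Gene
Genetics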
Authors
Zhi‐Zhong Chen, Fēi Dèng, Chao Shen, Yiji Wang, Lusheng Wang
Identifier
DOI: 10.1089/cmb.2015.0035
Abstract
Haplotype assembly is the problem of constructing the haplotypes of an individual directly from sequence fragments (reads) of that individual. Although a number of programs have been designed to compute optimal or heuristic solutions to the haplotype assembly problem, computing an optimal solution may take days or even months, while computing a heuristic solution usually requires a trade-off between speed and accuracy. This article refines a previously known integer linear programming-based (ILP-based) approach to the haplotype assembly problem in two ways. First, the read-matrices of some datasets (such as NA12878) come with a quality score for each base in the reads, and we propose to use these qualities in the ILP-based approach. Second, we propose to use the ILP-based approach to improve the output of any heuristic program for the problem. Experiments with both real and simulated datasets show that the base qualities of the read-matrices help us find more accurate solutions without significant loss of speed. Moreover, our experimental results show that the proposed hybrid approach significantly improves the output of ReFHap (the current leading heuristic), for example by almost 25% in QAN50 score, without significant loss of speed, and can even find optimal solutions in much less time than the original ILP-based approach. Our program is available from the authors upon request.
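The abstract does not spell out the objective that the ILP optimizes, so the following is only a minimal sketch of how per-base qualities could enter a haplotype assembly score, under assumptions that are not taken from the paper. It assumes a minimum error correction (MEC) style objective, assumes every column is a heterozygous SNP so the second haplotype is the complement of the first, and assumes each corrected base is weighted by 1 - 10^(-Q/10) for Phred quality Q, so flipping a confident base costs more than flipping a doubtful one. The names `mismatch_weight`, `weighted_mec`, and `brute_force_optimum` are hypothetical, and exhaustive enumeration stands in for the ILP solver, so it only handles toy instances.

```python
from itertools import product

# A read maps SNP column -> (allele, phred_quality), with allele in {0, 1}.
# This encoding is an illustrative assumption, not the paper's data format.


def mismatch_weight(phred: int) -> float:
    """Hypothetical cost of correcting one base: probability it was read correctly."""
    return 1.0 - 10 ** (-phred / 10)


def read_cost(read: dict[int, tuple[int, int]], hap: list[int]) -> float:
    """Quality-weighted number of bases in `read` that disagree with `hap`."""
    return sum(mismatch_weight(q) for col, (allele, q) in read.items() if allele != hap[col])


def weighted_mec(reads: list[dict[int, tuple[int, int]]], hap: list[int]) -> float:
    """Assign each read to `hap` or its complement, whichever needs cheaper corrections."""
    comp = [1 - a for a in hap]  # assumes all columns are heterozygous
    return sum(min(read_cost(r, hap), read_cost(r, comp)) for r in reads)


def brute_force_optimum(reads, n_snps: int) -> tuple[float, list[int]]:
    """Exhaustive search over 2^(n_snps - 1) haplotypes; a stand-in for an ILP solver."""
    best = None
    for tail in product((0, 1), repeat=n_snps - 1):
        hap = [0, *tail]  # fix the first SNP to break the hap/complement symmetry
        cost = weighted_mec(reads, hap)
        if best is None or cost < best[0]:
            best = (cost, hap)
    return best


if __name__ == "__main__":
    reads = [
        {0: (0, 30), 1: (0, 30), 2: (1, 30)},
        {0: (1, 30), 1: (1, 30), 2: (0, 30)},
        {1: (0, 30), 2: (1, 30), 3: (1, 30)},
        {1: (1, 10), 2: (1, 10)},  # low-quality read that conflicts with the others
    ]
    print(brute_force_optimum(reads, 4))  # → (0.9, [0, 0, 1, 1])
```

In this toy example the conflicting read has only Q10 bases, so correcting one of them (cost 0.9) is cheaper than correcting any Q30 base (cost 0.999), which is the intuition behind letting qualities steer the solution. The same scoring function could also be applied to a phasing produced by a heuristic program to see how far it is from the optimum on small inputs; the paper itself improves heuristic output with its ILP-based approach, not with enumeration.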