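Copy number variation
Computer science
Outlier
Identification (biology)
Kernel (algebra)
Kernel density estimation
Anomaly detection
Data mining
Artificial intelligence
Biology
Genome
Genetics
Statistics
Mathematics
Gene
Combinatorics
Botany
Estimator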
Authors
A. K. Alvi Haque, Kun Xie, Kang Liu, Haiyong Zhao, Xiaohui Yang, Xiguo Yuan
Identifier
DOI:10.1016/j.dsp.2022.103524
Abstract
Copy number variation (CNV) is a prevalent type of genetic structural variation and underlies numerous hereditary diseases. Thorough identification and classification of CNVs are fundamental to obtaining a complete view of the human genome and to discovering disease genes. Next-generation sequencing (NGS) has provided an abundance of data, which has accelerated the design of algorithms that identify CNVs at base-pair resolution. Nonetheless, detection is often affected by several factors, including sequencing artifacts, GC bias, and correlations among neighboring positions within CNVs. Although a number of peer methods address some of these factors through explicit modeling, precise identification of low-amplitude CNVs remains difficult. In this paper, we propose an alternative computational method, CNV-KOF, to accurately detect CNVs across the full range of amplitudes from NGS data. The approach adopts an adaptive kernel density estimation (KDE)-based strategy and assigns a KDE-based outlier factor (KOF) to each genomic segment. Using the resulting outlier factor profile, CNV-KOF applies a box plot strategy to detect CNVs without relying on distribution assumptions. We tested CNV-KOF on simulated and real datasets and compared it with several peer methods. Experiments on simulated and real sequencing data demonstrate that the proposed method outperforms the peer methods with respect to F1-score, sensitivity, and precision. Thus, CNV-KOF is expected to become a complementary tool for detecting CNVs, even at low coverage and low tumor purity.
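The two steps named in the abstract (a KDE-based outlier factor per segment, then a box plot rule on the factor profile) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of k, the use of the k-th nearest-neighbor distance as the adaptive bandwidth, and the use of one read-depth value per segment as input are all assumptions made for illustration.

```python
import numpy as np

def kde_outlier_factor(values, k=5):
    """Hypothetical KOF-like score: mean density of a point's k nearest
    neighbours divided by its own density. Values near 1 are typical;
    large values mark points in sparse regions."""
    x = np.asarray(values, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])       # pairwise distances
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]       # indices of the k nearest neighbours
    nbr_dist = np.take_along_axis(dist, nbrs, axis=1)
    # Adaptive bandwidth (assumption): distance to the k-th nearest neighbour.
    h = np.maximum(nbr_dist[:, -1], 1e-9)
    # Gaussian KDE evaluated at each point over its own neighbours.
    kernel = np.exp(-0.5 * (nbr_dist / h[:, None]) ** 2)
    density = kernel.mean(axis=1) / (h * np.sqrt(2.0 * np.pi))
    return density[nbrs].mean(axis=1) / density

def boxplot_outliers(scores, whisker=1.5):
    """Indices whose score exceeds the upper box plot fence Q3 + whisker * IQR."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    return np.flatnonzero(scores > q3 + whisker * (q3 - q1))

# Toy read-depth profile (made-up data): 60 segments around depth 100,
# with one simulated gain and one simulated loss.
depth = 100.0 + 2.0 * np.sin(np.arange(60))
depth[20] = 160.0   # simulated gain
depth[40] = 45.0    # simulated loss

kof = kde_outlier_factor(depth, k=5)
flagged = boxplot_outliers(kof)
print(flagged)
```

Because the box plot fence is computed from the quartiles of the scores themselves, no parametric distribution is assumed for the outlier factor, which mirrors the distribution-free claim in the abstract. A real pipeline would also need read-depth normalization, GC-bias correction, and segmentation before this step, none of which are shown here.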