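Copy-number variation
Computer science
Isolation (microbiology)
Anomaly detection
Variation (astronomy)
Data mining
Genome
Biology
Bioinformatics
Genetics
Gene
Astrophysics
Physics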
Authors
Xiguo Yuan, Jiaao Yu, Jianing Xi, Liying Yang, Junliang Shang, Zhe Li, Junbo Duan
Identifier
DOI:10.1109/tcbb.2019.2920889
Abstract
Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging because reads are unevenly distributed and the amplitudes of copy gains and losses are unbalanced. Using read depths directly to measure CNVs tends to limit detection performance, so robust computational approaches equipped with appropriate statistics are needed to identify CNV regions and their boundaries. This study proposes a new method, CNV_IFTV, to address this need. CNV_IFTV assigns an anomaly score to each genome bin using a collection of isolation trees, which are trained with the isolation forest algorithm on subsamples of the measured read depths. CNV_IFTV then applies a total variation model to smooth the scores of adjacent bins, producing a denoised score profile. Finally, a statistical model is established to test the denoised scores and call CNVs. CNV_IFTV was tested on both simulated and real data and compared with several peer methods; the results indicate that the proposed method outperforms them. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data, even at low sequencing coverage and low tumor purity. Its detection results on tumor samples can help evaluate known cancer genes and predict target drugs for disease diagnosis.
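The three-stage pipeline described in the abstract (isolation-forest scoring of bins, total variation smoothing of the score profile, then a statistical test) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' CNV_IFTV implementation. The function names `isolation_scores`, `tv_denoise` and `call_cnvs`, the parameter values (`psi`, `lam`, `z_thresh`), the projected-gradient solver for the TV model, and the robust z-score test with a median-depth rule for gain versus loss are all hypothetical stand-ins, because the abstract does not specify the actual statistical model.

```python
import numpy as np


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def build_tree(sample, depth, max_depth, rng):
    """Grow one isolation tree on a 1-D subsample of read depths using random splits."""
    if depth >= max_depth or len(sample) <= 1:
        return ("leaf", len(sample))
    lo, hi = sample.min(), sample.max()
    if lo == hi:
        return ("leaf", len(sample))
    split = rng.uniform(lo, hi)
    return ("node", split,
            build_tree(sample[sample < split], depth + 1, max_depth, rng),
            build_tree(sample[sample >= split], depth + 1, max_depth, rng))


def path_lengths(x, node, depth=0):
    """Route every query value down the tree; return depth plus c(leaf size)."""
    if node[0] == "leaf":
        return np.full(x.shape, depth + c_factor(node[1]))
    _, split, left, right = node
    out = np.empty(x.shape)
    go_left = x < split
    out[go_left] = path_lengths(x[go_left], left, depth + 1)
    out[~go_left] = path_lengths(x[~go_left], right, depth + 1)
    return out


def isolation_scores(depths, n_trees=100, psi=64, seed=0):
    """Anomaly score s = 2^(-E[h]/c(psi)) per bin; values near 1 are easy to isolate."""
    rng = np.random.default_rng(seed)
    psi = min(psi, len(depths))
    max_depth = int(np.ceil(np.log2(psi)))
    total = np.zeros(len(depths))
    for _ in range(n_trees):
        # Each tree is trained on a random subsample of the measured read depths.
        sample = rng.choice(depths, size=psi, replace=False)
        total += path_lengths(depths, build_tree(sample, 0, max_depth, rng))
    return 2.0 ** (-(total / n_trees) / c_factor(psi))


def tv_denoise(y, lam, n_iter=3000, tau=0.25):
    """Approximately minimise 0.5*||y - x||^2 + lam * sum|x[i+1] - x[i]|
    by projected gradient on the dual variable z (|z| <= lam)."""
    z = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        x = y + np.diff(np.concatenate(([0.0], z, [0.0])))  # x = y - D^T z
        z = np.clip(z + tau * np.diff(x), -lam, lam)
    return y + np.diff(np.concatenate(([0.0], z, [0.0])))


def call_cnvs(depths, lam=0.1, z_thresh=2.0, seed=0):
    """Label each bin 'gain', 'loss' or 'neutral' (hypothetical robust z-score test)."""
    raw = isolation_scores(depths, seed=seed)
    smooth = tv_denoise(raw, lam)
    resid = raw - smooth
    # Noise scale estimated from the residuals via the median absolute deviation.
    sigma = max(1.4826 * np.median(np.abs(resid - np.median(resid))), 1e-12)
    z = (smooth - np.median(smooth)) / sigma
    # Anomaly scores carry no direction, so the sign is taken from the raw depth.
    direction = np.where(depths > np.median(depths), "gain", "loss")
    return np.where(z > z_thresh, direction, "neutral")
```

The smoothing step is what makes the scores usable for segment calling in this sketch: single bins that sit in the tails of the normal depth distribution also receive high isolation scores, but the total variation penalty flattens such isolated spikes while contiguous runs of anomalous bins survive as plateaus with sharp boundaries.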