DNA甲基化
计算机科学
最大化
人工智能
机器学习
DNA
甲基化
计算生物学
基因
生物
遗传学
基因表达
数学
数学优化
作者
Yingying Yu,Wenjia He,Junru Jin,Guobao Xiao,Lizhen Cui,Rao Zeng,Leyi Wei
出处
期刊:Bioinformatics
[Oxford University Press]
日期:2021-09-30
卷期号:37 (24): 4603-4610
被引量:42
标识
DOI:10.1093/bioinformatics/btab677
摘要
Abstract Motivation DNA methylation plays an important role in epigenetic modification, the occurrence, and the development of diseases. Therefore, identification of DNA methylation sites is critical for better understanding and revealing their functional mechanisms. To date, several machine learning and deep learning methods have been developed for the prediction of different DNA methylation types. However, they still highly rely on manual features, which can largely limit the high-latent information extraction. Moreover, most of them are designed for one specific DNA methylation type, and therefore cannot predict multiple methylation sites in multiple species simultaneously. In this study, we propose iDNA-ABT, an advanced deep learning model that utilizes adaptive embedding based on Bidirectional Encoder Representations from Transformers (BERT) together with transductive information maximization (TIM). Results Benchmark results show that our proposed iDNA-ABT can automatically and adaptively learn the distinguishing features of biological sequences from multiple species, and thus perform significantly better than the state-of-the-art methods in predicting three different DNA methylation types. In addition, TIM loss is proven to be effective in dichotomous tasks via the comparison experiment. Furthermore, we verify that our features have strong adaptability and robustness to different species through comparison of adaptive embedding and six handcrafted feature encodings. Importantly, our model shows great generalization ability in different species, demonstrating that our model can adaptively capture the cross-species differences and improve the predictive performance. For the convenient use of our method, we further established an online webserver as the implementation of the proposed iDNA-ABT. Availability and implementation Our proposed iDNA-ABT and data are freely accessible via http://server.wei-group.net/iDNA_ABT and our source codes are available for downloading in the GitHub repository (https://github.com/YUYING07/iDNA_ABT). Supplementary information Supplementary data are available at Bioinformatics online.
科研通智能强力驱动
Strongly Powered by AbleSci AI