
VirusBERTHP: Improved Virus Host Prediction Via Attention-based Pre-trained Model Using Viral Genomic Sequences

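Infectivity, Computer science, Host (biology), Artificial intelligence, Virus, Genome, Virus classification, Artificial neural network, Computational biology, Biology, Machine learning, Gene, Virology, Genetics
Authors
Yunzhan Wang, Yang Jin, Yunpeng Cai
Identifier
DOI:10.1109/bibm58861.2023.10385501
Abstract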

Viruses have become the most prominent cause of infectious diseases, which greatly threaten human health. Determining whether a viral genome can infect a human host would be of great value to epidemic prevention. However, due to the highly diversified and unstructured nature of virus genomes, current bioinformatic and machine learning methods for predicting virus host infectivity are rather limited in performance. In this paper we propose an accurate virus human-host infectivity prediction tool, VirusBERTHP, using an attention-based pretraining mechanism following the well-known BERT architecture, which is capable of predicting the human infectivity of a novel virus species whose genome is not in the training database. We develop a BERT-based representation learning scheme, VirusBERT, to efficiently extract complex features from diverse virus sequences, which show great separability in the feature space. We created a large curated database containing 2,948,656 unlabelled virus sequences to efficiently pre-train the VirusBERT model. Then, the VirusBERTHP model is trained with a relatively smaller set of labelled sequences corresponding to specific tasks, using a fully connected deep neural network. We applied the model to four published virus-host classification datasets and showed that our model outperforms previous state-of-the-art methods in prediction performance. On three datasets with an open-view setting, where no restriction is imposed on the taxonomy of the input virus sequences, our model achieved more than 99% accuracy in predicting human host infectivity, demonstrating the effectiveness of our method. In addition to the accuracy boost, our model is adaptable to various virus sequence prediction tasks by separating the pretraining and supervised learning phases. Moreover, the model accommodates a wide range of sequence lengths, from 250 bp to 10 kbp, expanding the application field of the model.
Source code and data for our paper are available at https://github.com/wyzwyzwyz/virusBert/.
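
The abstract describes masked pre-training on unlabelled sequences but does not say how nucleotide sequences are turned into tokens. As a rough illustration only, the sketch below assumes overlapping k-mer tokenization (a common choice in genomic language models, not confirmed for VirusBERT) and a simplified BERT-style masking step. The function names `kmer_tokens` and `mask_tokens`, the value `k=6`, and the 15% mask rate are all hypothetical and are not taken from the paper or its repository.

```python
import random


def kmer_tokens(seq, k=6):
    """Split a nucleotide sequence into overlapping k-mers (stride 1).

    Assumption: k-mer tokenization; the abstract does not specify the tokenizer.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with a mask token.

    Returns the masked token list and a dict {position: original token},
    which is what a masked-language-model objective would try to recover.
    Simplification: every selected token becomes [MASK]; the original BERT
    recipe also keeps or randomizes a fraction of the selected tokens.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate)) if tokens else 0
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets


# Toy 28-nucleotide fragment, made up for illustration (not real viral data).
fragment = "ATGCGTACGTTAGCCGATAGCTAGGCTA"
tokens = kmer_tokens(fragment, k=6)
masked, targets = mask_tokens(tokens)
```

In the two-phase setup the abstract outlines, a step like this would belong to the unsupervised pre-training phase only; the supervised phase would feed the learned sequence representation to a classifier trained on labelled sequences. Because overlapping k-mers share nucleotides with their neighbours, real genomic models often mask contiguous spans rather than isolated tokens.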
