Transformer-based models for chemical SMILES representation: A comprehensive literature review

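Cheminformatics, Computer science, Artificial intelligence, Transformer, Natural language processing, Machine learning, Data science, Informatics, Cognitive science, Chemistry, Engineering, Psychology, Electrical engineering, Computational chemistry, Voltage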
Authors
Medard Edmund Mswahili,Young-Seob Jeong
Source
Journal: Heliyon [Elsevier]
Volume (issue): 10 (20): e39038-e39038 Cited by: 1
Identifier
DOI:10.1016/j.heliyon.2024.e39038
Abstract

Pre-trained chemical language models (CLMs) have attracted increasing attention in cheminformatics and bioinformatics, inspired by the remarkable success of pre-trained language models in natural language processing (NLP) tasks such as speech recognition, text analysis, translation, and other language-related objectives. Furthermore, the vast amount of unlabeled data associated with chemical compounds or molecules has emerged as a crucial research focus, prompting the need for CLMs with reasoning capabilities over such data. Molecular graphs and molecular descriptors are the predominant approaches to representing molecules for property prediction in machine learning (ML). However, Transformer-based LMs have recently emerged as powerful de facto tools in deep learning (DL), showcasing outstanding performance across various NLP downstream tasks, particularly in text analysis. Among pre-trained Transformer-based LMs, BERT (and its variants) and GPT (and its variants) have been extensively explored in the chemical informatics domain. Learning tasks in cheminformatics, such as text analysis, that require handling chemical SMILES data, which encodes intricate relations among elements or atoms, have become increasingly prevalent. Whether the objective is predicting molecular reactions or molecular properties, there is a growing demand for LMs capable of learning molecular contextual information from SMILES sequences given as text input. This review provides an overview of the current state of the art of Transformer-based chemical LMs in chemical informatics for de novo design, and analyses current limitations, challenges, and advantages. Finally, it offers a perspective on future opportunities in this evolving field.