Machine Learning Methods for Small Data Challenges in Molecular Science

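Artificial intelligence, Machine learning, Computer science, Deep learning, Big data, Artificial neural network, Small data, Autoencoder, Support vector machine, Deep belief network, Data mining
Authors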
Bozheng Dou, Zailiang Zhu, Ekaterina Merkurjev, Ke Lü, Long Chen, Jiang Jian, Yueying Zhu, Jie Liu, Bengong Zhang, Guo‐Wei Wei
Source
Journal: Chemical Reviews [American Chemical Society]
Volume (issue): 123 (13): 8736-8780 Citations: 106
Identifier
DOI:10.1021/acs.chemrev.3c00189
Abstract

Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.