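Bottleneck
Computer science
Feature (linguistics)
Generative grammar
Artificial intelligence
Protein function prediction
Task (project management)
Machine learning
Function (biology)
Data mining
Sample (material)
Pattern recognition (psychology)
Quality (philosophy)
Adversarial system
Protein function
Engineering
Gene
Biology
Embedded system
Philosophy
Chemistry
Epistemology
Systems engineering
Evolutionary biology
Biochemistry
Chromatography
Linguistics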
Authors
Cen Wan, David T. Jones
Identifier
DOI:10.1038/s42256-020-0222-1
Abstract
Protein function prediction is a challenging but important task in bioinformatics. Many prediction methods have been developed, but they are still limited by a bottleneck in the quantity of training samples. It is therefore valuable to develop a data augmentation method that can generate high-quality synthetic samples to further improve the accuracy of prediction methods. In this work, we propose a novel generative adversarial network-based method, FFPred-GAN, to accurately learn the high-dimensional distributions of protein sequence-based biophysical features and to generate high-quality synthetic protein feature samples. The experimental results suggest that augmenting the original training protein feature samples with the synthetic samples improves prediction accuracy for all three domains of Gene Ontology.

Training machine learning models to predict the function of proteins is limited by the availability of only a small amount of labelled training data. Training can be improved by employing generative adversarial networks to generate additional synthetic protein samples.
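
The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the FFPred-GAN implementation: the function names (`fit_placeholder_generator`, `augment`), the toy three-dimensional feature vectors, and the labels are all hypothetical, and a simple per-feature Gaussian sampler stands in for a trained GAN generator. The sketch only shows how synthetic feature samples for one class might be appended to the original training set before a classifier is fitted.

```python
import random
import statistics


def fit_placeholder_generator(samples, seed=0):
    """Return a sampler for synthetic feature vectors.

    ASSUMPTION: this draws each feature from an independent Gaussian fitted
    to `samples`. It is only a stand-in for a trained GAN generator.
    """
    rng = random.Random(seed)
    columns = list(zip(*samples))  # one tuple of values per feature
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]

    def generate(n):
        # Produce n synthetic vectors with the same dimensionality as the input.
        return [[rng.gauss(m, s) for m, s in zip(means, stdevs)] for _ in range(n)]

    return generate


def augment(real_features, real_labels, generate, n_synthetic, label):
    """Append n_synthetic generated samples, all carrying `label`, to the real set."""
    synthetic = generate(n_synthetic)
    features = list(real_features) + synthetic
    labels = list(real_labels) + [label] * n_synthetic
    return features, labels


# Toy data (hypothetical): three positive and two negative feature vectors.
real_pos = [[0.10, 0.80, 0.30], [0.20, 0.70, 0.40], [0.15, 0.90, 0.35]]
real_neg = [[0.90, 0.10, 0.60], [0.80, 0.20, 0.70]]

generate_pos = fit_placeholder_generator(real_pos)
X, y = augment(real_pos + real_neg, [1, 1, 1, 0, 0], generate_pos,
               n_synthetic=4, label=1)
```

Here `X` and `y` hold the original five samples followed by four synthetic positives. In practice the placeholder sampler would be replaced by the generator of a trained GAN, and the augmented set would be passed to whatever downstream classifier is used.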