过度拟合
特征(语言学)
人工智能
计算机科学
水准点(测量)
杠杆(统计)
生物信息学
机器学习
人工神经网络
化学
大地测量学
语言学
生物化学
基因
哲学
地理
作者
Peiran Jiang,Wanshan Ning,Yunshu Shi,Chuan Liu,Saijun Mo,Haoran Zhou,Kangdong Liu,Yaping Guo
标识
DOI:10.1016/j.csbj.2021.08.013
摘要
As a novel lactate-derived post-translational modification (PTM), lysine lactylation (Kla) is involved in diverse biological processes, and participates in human tumorigenesis. Identification of Kla substrates with their exact sites is crucial for revealing the molecular mechanisms of lactylation. In contrast with labor-intensive and time-consuming experimental approaches, computational prediction of Kla could provide convenience and increased speed, but is still lacking. In this work, although current identified Kla sites are limited, we constructed the first Kla benchmark dataset and developed a few-shot learning-based architecture approach to leverage the power of small datasets and reduce the impact of imbalance and overfitting. A maximum 11.7% (0.745 versus 0.667) increase of area under the curve (AUC) value was achieved in contrast to conventional machine learning methods. We conducted a comprehensive survey of the performance by combining 8 sequence-based features and 3 structure-based features and tailored a multi-feature hybrid system for synergistic combination. This system achieved >16.2% improvement of the AUC value (0.889 versus 0.765) compared with single feature-based models for the prediction of Kla sites in silico. Taken few-shot learning and hybrid system together, we present our newly designed predictor named FSL-Kla, which is not only a cutting-edge tool for Kla site profile but also could generate candidates for further experimental approaches. The webserver of FSL-Kla is freely accessible for academic research at http://kla.zbiolab.cn/.
科研通智能强力驱动
Strongly Powered by AbleSci AI