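Transporter
Model organism
Organism
Computational biology
Biology
Transmembrane protein
Biochemistry
Artificial intelligence
Computer science
Genetics
Gene
Receptor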
Authors
Andreas Denger,Volkhard Helms
Identifier
DOI:10.1021/acs.jcim.2c00850
Abstract
α-Helical transmembrane proteins termed membrane transporters mediate the passage of small hydrophilic substrate molecules across biological lipid bilayer membranes. Annotating the specific substrates of the dozens to hundreds of individual transporters of an organism is an important task. In the past, machine learning classifiers have been successfully trained on pan-organism data sets to predict putative substrates of transporters. Here, we critically examine the selection of an optimal data set of protein sequence features for the classification task. We focus on membrane transporters of the three model organisms Escherichia coli, Arabidopsis thaliana, and Saccharomyces cerevisiae, as well as human. We show that organism-specific classifiers can be robustly trained if at least 20 samples are available for each substrate class. If information from position-specific scoring matrices is included, such classifiers have F1 scores between 0.85 and 1.00. For the largest data set (A. thaliana), a 4-class classifier yielded an F-score of 0.97. On a pan-organism data set composed of transporters of all four organisms, amino acid and sugar transporters were predicted with an F1 score of 0.91.
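The abstract reports classifier quality as F1 scores and requires at least 20 samples per substrate class. The sketch below illustrates both ideas under stated assumptions. It keeps only the classes that meet a minimum sample count, computes the per-class F1 = 2·TP / (2·TP + FP + FN), and averages those values (macro F1). The abstract does not say which averaging the authors used. The function names, the hard-filter interpretation of the 20-sample rule, the zero-division convention, and the toy labels are all hypothetical. None of them are taken from the paper's code or data.

```python
from collections import Counter


def eligible_classes(labels, min_samples=20):
    """Return substrate classes that have at least `min_samples` examples.

    Hypothetical helper: treating the abstract's "at least 20 samples per
    substrate class" as a hard pre-filter is an illustrative assumption.
    """
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n >= min_samples)


def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 for every class: 2*TP / (2*TP + FP + FN).

    A class with no TP, FP or FN gets 0.0 (an assumed convention).
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels invented for illustration only; not results from the paper.
    y_true = ["amino acid"] * 4 + ["sugar"] * 4
    y_pred = ["amino acid", "amino acid", "amino acid", "sugar",
              "sugar", "sugar", "sugar", "amino acid"]
    print(f1_per_class(y_true, y_pred))  # {'amino acid': 0.75, 'sugar': 0.75}
    print(macro_f1(y_true, y_pred))      # 0.75
```

In this toy example each class has 3 true positives, 1 false positive and 1 false negative, so each per-class F1 is 6/8 = 0.75. Macro averaging weights every substrate class equally. This is one reasonable choice when class sizes differ, but the paper may have used a different averaging scheme.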