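Fibril
Sequence (biology)
Random forest
Amyloid fibril
Feature (linguistics)
Aggregate (composite)
Chemistry
Biophysics
Computational biology
Biological system
Materials science
Computer science
Artificial intelligence
Biochemistry
Amyloid beta
Nanotechnology
Pathology
Medicine
Biology
Linguistics
Philosophy
Disease
Authors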
Shaofeng Liao,Yujun Zhang,Xinchen Han,Tinglan Wang,Xi Wang,Qinglin Yan,Qian Li,Yifei Qi,Zhuqing Zhang
Abstract
Liquid–liquid phase separation (LLPS) of proteins and the formation of solid protein aggregates (also referred to as amyloid aggregates) have gained significant attention in recent years because of their associations with various physiological and pathological processes in living organisms. However, the differences and connections between proteins that undergo LLPS and those that form amyloid fibrils have not yet been systematically investigated at the sequence level. In this study, we address this gap by comparing the two types of proteins across 36 features using currently available data. The statistical comparison indicates that 24 of the 36 selected features differ significantly between the two protein groups. An LLPS-Fibrils binary classification model built on these 24 features using random forest identifies the fraction of intrinsically disordered residues (F_IDR) as the most important feature. In contrast, in a three-class LLPS-Fibrils-Background classification model built on the same screened features, the compositions of cysteine and leucine contribute more than the other features. Through feature ablation analysis, we finally constructed a model, FLFB (Feature-based LLPS-Fibrils-Background protein predictor), using six refined features, which achieves an average area under the receiver operating characteristic curve of 0.83. This work indicates that proteins undergoing LLPS or forming amyloid fibrils can be identified using sequence features and a machine learning model.
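The abstract names the composition of cysteine and of leucine among the informative sequence features. As a rough illustration only, the sketch below computes per-residue composition as the fraction of each standard amino acid in a sequence. The function name `residue_composition` and the toy sequence are hypothetical, and the paper's exact feature definitions may differ.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: composition is count / sequence length. Non-standard
    characters still count toward the length but are not reported.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    length = len(seq)
    return {aa: counts.get(aa, 0) / length for aa in STANDARD_AMINO_ACIDS}


# Hypothetical toy sequence, not taken from the paper's dataset.
toy_sequence = "MCLLKCAGLL"
composition = residue_composition(toy_sequence)

# Cysteine and leucine fractions for the toy sequence.
print(composition["C"], composition["L"])
```

Values like these could serve as columns in a feature table passed to a classifier such as a random forest; features such as F_IDR would additionally need a disorder predictor, which is not sketched here.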