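Interpretability
Artificial intelligence
Artificial neural network
Limiting
Pipeline (software)
Computer science
Machine learning
Unsupervised learning
Computational biology
Biology
Engineering
Mechanical engineering
Programming language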
Authors
Saloni Mahatma, Lisa Van den Broeck, Nicholas Morffy, Max V. Staller, Lucia C. Strader, Rosangela Sozzani
Identifier
DOI:10.1109/ciss56502.2023.10089768
Abstract
Gene expression is induced by transcription factors (TFs) through their activation domains (ADs). However, ADs are unconserved, intrinsically disordered sequences without a defined secondary structure, which makes these regions difficult to recognize and predict and limits our ability to identify TFs. Here, we address this challenge with a neural network approach that systematically predicts ADs. Rather than relying on the raw sequence, the network takes as input computed properties of amino acid (AA) side chains and secondary structure. To shed light on the features learned by the network and greatly increase interpretability, we identified the input properties most important for an accurate prediction. Our findings further highlight the importance of aromatic and negatively charged AAs and reveal the importance of previously unrecognized AA properties. Using these most important features, we applied an unsupervised learning approach to classify the ADs into 10 subclasses, which can be further explored for AA specificity and AD function. Overall, our pipeline, which combines supervised and unsupervised machine learning, sheds light on the non-linear properties of ADs.
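The abstract does not specify the exact property set, the network, or the clustering algorithm. As a rough illustration only, the sketch below turns each sequence into a small vector of computed properties (fraction aromatic, fraction acidic, net charge per residue, mean Kyte-Doolittle hydropathy) instead of the raw residues, then groups the vectors with plain k-means. The feature choice, the hypothetical function names `featurize` and `kmeans`, the toy sequences, and the use of k-means are all assumptions, not the authors' pipeline; the neural network and feature-importance steps are not shown. Setting `k=10` on real data would mirror the 10 subclasses, while the toy example uses `k=2`.

```python
from typing import List, Sequence, Tuple

# Illustrative property definitions (assumption: NOT the paper's actual feature set).
AROMATIC = set("FWY")
ACIDIC = set("DE")   # negatively charged at neutral pH
BASIC = set("KR")    # positively charged at neutral pH (His ignored for simplicity)

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def featurize(seq: str) -> List[float]:
    """Map a protein sequence to computed properties instead of raw residues."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    n = len(seq)
    aromatic = sum(aa in AROMATIC for aa in seq) / n
    acidic = sum(aa in ACIDIC for aa in seq) / n
    basic = sum(aa in BASIC for aa in seq) / n
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    # [fraction aromatic, fraction acidic, net charge per residue, mean hydropathy]
    return [aromatic, acidic, basic - acidic, hydropathy]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(point, centroids[c]))


def kmeans(points: List[List[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means with a deterministic init (the first k points as seeds)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy usage on made-up sequences: acidic/aromatic-rich vs. basic/hydrophobic.
seqs = ["DEWFDDLEYF", "EDFWEEDYLD", "KRLLAKKIVR", "RKAIVKLRKL"]
labels, _ = kmeans([featurize(s) for s in seqs], k=2)
print(labels)
```

Because mean hydropathy spans a wider numeric range than the fractional features, it dominates the Euclidean distance in this sketch; standardizing each feature before clustering is a common refinement.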