计算机科学
数据集成
正规化(语言学)
通量平衡分析
上传
数据挖掘
数据类型
机器学习
人工智能
生物信息学
生物
程序设计语言
操作系统
作者
Giuseppe Magazzù,Guido Zampieri,Claudio Angione
标识
DOI:10.1093/bioinformatics/btab324
摘要
Abstract Motivation High-throughput biological data, thanks to technological advances, have become cheaper to collect, leading to the availability of vast amounts of omic data of different types. In parallel, the in silico reconstruction and modeling of metabolic systems is now acknowledged as a key tool to complement experimental data on a large scale. The integration of these model- and data-driven information is therefore emerging as a new challenge in systems biology, with no clear guidance on how to better take advantage of the inherent multisource and multiomic nature of these data types while preserving mechanistic interpretation. Results Here, we investigate different regularization techniques for high-dimensional data derived from the integration of gene expression profiles with metabolic flux data, extracted from strain-specific metabolic models, to improve cellular growth rate predictions. To this end, we propose ad-hoc extensions of previous regularization frameworks including group, view-specific and principal component regularization and experimentally compare them using data from 1143 Saccharomyces cerevisiae strains. We observe a divergence between methods in terms of regression accuracy and integration effectiveness based on the type of regularization employed. In multiomic regression tasks, when learning from experimental and model-generated omic data, our results demonstrate the competitiveness and ease of interpretation of multimodal regularized linear models compared to data-hungry methods based on neural networks. Availability and implementation All data, models and code produced in this work are available on GitHub at https://github.com/Angione-Lab/HybridGroupIPFLasso_pc2Lasso. Supplementary information Supplementary data are available at Bioinformatics online.
科研通智能强力驱动
Strongly Powered by AbleSci AI