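Computational biology
Genome
Coding (social sciences)
Computer science
Disease
Human genome
Biology
Gene
Genetics
Medicine
Statistics
Mathematics
Pathology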
Authors
Xiaoyong Pan, Lars Juhl Jensen, Jan Gorodkin
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2018-10-04
Volume (issue): 35 (9): 1494-1502
Citations: 22
Identifier
DOI: 10.1093/bioinformatics/bty859
Abstract
Motivation: Long non-coding RNAs (lncRNAs) are important regulators in a wide variety of biological processes, which are linked to many diseases. Compared to protein-coding genes (PCGs), the association between diseases and lncRNAs is still not well studied. Thus, inferring disease-associated lncRNAs on a genome-wide scale has become imperative.

Results: In this study, we propose a machine learning-based method, DislncRF, which infers disease-associated lncRNAs on a genome-wide scale based on tissue expression profiles. DislncRF uses random forest models trained on expression profiles of known disease-associated PCGs across human tissues to extract general patterns between expression profiles and diseases. These models are then applied to score associations between lncRNAs and diseases. DislncRF was benchmarked against a gold standard dataset and compared to other methods. The results show that DislncRF yields promising performance and outperforms the existing methods. The utility of DislncRF is further substantiated on two diseases, in which we find that top-scoring candidates are supported by literature or independent datasets.

Availability and implementation: https://github.com/xypan1232/DislncRF

Supplementary information: Supplementary data are available at Bioinformatics online.
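
The train-on-PCGs, score-lncRNAs idea described in the abstract might be sketched roughly as follows. This is an illustrative sketch only, not the DislncRF implementation: the data are synthetic, the matrix sizes, the "tissue 0" signal, and all variable names are assumptions made for the example, and scikit-learn's `RandomForestClassifier` stands in for whatever model configuration the authors actually use.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_tissues = 10  # assumed number of tissues in the expression profiles

# Synthetic PCG expression matrix (genes x tissues).
# Label 1 = known association with one disease, 0 = no known association.
pcg_expr = rng.normal(size=(200, n_tissues))
pcg_labels = np.zeros(200, dtype=int)
pcg_labels[:60] = 1
# Hypothetical signal: disease-associated genes are elevated in tissue 0.
pcg_expr[:60, 0] += 3.0

# Synthetic lncRNA expression matrix over the same tissues.
# The first 10 lncRNAs share the disease-like pattern.
lnc_expr = rng.normal(size=(50, n_tissues))
lnc_expr[:10, 0] += 3.0

# Train a random forest on the PCG profiles for this disease.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(pcg_expr, pcg_labels)

# Score each lncRNA by the predicted probability of the positive class.
lnc_scores = model.predict_proba(lnc_expr)[:, 1]

# Rank lncRNAs from highest to lowest score.
ranking = np.argsort(-lnc_scores)
```

In this toy setup one model is trained for a single disease; scoring several diseases would mean repeating the fit with each disease's own PCG labels.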