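Lasso (programming language)
Feature selection
Selection (genetic algorithm)
Computer science
Data mining
Algorithm
Artificial intelligence
World Wide Web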
Authors
Sijia Yang,Shunjie Chen,Pei Wang,Aimin Chen,Tianhai Tian
Source
Journal: IEEE Journal of Biomedical and Health Informatics
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume (issue): pages: 28 (1): 526-537
Citations: 3
Identifier
DOI:10.1109/jbhi.2023.3326485
Abstract
Feature selection has been extensively applied to identify cancer genes from omics data. Although many studies have searched for cancer genes, the rich existing knowledge about various cancers is seldom used as prior information in feature selection. This paper proposes a two-stage prior LASSO (TSPLASSO) method, an early attempt at designing feature selection algorithms that use prior information. The first stage performs gene selection via linear regression with LASSO and retains the candidate genes that are correlated with known cancer genes for subsequent analysis. The second stage fits a logistic regression model with LASSO to carry out the final cancer gene selection and sample classification. The key advantage of TSPLASSO is that it uses prior cancer genes and binary sample types as the response variables in stages one and two, respectively. In addition, TSPLASSO performs sample classification and variable selection simultaneously. Numerical simulations on six real-world datasets were compared against six state-of-the-art algorithms. They show that TSPLASSO improves the accuracy of variable selection by 5%–400% on the three bulk sequencing datasets and the scRNA-seq dataset. Its performance is also robust to data noise and to variations in the set of prior cancer genes. TSPLASSO thus provides an efficient, stable and practical algorithm for exploring biomedical and health informatics from omics data.
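The two-stage idea in the abstract can be sketched in plain NumPy. This is not the authors' implementation. The abstract does not give the exact stage-one formulation, the tuning of the penalties, or whether the prior genes themselves enter stage two. The sketch therefore makes three assumptions, which are flagged again in the code comments:

- In stage one, each prior cancer gene's expression is regressed with LASSO on all other genes, and any gene with a nonzero coefficient becomes a candidate.
- The prior genes are kept alongside those candidates.
- Stage two is an L1-penalised logistic regression of the binary labels on the candidates.

The helpers `soft_threshold`, `lasso_cd`, `l1_logistic` and `tsplasso_sketch` and the penalties `lam1` and `lam2` are all hypothetical names. The solvers are generic choices: coordinate descent for the linear LASSO and proximal gradient (ISTA) for the logistic LASSO.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO linear regression by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes centred y and standardised columns of X (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]            # remove gene j's contribution
            rho = X[:, j] @ r / n          # partial correlation with residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]            # add back the updated contribution
    return b


def l1_logistic(X, y, lam, n_iter=2000):
    """L1-penalised logistic regression by proximal gradient (ISTA).

    Minimises mean log-loss + lam * ||w||_1; the intercept c is unpenalised.
    """
    n, p = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])
    lr = 4.0 * n / np.linalg.norm(Xa, 2) ** 2  # 1 / Lipschitz constant
    w, c = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + c)))
        grad_w = X.T @ (prob - y) / n
        grad_c = np.mean(prob - y)
        w = soft_threshold(w - lr * grad_w, lr * lam)  # proximal L1 step
        c -= lr * grad_c
    return w, c


def tsplasso_sketch(X, y, prior_idx, lam1=0.1, lam2=0.02):
    """Hypothetical two-stage prior-LASSO pipeline (illustrative only).

    X: samples x genes expression matrix; y: 0/1 sample labels;
    prior_idx: column indices of known (prior) cancer genes.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant genes
    p = X.shape[1]
    candidates = set(prior_idx)  # assumption: prior genes are kept
    # Stage 1 (assumed form): regress each prior gene on all other genes.
    for g in prior_idx:
        others = [j for j in range(p) if j != g]
        b = lasso_cd(X[:, others], X[:, g], lam1)
        candidates.update(others[k] for k in np.flatnonzero(b))
    cand = sorted(candidates)
    # Stage 2: sparse logistic model on the candidates for the binary labels.
    w, c = l1_logistic(X[:, cand], y, lam2)
    selected = [cand[k] for k in np.flatnonzero(w)]
    return selected, w, c, cand
```

Splitting the work this way keeps stage one unsupervised with respect to the sample labels: it only uses prior genes as responses. The labels enter only in stage two, which mirrors the order of response variables described in the abstract. In practice `lam1` and `lam2` would be chosen by cross-validation rather than fixed as here.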