Dimension (graph theory)
Reduction (mathematics)
Path (computing)
Dimensionality reduction
Mathematics
Mathematical optimization
Applied mathematics
Algorithm
Computer science
Combinatorics
Artificial intelligence
Geometry
Programming language
Authors
Benoît Liquet, Sarat Moka, Samuel Müller
Source
Journal: Cornell University - arXiv
Date: 2024-03-29
Identifier
DOI: 10.48550/arxiv.2403.20007
Abstract
The selection of the best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields, including genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are combinations of all the original variables. A main drawback of principal components is the difficulty of interpreting them when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into the principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The use of our algorithm is further illustrated through the analysis of two real datasets. The first dataset is analyzed using principal component analysis, while the analysis of the second dataset is based on the partial least squares framework.
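To make the idea of a best subset solution path concrete, the sketch below is a minimal illustration, not the authors' continuous optimization algorithm: for each subset size k it picks the k variables with the largest loadings on the full leading eigenvector and recomputes the first principal component restricted to that subset, tracking the variance it explains. The function name sparse_pc_path and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of a best subset solution path for the first principal component.
# NOT the authors' method: variables are ranked by their loadings on the full
# leading eigenvector, and the first PC is recomputed on each candidate subset.
import numpy as np

def sparse_pc_path(X, max_k=None):
    """For each subset size k, return the selected variable indices and the
    fraction of total variance explained by the first PC on that subset."""
    X = X - X.mean(axis=0)                       # center the data
    n, p = X.shape
    max_k = p if max_k is None else max_k
    cov = X.T @ X / (n - 1)                      # sample covariance matrix
    _, eigvecs = np.linalg.eigh(cov)
    leading = eigvecs[:, -1]                     # leading eigenvector (full PCA)
    order = np.argsort(-np.abs(leading))         # rank variables by loading size
    total_var = np.trace(cov)

    path = []
    for k in range(1, max_k + 1):
        subset = np.sort(order[:k])              # k most influential variables
        sub_cov = cov[np.ix_(subset, subset)]
        explained = np.linalg.eigvalsh(sub_cov)[-1] / total_var
        path.append((subset, explained))
    return path

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 50 observations, 10 variables, shared signal in columns 0-2.
    z = rng.normal(size=(50, 1))
    X = np.hstack([z + 0.1 * rng.normal(size=(50, 3)),
                   rng.normal(size=(50, 7))])
    for subset, explained in sparse_pc_path(X, max_k=5):
        print(f"k={len(subset)}: variables {subset.tolist()}, "
              f"explained variance ratio = {explained:.2f}")
```

On such data, the path typically stabilizes once the signal-carrying variables are included, which is the kind of trade-off between sparsity and explained variance that a best subset solution path is meant to expose.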