特征选择
特征(语言学)
计算机科学
人工智能
逻辑回归
一致性(知识库)
机器学习
回归
因果关系(物理学)
数据挖掘
算法
模式识别(心理学)
数学
统计
哲学
语言学
物理
量子力学
作者
Haotian Ni,Shilin Gu,Ruidong Fan,Chenping Hou
标识
DOI:10.1016/j.patcog.2023.110033
摘要
With the emerging of new data collection ways, the features are incremental and accumulated gradually. Due to the expansion of feature spaces, it is more common that there are unknown biases between the distribution of training and testing datasets. It is known as the unknown data selection bias, which belongs to the learning scenario with non-i.i.d samples. The performance of traditional approaches, which need the i.i.d. assumption, will be aggravated seriously. How to design an algorithm to address the problem of data selection bias in this feature incremental scenario is crucial but rarely studied. In this paper, we propose a feature incremental classification algorithm with causality. Firstly, we embed the confounding variable balance algorithm in causal learning into the prediction modeling and utilize the logical regression algorithm with balancing regular terms as a baseline. Then, to satisfy the special requirement of feature increment, we design a new regularizer, which maintains the consistency of the regression coefficients between the data in the current and previous stages. It retains the correlation between the old features and labels. Finally, we propose the Multiple Balancing Logistic Regression model (MBRLR) to jointly optimize the balancing regularizer and weighted logistic regression model with multiple feature sets. We also present theoretical results to show that our proposed algorithm can make precise and stable predictions. Besides, the numerical results also demonstrate that our MBRLR algorithm is superior to other methods.
科研通智能强力驱动
Strongly Powered by AbleSci AI