Computer science
Ensemble learning
Machine learning
Streams
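Data stream mining
Artificial intelligence
Online learning
Semi-supervised learning
Supervised learning
Active learning (machine learning)
Data mining
Artificial neural network
World Wide Web
Computer network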
Authors
Yinan Guo, Jiayang Pu, Botao Jiao, Yanyan Peng, Dini Wang, Shengxiang Yang
Identifier
DOI:10.1016/j.asoc.2024.111452
Abstract
Concept drift is a core challenge in classifying data streams. Although many drift adaptation methods have been proposed, most of them assume that labels are available for all data, which is impractical in many real-world applications. In addition, without labels the imbalance ratio of an imbalanced data stream is difficult to obtain in time, which gives inaccurate guidance for resampling and leads to poor generalization. To tackle these joint challenges, an online semi-supervised active learning method is proposed to classify imbalanced data streams with concept drift. A newly arrived instance is first added to the sliding window and then assigned a pseudo label according to its nearest cluster. Meanwhile, a semi-supervised clustering algorithm offers its predicted label. Based on these two predicted labels, a cluster-based query strategy provides the criteria for evaluating and selecting representative instances. More specifically, the uncertainty and importance of instances are defined to jointly evaluate their representativeness. After the true labels of the typical instances are obtained, the ensemble classifier is updated with all instances in the current sliding window. Experimental results on 13 synthetic and real data streams indicate that the proposed method outperforms six comparative methods on both G-mean and Recall under various labeling budgets.
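The steps named in the abstract (sliding window, nearest-cluster pseudo label, label-based query decision, G-mean evaluation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the fixed centroids, the `other_predict` callback, and the rule "query an instance when its two predicted labels disagree" are all assumptions made for illustration; the paper defines its own uncertainty and importance measures, which the abstract does not spell out.

```python
from collections import deque
import math


def nearest_cluster_label(x, centroids):
    """Return the label of the centroid closest to x (Euclidean distance).

    `centroids` is a hypothetical dict {label: point}; the abstract does not
    say how clusters are represented.
    """
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(x, center)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def process_stream(stream, centroids, other_predict, window_size=3):
    """Keep a sliding window and flag instances whose two labels disagree.

    Assumption: disagreement between the pseudo label and a second predicted
    label stands in for the paper's query criteria.
    """
    window = deque(maxlen=window_size)  # oldest instance drops out when full
    to_query = []
    for x in stream:
        window.append(x)
        pseudo = nearest_cluster_label(x, centroids)
        if pseudo != other_predict(x):
            to_query.append(x)  # candidate for requesting the true label
    return list(window), to_query


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1 / len(recalls))
```

G-mean is a common choice for imbalanced streams because it falls to zero whenever any class has zero recall, so a classifier cannot score well by ignoring the minority class.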