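Keywords
Concept drift
Classifier (UML)
Computer science
Data stream
Block (permutation group theory)
Artificial intelligence
Pattern recognition (psychology)
Labeled data
Data stream mining
Conformal map
Data mining
Mathematics
Telecommunications
Mathematical analysis
Geometry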
Authors
Songde Ma, Wei Kang, Yun Xue, Yonggang Wen
Source
Journal: Communications in Computer and Information Science
Date: 2023-11-26
Volume/issue: pp. 355-366
Identifiers
DOI: 10.1007/978-981-99-8184-7_27
Abstract
In this article, we consider the problem of semi-supervised data stream classification. The main difficulties of semi-supervised data stream classification are how to jointly utilize labeled and unlabeled samples to detect concept drift, and how to use unlabeled samples to update a trained classifier. Existing algorithms such as CPSSDS retrain a new classifier every time concept drift is detected, which is time-consuming and wasteful. In this paper, we propose CPSSDS-R, an algorithm for semi-supervised data stream classification with recurring concept drift. First, the labeled samples in the first data block are used to initialize a classifier, which is added to a pool and activated for classification. When a new data block arrives, concept drift is detected by computing conformal prediction results. If no concept drift is detected, the pseudo-labeled samples from the previous data block are combined with the labeled samples in the current data block to incrementally train the active classifier. If a new concept is detected, a new classifier is trained on the labeled samples of the current data block, added to the pool, and activated for classification; if instead a recurring concept is detected, the pseudo-labeled and labeled samples in the current data block are used to incrementally update the pool classifier corresponding to that recurring concept, which is then activated for classification. The proposed algorithm is tested on multiple synthetic and real datasets, and its cumulative accuracy and block accuracy at different labeling ratios demonstrate its effectiveness. The code for the proposed algorithm is available at https://gitee.com/ymw12345/cpssds-r.
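The abstract describes the per-block control flow of CPSSDS-R but not its base classifier or its exact conformal drift test. The Python sketch below illustrates only that control flow, under assumptions that are not from the paper: the base learner is a hypothetical incremental nearest-centroid model (`ConceptModel`), the nonconformity score is the distance to the class centroid, drift is declared when the mean conformal p-value of the block's labeled samples falls below a hypothetical threshold `alpha`, and a stored model is treated as a recurring concept if its mean p-value on those samples is at least `alpha`. Calibration scores come from each model's own labeled training samples, a simplification of inductive conformal prediction. It is not the authors' implementation; see the repository linked above for that.

```python
import math


class ConceptModel:
    """Hypothetical stand-in for one classifier in the pool: an incremental
    nearest-centroid model that keeps its truly labeled samples as a
    conformal calibration set (the paper's base learner is not given here)."""

    def __init__(self):
        self.sums, self.counts, self.calib = {}, {}, []

    def partial_fit(self, X, y, labeled=True):
        for x, c in zip(X, y):
            s = self.sums.setdefault(c, [0.0] * len(x))
            self.sums[c] = [a + b for a, b in zip(s, x)]
            self.counts[c] = self.counts.get(c, 0) + 1
            if labeled:  # pseudo-labels never enter the calibration set
                self.calib.append((x, c))
        return self

    def nonconformity(self, x, c):
        # Distance from x to the centroid of class c; unseen class -> inf.
        if c not in self.sums:
            return math.inf
        return math.dist(x, [s / self.counts[c] for s in self.sums[c]])

    def predict(self, X):
        return [min(self.sums, key=lambda c: self.nonconformity(x, c)) for x in X]

    def mean_p_value(self, X, y):
        # Conformal p-value of each (x, c): fraction of calibration scores at
        # least as large as its own score, with the usual +1 smoothing.
        cal = [self.nonconformity(x, c) for x, c in self.calib]
        ps = []
        for x, c in zip(X, y):
            a = self.nonconformity(x, c)
            ps.append((sum(s >= a for s in cal) + 1) / (len(cal) + 1))
        return sum(ps) / len(ps)


def process_block(pool, active, prev_pseudo, X_lab, y_lab, X_unlab, alpha=0.25):
    """Handle one block; returns (active model index, pseudo-labeled samples).
    `alpha` is a hypothetical drift threshold on the mean p-value."""
    if pool[active].mean_p_value(X_lab, y_lab) >= alpha:
        # No drift: previous block's pseudo-labels plus current labels.
        pool[active].partial_fit([x for x, _ in prev_pseudo],
                                 [c for _, c in prev_pseudo], labeled=False)
        pool[active].partial_fit(X_lab, y_lab)
    else:
        match = next((i for i, m in enumerate(pool)
                      if i != active and m.mean_p_value(X_lab, y_lab) >= alpha), None)
        if match is None:  # new concept: fresh model on current labels only
            pool.append(ConceptModel().partial_fit(X_lab, y_lab))
            active = len(pool) - 1
        else:  # recurring concept: update the stored model and reactivate it
            active = match
            pool[active].partial_fit(X_unlab, pool[active].predict(X_unlab),
                                     labeled=False)
            pool[active].partial_fit(X_lab, y_lab)
    return active, list(zip(X_unlab, pool[active].predict(X_unlab)))


# Toy stream: concept A, then a label-flipped concept B, then A again.
A_lab = ([(0, 0), (0, 0), (4, 0), (4, 0)], [0, 0, 1, 1])
B_lab = (A_lab[0], [1, 1, 0, 0])
unlab = [(0, 0), (4, 0)]

pool = [ConceptModel().partial_fit(*A_lab)]  # first block initializes the pool
active, pseudo = 0, list(zip(unlab, pool[0].predict(unlab)))
history = [active]
for X_lab, y_lab in [A_lab, B_lab, A_lab]:
    active, pseudo = process_block(pool, active, pseudo, X_lab, y_lab, unlab)
    history.append(active)
print(history, len(pool))  # [0, 0, 1, 0] 2
```

In this toy stream the sketch adds a second model when the flipped concept B appears, and when concept A returns it reactivates and incrementally updates the first model instead of training a third one, which is the saving over retrain-on-every-drift that the abstract attributes to CPSSDS-R.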