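Concept drift
Data stream mining
Data stream
Computer science
Statistic
Data mining
Streaming data
Test statistic
Stream
Key (lock)
Construct (python library)
Test data
Statistical hypothesis testing
Artificial intelligence
Statistics
Mathematics
Computer network
Telecommunications
Computer security
Programming language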
Authors
Hang Yu,Weixu Liu,Jie Lu,Yonggang Wen,Xiangfeng Luo,Guangquan Zhang
Identifier
DOI:10.1016/j.patcog.2022.109113
Abstract
Concept drift, caused by unforeseeable changes in the underlying distribution of data, may lead to a sharp downturn in the performance of algorithms based on streaming data. In this paper, we are mainly concerned with concept drift across multiple data streams, in situations where the drift in each individual data stream cannot be detected in time because its underlying distribution shifts only slightly. We call this group concept drift. Compared with detecting concept drift in a single data stream, detecting group concept drift is challenging in three respects: first, the training data become more complex; second, the underlying distribution becomes more complex; and third, the correlations between data streams become more complex. To address these challenges, the key idea of our method is to construct a distribution-free test statistic, one that makes no assumption about the underlying distribution of the multiple data streams. Then, for streaming data, we design an online learning algorithm to compute this test statistic and thereby detect concept drift through a hypothesis test. Experimental evaluations on both synthetic and real-world datasets show that our method can accurately detect concept drift across multiple data streams.
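The abstract does not give the paper's actual test statistic or online algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It swaps in a familiar distribution-free technique: a rank-sum (Mann-Whitney) score per stream comparing a reference window with the latest window, squared and summed across streams so that several slight per-stream shifts accumulate into one group statistic, with significance judged by a permutation test. It assumes time steps are exchangeable when no drift has occurred. All names (`average_ranks`, `group_statistic`, `permutation_pvalue`, `GroupDriftDetector`) and parameters (`window`, `alpha`, `n_perm`) are hypothetical.

```python
import math
import random


def average_ranks(values):
    """1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def group_statistic(ref, cur):
    """Sum over streams of squared standardized rank-sum (Mann-Whitney) scores.

    `ref` and `cur` are lists of time steps; each time step is a list holding
    one value per stream. The null variance below ignores ties, which is
    harmless here because significance is judged by permutation.
    """
    n1, n2 = len(ref), len(cur)
    n = n1 + n2
    mean = n2 * (n + 1) / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    total = 0.0
    for s in range(len(ref[0])):
        pooled = [row[s] for row in ref] + [row[s] for row in cur]
        ranks = average_ranks(pooled)
        z = (sum(ranks[n1:]) - mean) / sd
        total += z * z
    return total


def permutation_pvalue(ref, cur, n_perm=199, rng=None):
    """Distribution-free p-value for `group_statistic`.

    Whole time steps (rows) are shuffled, so the correlation between streams
    at a given time is preserved in every permuted sample.
    """
    rng = rng or random.Random(0)
    observed = group_statistic(ref, cur)
    pooled = ref + cur
    n1 = len(ref)
    exceed = 0
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        if group_statistic(shuffled[:n1], shuffled[n1:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


class GroupDriftDetector:
    """Hypothetical online detector: fixed reference window vs. tumbling test windows."""

    def __init__(self, window=50, alpha=0.01, n_perm=199, seed=0):
        self.window = window
        self.alpha = alpha
        self.n_perm = n_perm
        self.rng = random.Random(seed)
        self.reference = []
        self.current = []

    def update(self, x):
        """Consume one time step (one value per stream); return True if drift is flagged."""
        if len(self.reference) < self.window:
            self.reference.append(list(x))
            return False
        self.current.append(list(x))
        if len(self.current) < self.window:
            return False
        p = permutation_pvalue(self.reference, self.current, self.n_perm, self.rng)
        drifted = p < self.alpha
        if drifted:
            # Adopt the drifted window as the reference for the new concept.
            self.reference = self.current
        self.current = []  # start the next non-overlapping test window
        return drifted


if __name__ == "__main__":
    data_rng = random.Random(42)
    detector = GroupDriftDetector(window=50, alpha=0.01, n_perm=199, seed=1)
    for t in range(450):
        shift = 0.5 if t >= 200 else 0.0  # small mean shift hits all 8 streams at t = 200
        x = [data_rng.gauss(shift, 1.0) for _ in range(8)]
        if detector.update(x):
            print(f"group drift flagged at t = {t}")
```

Squaring each stream's score before summing makes the statistic sensitive to shifts in either direction, and shuffling entire time steps rather than individual values keeps cross-stream dependence intact under the null. The sketch tests non-overlapping windows to keep the cost down; because it tests repeatedly, its overall false-alarm rate over a long stream exceeds `alpha`, and no multiple-testing correction is applied.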