CBA Sketch: A Sketching Algorithm Mining Persistent Batches in Data Streams

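Keywords: Sketch, Computer science, Data stream mining, Stream, Data mining, Algorithm, Operating system
Authors
Qian Zhou, Yu-E Sun, He Huang, Yifan Han
Source
Journal: Lecture Notes in Computer Science, pp. 114-132
Identifier
DOI:10.1007/978-981-97-0811-6_7
Abstract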

A batch is a vital data pattern commonly observed in data streams: a group of identical items that occur closely together. However, existing works primarily focus on mining the periodicity of batches and neglect numerous other essential patterns. In this paper, we introduce the concept of a persistent batch, a particular pattern in data streams in which multiple occurrences of the same batch happen in at least k out of t measurement periods. Mining persistent batches is significant in applications such as APT detection, DDoS detection, and click-fraud detection. To fill this gap in prior work, we propose CBA Sketch, a memory-efficient sketching algorithm that effectively mines persistent batches from data streams. CBA Sketch uses a Circular-Time Sketch (CT-Sketch) to accurately calculate item intervals and capture batches with limited memory resources. We incorporate the carefully designed Bloom Filter-based Existence Recorder (BE Recorder) and Approximate Size Recorder (AS Recorder) to preserve batch information. Additionally, we introduce a novel metric called dual-mean size to measure persistent batch sizes. Extensive experiments demonstrate that CBA Sketch outperforms the strawman solution by about 62× in terms of average relative error and 2× in terms of throughput.
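
The definition of a persistent batch can be illustrated with a small exact baseline. This is only a sketch under stated assumptions, not CBA Sketch and not the paper's strawman: it keeps per-item state in plain dictionaries instead of a CT-Sketch, BE Recorder, or AS Recorder. The function name `persistent_batch_items`, the `gap` threshold that separates two batches of the same item, and the fixed `period_len` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def persistent_batch_items(stream, gap, period_len, t, k):
    """Exact baseline for illustration only (assumed parameters, see lead-in).

    stream     : iterable of (timestamp, item), timestamps non-decreasing
    gap        : assumed max interval between arrivals inside one batch
    period_len : assumed length of one measurement period
    t, k       : report items with a batch starting in >= k of the first t periods
    """
    last_seen = {}                         # item -> timestamp of its previous arrival
    periods_with_batch = defaultdict(set)  # item -> periods in which a batch started

    for ts, item in stream:
        period = int(ts // period_len)
        if period >= t:
            break  # only the first t periods are measured (stream is time-ordered)
        prev = last_seen.get(item)
        if prev is None or ts - prev > gap:
            # The interval since the last arrival exceeds the gap,
            # so this arrival opens a new batch of the item.
            periods_with_batch[item].add(period)
        last_seen[item] = ts

    return {item for item, ps in periods_with_batch.items() if len(ps) >= k}


# Toy stream: "a" starts a batch in periods 0, 1, 2, 3; "b" only in periods 0 and 2.
toy_stream = [(0, "a"), (1, "a"), (2, "b"), (10, "a"),
              (11, "a"), (20, "a"), (21, "b"), (30, "a")]
print(persistent_batch_items(toy_stream, gap=2, period_len=10, t=4, k=3))  # {'a'}
```

The baseline stores one timestamp and one set of period indices per distinct item, so its memory grows with the number of items; that cost is what a memory-efficient sketch is meant to avoid.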