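Scalability
Computer science
Event (particle physics)
Hierarchy
Sequence (biology)
Context (archaeology)
Rule of thumb
Data mining
Association rule learning
SPARK (programming language)
Proportion (ratio)
Artificial intelligence
Theoretical computer science
Machine learning
Algorithm
Database
Programming language
Biology
Physics
Paleontology
Economics
Quantum mechanics
Genetics
Market economy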
Authors
Xiang Ao, Haoran Shi, Jin Wang, Luo Zuo, Hongwei Li, Qing He
Source
Journal: ACM Transactions on Intelligent Systems and Technology
[Association for Computing Machinery]
Date: 2019-07-20
Volume (Issue): 10 (4): 1-26
Citations: 40
Abstract
Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks of the sequence mining research field. Existing FEM studies suffer from unsatisfactory scalability when faced with complex sequences, because testing whether an episode occurs in a sequence is an NP-complete problem. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. As a rule of thumb, “big” means that an event sequence is either very long or contains masses of simultaneous events. Meanwhile, the events in this article are arranged in a predefined hierarchy, which yields abstract events that can form episodes that may not directly appear in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy to local processes. We then present an efficient special-purpose algorithm to improve the local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchy; to the best of our knowledge, this is the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that we can find practical patterns when taking event hierarchies into account.
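To make the idea of hierarchy-derived episodes concrete, the following is a minimal single-machine sketch in Python. It is not the article's algorithm or its Spark implementation. It assumes a serial episode (an ordered tuple of event types), a frequency defined as the number of non-overlapping occurrences found by a greedy left-to-right scan, and a hierarchy given as a child-to-parent mapping that forms a forest. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An event sequence: (timestamp, set of simultaneous events), sorted by timestamp.
EventSequence = List[Tuple[int, Set[str]]]


def ancestors(event: str, parent: Dict[str, str]) -> Set[str]:
    """Return the event together with all of its ancestors in the hierarchy.

    Assumes `parent` is acyclic (a forest); a cycle would loop forever.
    """
    out = {event}
    while event in parent:
        event = parent[event]
        out.add(event)
    return out


def expand(sequence: EventSequence, parent: Dict[str, str]) -> EventSequence:
    """Add every ancestor of every event to the timestamp where it occurs."""
    return [
        (t, set().union(*(ancestors(e, parent) for e in events)))
        for t, events in sequence
    ]


def count_non_overlapping(sequence: EventSequence, episode: Tuple[str, ...]) -> int:
    """Greedily count non-overlapping occurrences of a non-empty serial episode.

    Each step of the episode must occur at a strictly later timestamp than
    the previous one, so at most one step is matched per timestamp.
    """
    count = 0
    pos = 0  # index of the next episode element to match
    for _, events in sequence:
        if episode[pos] in events:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count


# Hypothetical toy data: leaf events and their abstract parents.
parent = {"beer": "drink", "cola": "drink", "chips": "snack", "nuts": "snack"}
sequence = [
    (1, {"beer"}),
    (2, {"chips"}),
    (3, {"cola", "nuts"}),
    (4, {"nuts"}),
    (5, {"beer"}),
]
episode = ("drink", "snack")

raw_count = count_non_overlapping(sequence, episode)
hier_count = count_non_overlapping(expand(sequence, parent), episode)
print(raw_count, hier_count)
```

In this toy sequence the abstract episode ("drink", "snack") never appears literally in the input, yet it occurs once the events are lifted through the hierarchy. This is the kind of pattern the abstract describes as one that "may not directly appear in the input sequence".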