Xiang Ao, Haoran Shi, Jin Wang, Luo Zuo, Hongwei Li, Qing He
Source
Journal: ACM Transactions on Intelligent Systems and Technology (Association for Computing Machinery)
Date: 2019-07-20
Volume (issue): 10 (4), pages 1-26
Citations: 40
Identifier
DOI: 10.1145/3326163
Abstract
Frequent Episode Mining (FEM), which aims to mine frequent sub-sequences from a single long event sequence, is one of the essential building blocks of sequence mining. Existing FEM approaches scale poorly on complex sequences, because testing whether an episode occurs in a sequence is an NP-complete problem. In this article, we propose a scalable, distributed framework that supports FEM on "big" event sequences. Informally, "big" means that an event sequence is either very long or contains masses of simultaneous events. In addition, the events in this article are organized in a predefined hierarchy, which yields abstract events that can form episodes that do not appear directly in the input sequence. Specifically, we devise an event-centered, hierarchy-aware partitioning strategy that allocates events from different levels of the hierarchy to local processes. We then present an efficient special-purpose algorithm that speeds up local mining. We also extend the framework to support maximal and closed episode mining in the context of event hierarchies; to the best of our knowledge, this is the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. The experimental results demonstrate the efficiency and scalability of the proposed approach and show that taking event hierarchies into account reveals practical patterns.
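The abstract does not spell out how the authors test or count episode occurrences. To illustrate what "abstract events forming episodes that do not appear directly in the input sequence" means, here is a minimal, hypothetical Python sketch (not the paper's method). It assumes a child-to-parent hierarchy `PARENT`, a toy sequence `SEQ` of timestamped sets of simultaneous events, serial (strictly ordered) episodes only, and support measured as minimal occurrences whose time span is smaller than a `window` parameter. All of these names and choices are assumptions made for illustration. The helpers `ancestors`, `generalize`, `earliest_end`, and `minimal_occurrences` are likewise hypothetical, and the distributed Spark partitioning is not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: (timestamp, set of simultaneous events),
# sorted by strictly increasing timestamp.
EventSeq = List[Tuple[int, Set[str]]]


def ancestors(event: str, parent: Dict[str, str]) -> Set[str]:
    """Return the event plus all of its ancestors in a child -> parent hierarchy."""
    found = {event}
    while event in parent:
        event = parent[event]
        found.add(event)
    return found


def generalize(seq: EventSeq, parent: Dict[str, str]) -> EventSeq:
    """Add every abstract ancestor to each timestamp's event set."""
    result: EventSeq = []
    for t, events in seq:
        expanded: Set[str] = set()
        for e in events:
            expanded |= ancestors(e, parent)
        result.append((t, expanded))
    return result


def earliest_end(seq: EventSeq, start: int, episode: List[str]) -> Optional[int]:
    """Index where a (non-empty) serial episode completes earliest, given episode[0] at seq[start]."""
    if episode[0] not in seq[start][1]:
        return None
    pos = start
    for label in episode[1:]:
        pos += 1  # each later element must occur at a strictly later timestamp
        while pos < len(seq) and label not in seq[pos][1]:
            pos += 1
        if pos == len(seq):
            return None
    return pos


def minimal_occurrences(seq: EventSeq, episode: List[str], window: int) -> List[Tuple[int, int]]:
    """Minimal occurrence intervals (t_start, t_end) with t_end - t_start < window."""
    ends = [earliest_end(seq, i, episode) for i in range(len(seq))]
    found = []
    for i, end in enumerate(ends):
        if end is None:
            continue
        # Earliest ends never decrease as the start moves right, so [i, end] is
        # minimal unless the next feasible start completes at the same end.
        next_end = next((e for e in ends[i + 1:] if e is not None), None)
        if next_end == end:
            continue
        t_start, t_end = seq[i][0], seq[end][0]
        if t_end - t_start < window:
            found.append((t_start, t_end))
    return found


# Hypothetical two-level hierarchy and toy sequence, for illustration only.
PARENT = {"login_web": "login", "login_app": "login", "buy_book": "buy", "buy_phone": "buy"}
SEQ: EventSeq = [(1, {"login_web"}), (2, {"buy_book", "view"}), (4, {"login_app"}),
                 (5, {"view"}), (6, {"buy_phone"})]

if __name__ == "__main__":
    print(minimal_occurrences(SEQ, ["login", "buy"], 3))
    print(minimal_occurrences(generalize(SEQ, PARENT), ["login", "buy"], 3))
```

Running it prints `[]` for the raw sequence and `[(1, 2), (4, 6)]` after generalization. The abstract episode (login, buy) never appears literally, but it occurs twice once the hierarchy is applied. The greedy test used here runs in polynomial time only because the sketch is restricted to serial episodes. The NP-completeness mentioned in the abstract applies to more general episode structures, which this sketch does not handle.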