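Computer science
Scalability
Pruning
Overhead (engineering)
Computation
Algorithm
Data mining
Execution time
Scheduling (production processes)
Parallel computing
Database
Agronomy
Operations management
Biology
Operating system
Economics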
Authors
Sumalatha Saleti,R. B. V. Subramanyam
Identifier
DOI:10.1007/s10489-018-1259-2
Abstract
The Sequential Pattern Mining (SPM) problem has been widely studied and extended in several directions. With the tremendous growth in dataset sizes, traditional algorithms no longer scale. To address this scalability issue, a few researchers have recently developed distributed algorithms based on MapReduce. However, the existing MapReduce algorithms require multiple rounds of MapReduce, which increases communication and scheduling overhead. They also do not address the handling of long sequences: they generate a huge number of candidate sequences that do not appear in the input database, which enlarges the search space and leaves more candidate sequences for support counting. Our algorithm is a two-phase MapReduce algorithm that generates the promising candidate sequences by applying pruning strategies. This reduces the search space and makes support computation efficient. We make use of item co-occurrence information, and the proposed Sequence Index List (SIL) data structure helps compute support quickly. Experimental results show that the proposed algorithm outperforms the existing MapReduce algorithms for the SPM problem.
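The abstract does not define the SIL structure or the pruning rules, so the following is only a minimal single-machine sketch of the two general ideas it names: pruning candidates with item co-occurrence information, and counting support from a per-item position index rather than by rescanning the database. It assumes sequences of single items, and every name in it (`build_index`, `support`, `frequent_pairs`, `may_be_frequent`) is hypothetical; it is not the authors' SIL or their MapReduce algorithm.

```python
from bisect import bisect_right
from itertools import combinations


def build_index(db):
    """Map each item to {sequence id: ascending positions of that item}."""
    index = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            index.setdefault(item, {}).setdefault(sid, []).append(pos)
    return index


def support(pattern, index):
    """Count the sequences that contain `pattern` as a subsequence."""
    # Only sequences containing every item of the pattern can match.
    sids = set(index.get(pattern[0], {}))
    for item in pattern[1:]:
        sids &= set(index.get(item, {}))
    count = 0
    for sid in sids:
        last = -1
        for item in pattern:
            positions = index[item][sid]
            # Earliest occurrence strictly after the previous match.
            i = bisect_right(positions, last)
            if i == len(positions):
                break
            last = positions[i]
        else:
            count += 1
    return count


def frequent_pairs(index, min_support):
    """Ordered pairs (a, b) where a precedes b in >= min_support sequences."""
    items = sorted(index)
    return {
        (a, b)
        for a in items
        for b in items
        if support((a, b), index) >= min_support
    }


def may_be_frequent(pattern, pairs):
    """Co-occurrence pruning: a frequent pattern needs all ordered pairs frequent."""
    return all(pair in pairs for pair in combinations(pattern, 2))


# Toy database (assumed data, for illustration only).
db = [list("abcd"), list("acbd"), list("abd"), list("bca")]
index = build_index(db)
pairs = frequent_pairs(index, min_support=3)

for candidate in [("a", "b", "d"), ("a", "c", "d")]:
    if may_be_frequent(candidate, pairs):
        print(candidate, "support =", support(candidate, index))
    else:
        print(candidate, "pruned without support counting")
```

In this sketch a candidate that fails the pair check is discarded before any support counting, which is the sense in which co-occurrence information narrows the search space. How the paper distributes these steps across its two MapReduce phases is not stated in the abstract.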