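Apriori algorithm
SPARK (programming language)
Computer science
Weighting
A priori and a posteriori
Data mining
Database transaction
Association rule learning
Set (abstract data type)
Algorithm
GSP algorithm
Database
Medicine
Philosophy
Epistemology
Radiology
Programming language
Identifier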
DOI:10.1109/icedcs57360.2022.00100
Abstract
The traditional Apriori algorithm is inefficient: it repeatedly scans the transaction database, generates candidate itemsets, and mines a large number of redundant and worthless rules. This paper first addresses the problem that the Apriori algorithm ignores the differing importance of items, introducing weighting rules that reflect how items differ in both frequency and weight. Second, candidate sets are generated and their support values are counted in the same scan of the input dataset, and each iteration works not on the raw input dataset but on an updated dataset from which unnecessary items and transactions have been removed. By combining this optimized Apriori with the parallel computing framework Apache Spark, the paper proposes WABS (weighted Apriori algorithm based on Spark). Experimental results show that, compared with the latest similar algorithms, WABS effectively shortens mining time, which helps in discovering more valuable information.
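
The two optimizations described in the abstract (item weighting and a dataset that shrinks between iterations) can be illustrated with a minimal single-machine sketch. This is not the paper's WABS implementation and it does not use Spark. The abstract does not state the weighting formula, so the sketch assumes one common definition: a transaction's weight is the mean weight of its items, and an itemset's weighted support is the summed weight of the transactions containing it divided by the total weight of all transactions. The names `weighted_apriori`, `weights`, and `min_wsup` are hypothetical.

```python
from itertools import combinations


def weighted_apriori(transactions, weights, min_wsup):
    """Return {itemset: weighted support} for itemsets meeting min_wsup.

    Assumed definition (not taken from the paper): a transaction's weight is
    the mean weight of its items, computed once from the original transaction.
    """
    data = [(frozenset(t), sum(weights[i] for i in t) / len(t))
            for t in transactions if t]
    total = sum(w for _, w in data)
    # Level-1 candidates are the single items.
    candidates = {frozenset([i]) for items, _ in data for i in items}
    result, k = {}, 1
    while candidates and data:
        # One pass over the current (reduced) dataset scores every candidate.
        acc = dict.fromkeys(candidates, 0.0)
        for items, w in data:
            for c in candidates:
                if c <= items:
                    acc[c] += w
        frequent = {c: s / total for c, s in acc.items()
                    if s / total >= min_wsup}
        result.update(frequent)
        # Dataset reduction: drop items that occur in no frequent k-itemset,
        # then drop transactions too short to hold a (k+1)-itemset.
        alive = set().union(*frequent)
        data = [(items & alive, w) for items, w in data]
        data = [(items, w) for items, w in data if len(items) > k]
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == k}
    return result
```

Under the assumed definition, weighted support cannot increase when an itemset grows, so the usual Apriori pruning remains valid. In a Spark setting, the scoring pass over `data` is the step one would expect to distribute across partitions, though how WABS does this is not specified in the abstract.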