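Tree traversal
Pruning
Completeness (order theory)
Tree (set theory)
Set (abstract data type)
Contrast (vision)
Mathematics
Trie
Group (periodic table)
Data mining
Computer science
Artificial intelligence
Algorithm
Data structure
Combinatorics
Mathematical analysis
Organic chemistry
Chemistry
Programming language
Biology
Agronomy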
Authors
Hongyan Liu, Yiren Yang, Zhuohua Chen, Yong Zheng
Source
Journal: INFORMS Journal on Computing
Date: 2014-05-01
Volume (issue): 26 (2): 208-221
Citations: 8
Identifier
DOI:10.1287/ijoc.2013.0558
Abstract
Understanding differences between groups in a data set is one of the fundamental tasks in data analysis. As relevant applications accumulate, data-mining methods have been developed to specifically address the problem of group difference detection. Contrast set mining discovers group differences in the form of conjunctions of feature-value pairs, or items. In this paper, we incorporate absolute difference, relative difference, and statistical significance in our definition of a group difference, and develop a novel method named DIFF that uses the prefix-tree structure to compress the search space, follows a tree traversal procedure to discover the complete set of significant group differences, and employs efficient pruning strategies to expedite the search process. We conducted comprehensive experiments to compare our method with existing methods on completeness of results, pruning efficiency, and computational efficiency. The experiments demonstrate that our method guarantees completeness of results and achieves higher pruning efficiency and computational efficiency than STUCCO. In addition, our definition of group difference is more general than that of STUCCO. Our method is also more effective than traditional approaches, such as classification trees, in discovering the complete set of significant group differences.
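To make the three criteria named in the abstract concrete, here is a minimal Python sketch of a depth-first, prefix-ordered search for itemsets whose support differs between two groups. It is not the DIFF algorithm from the paper. The function names (`chi2_2x2`, `mine_differences`), the thresholds (`min_abs`, `min_ratio`, `chi2_crit`), the use of a support ratio as the relative difference, and the single pruning rule are all assumptions made for illustration. The abstract does not specify the paper's actual definitions or pruning strategies.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (a * d - b * c) ** 2 / denom


def mine_differences(group1, group2, min_abs=0.2, min_ratio=2.0, chi2_crit=3.841):
    """Enumerate itemsets in a fixed item order (prefix-tree style) and keep
    those passing all three illustrative tests. Each group is a non-empty
    list of sets of items. chi2_crit=3.841 is the 0.05 critical value of the
    chi-square distribution with 1 degree of freedom."""
    items = sorted({i for row in group1 + group2 for i in row})
    n1, n2 = len(group1), len(group2)
    results = []

    def expand(prefix, rows1, rows2, start):
        for k in range(start, len(items)):
            item = items[k]
            r1 = [r for r in rows1 if item in r]
            r2 = [r for r in rows2 if item in r]
            s1, s2 = len(r1) / n1, len(r2) / n2
            lo, hi = min(s1, s2), max(s1, s2)
            # Assumed pruning rule: support never grows when items are added,
            # and |s1 - s2| <= max(s1, s2), so no extension of this itemset
            # can reach an absolute difference of min_abs.
            if hi < min_abs:
                continue
            itemset = prefix + (item,)
            ratio = hi / lo if lo > 0 else float("inf")
            chi2 = chi2_2x2(len(r1), n1 - len(r1), len(r2), n2 - len(r2))
            if hi - lo >= min_abs and ratio >= min_ratio and chi2 >= chi2_crit:
                results.append((itemset, s1, s2))
            # Only extend with later items, so each itemset is visited once.
            expand(itemset, r1, r2, k + 1)

    expand((), group1, group2, 0)
    return results
```

Calling `mine_differences(g1, g2)` with two lists of item sets returns tuples of `(itemset, support_in_group1, support_in_group2)`. The sketch does no redundancy filtering, so an itemset and its supersets may both be reported when their supports are identical.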