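Categorical variable
Set (abstract data type)
Computer science
Data mining
Data type
Data set
Variable
Artificial intelligence
Machine learning
Mathematics
Programming language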
Authors
Michel van de Velden,Alfonso Iodice D’Enza,Angelos Markos,Carlo Cavicchia
Identifier
DOI:10.1016/j.patcog.2024.110547
Abstract
The degree to which objects differ from each other with respect to observations on a set of variables plays an important role in many statistical methods. Many data analysis methods require a quantification of the differences between observed values, which we call distances. An appropriate definition of distance depends on the nature of the data and the problem at hand. For numerical variables, many distance definitions exist, and they depend on the size of the observed differences. For categorical data, defining a distance is more complex because there is no straightforward way to quantify the size of an observed difference. In this paper, we introduce a flexible framework for efficiently computing distances for categorical variables that supports both existing formulations and new ones tailored to specific contexts. In supervised classification, the framework enhances performance by incorporating the relationships between the response and predictor variables. The framework also makes it possible to measure differences among objects across diverse data types and domains.
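To make the contrast between unsupervised and supervised categorical distances concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' framework. The first function is the classic simple matching distance, which is the share of variables on which two objects disagree. The second is one plausible supervised formulation, assumed here for illustration: it defines the distance between two categories of a predictor as the total variation distance between the response distributions observed within each category. Object-level distances are then sums of these category-level distances. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def simple_matching_distance(x, y):
    """Fraction of variables on which two objects take different categories."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    mismatches = sum(a != b for a, b in zip(x, y))
    return mismatches / len(x)


def supervised_category_distances(column, labels):
    """Hypothetical supervised formulation (an assumption, not the paper's):
    distance between two categories of one predictor = total variation
    distance between the response distributions P(label | category)."""
    counts = defaultdict(Counter)
    for category, label in zip(column, labels):
        counts[category][label] += 1
    classes = sorted(set(labels))
    # Conditional response profile of each category, in a fixed class order.
    profiles = {}
    for category, c in counts.items():
        n = sum(c.values())
        profiles[category] = [c[k] / n for k in classes]
    dist = {}
    for a in profiles:
        for b in profiles:
            dist[(a, b)] = 0.5 * sum(abs(p - q) for p, q in zip(profiles[a], profiles[b]))
    return dist


def object_distance(x, y, category_distances):
    """Sum of per-variable category distances between two objects."""
    return sum(category_distances[j][(a, b)] for j, (a, b) in enumerate(zip(x, y)))


# Toy data (hypothetical): two categorical predictors and a binary response.
data = [("red", "small"), ("red", "large"), ("blue", "small"), ("green", "large")]
labels = ["yes", "yes", "no", "no"]

category_distances = [
    supervised_category_distances([row[j] for row in data], labels)
    for j in range(len(data[0]))
]

print(simple_matching_distance(data[2], data[3]))              # 1.0
print(object_distance(data[2], data[3], category_distances))   # 0.0
print(object_distance(data[0], data[2], category_distances))   # 1.0
```

In this toy example, simple matching treats objects 3 and 4 as maximally different because they disagree on both variables. The supervised variant treats them as identical: "blue" and "green" have the same response profile, and the size variable carries no information about the response. This illustrates, in a simplified form, why incorporating the relationship between response and predictors can change which objects are considered close.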