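Categorical variable
Similarity (geometry)
Cluster analysis
Measure (data warehouse)
Similarity measure
Data mining
Computer science
Euclidean distance
Representation (politics)
Feature (linguistics)
Pattern recognition (psychology)
Artificial intelligence
Mathematics
Machine learning
Linguistics
Philosophy
Politics
Political science
Law
Image (mathematics)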
Authors
Yanqing Ye, Bin Lin, Penafei Yang, Weilong Yang, Wen Zhang, Xiaomin Zhu
Identifier
DOI:10.1109/bigdia56350.2022.9874058
Abstract
Mixed data, which contains both categorical and continuous attributes, is a typical heterogeneous structure found in the elements of complex systems. To analyze the similarity of mixed data effectively while accounting for the heterogeneous coupling relationships between mixed attributes, this work proposes a heterogeneous coupling relationship-based similarity measure for mixed data (HMS). First, continuous attributes are converted into categorical attributes through automatic discretization based on K-means, and the resulting categorical data view is extracted; its similarity is measured with the HGS method. In parallel, categorical attributes are converted into continuous attributes through similarity representation, and the resulting continuous data view is constructed; its similarity is measured with Euclidean distance. The harmonic mean of the similarities from the two views is then calculated to obtain the integrated similarity. Finally, the effectiveness and feasibility of the HMS method are verified in clustering experiments. Compared with other commonly used similarity measures for mixed data, the HMS method more fully captures the categorical and continuous views of the mixed-type attributes of complex system elements, as well as the complex heterogeneous relationships that exist between and within the views, which greatly improves clustering performance.
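The two-view pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the HGS method or the similarity representation, so this sketch substitutes simple attribute matching for HGS and one-hot encoding for the similarity representation. The function names (`kmeans_1d`, `harmonic_mean`, `two_view_similarity`), the choice of `k`, the min-max scaling, and the conversion of Euclidean distance to similarity via 1 / (1 + d) are all assumptions made for illustration.

```python
import math

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on a 1-D list; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    # Deterministic initialization (an assumption): spread centers over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def harmonic_mean(a, b):
    """Harmonic mean of two non-negative similarities; 0 if both are 0."""
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

def two_view_similarity(cont, cat, k=2):
    """cont: rows of floats; cat: rows of string labels. Returns an n x n matrix."""
    n = len(cont)
    cont_cols = list(zip(*cont))
    cat_cols = list(zip(*cat))

    # Categorical view: original categorical attributes plus discretized continuous ones.
    disc_cols = [kmeans_1d(list(col), k) for col in cont_cols]
    cat_view = [list(cat[i]) + [col[i] for col in disc_cols] for i in range(n)]

    # Continuous view: min-max scaled continuous attributes plus one-hot
    # encoded categorical attributes (stand-in for similarity representation).
    num_cols = []
    for col in cont_cols:
        lo, hi = min(col), max(col)
        num_cols.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    for col in cat_cols:
        for level in sorted(set(col)):
            num_cols.append([1.0 if v == level else 0.0 for v in col])
    cont_view = [[col[i] for col in num_cols] for i in range(n)]

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Simple matching over the categorical view (stand-in for HGS).
            matches = sum(a == b for a, b in zip(cat_view[i], cat_view[j]))
            s_cat = matches / len(cat_view[i])
            # Euclidean distance over the continuous view, mapped to (0, 1].
            s_cont = 1.0 / (1.0 + math.dist(cont_view[i], cont_view[j]))
            sim[i][j] = harmonic_mean(s_cat, s_cont)
    return sim

# Toy data (hypothetical): two continuous and two categorical attributes.
cont = [[1.0, 10.0], [1.2, 11.0], [9.0, 50.0], [9.5, 52.0]]
cat = [["a", "x"], ["a", "x"], ["b", "y"], ["b", "y"]]
S = two_view_similarity(cont, cat, k=2)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a pair of objects scores high only when both views agree that they are similar. The resulting matrix `S` could then be passed to any clustering routine that accepts precomputed similarities.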