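Curse of dimensionality
Euclidean distance
Computer science
Earth mover's distance
Nearest neighbor search
Cluster analysis
Metric space
Euclidean space
k-nearest neighbors algorithm
Intrinsic dimension
Search engine indexing
Metric (unit)
Norm (philosophy)
Distance measures
Mathematics
Theoretical computer science
Algorithm
Data mining
Artificial intelligence
Discrete mathematics
Combinatorics
Economics
Operations management
Political science
Law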
Authors
Charu C. Aggarwal, Alexander Hinneburg, Daniel A. Keim
Identifier
DOI:10.1007/3-540-44503-x_27
Abstract
In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics which are used to measure the similarity between objects. We specifically examine the behavior of the commonly used L_k norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric (L_1 norm) is consistently more preferable than the Euclidean distance metric (L_2 norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the L_k norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results both from the theoretical and empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.
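As an illustration of the L_k norm discussed in the abstract, the following minimal Python sketch computes a distance of the form (sum_i |x_i - y_i|^k)^(1/k) for any positive k, including fractional values such as 0.5. The function names `lk_distance` and `relative_contrast` are hypothetical and chosen for this example only. The contrast measure (Dmax - Dmin) / Dmin is an assumed way to quantify how well a distance separates the nearest from the farthest point; the abstract itself does not specify one. Note also that for k < 1 the triangle inequality does not hold, so the result is a dissimilarity measure rather than a metric in the strict sense.

```python
import random


def lk_distance(x, y, k):
    """Distance of the L_k form: (sum_i |x_i - y_i|^k)^(1/k), for any k > 0.

    k = 1 gives the Manhattan distance, k = 2 the Euclidean distance,
    and 0 < k < 1 gives a fractional distance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(x) != len(y):
        raise ValueError("points must have the same dimensionality")
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def relative_contrast(points, query, k):
    """(Dmax - Dmin) / Dmin over the distances from query to every point.

    Assumed measure for this sketch: larger values mean the nearest and
    farthest points are easier to tell apart.
    """
    dists = [lk_distance(p, query, k) for p in points]
    d_min, d_max = min(dists), max(dists)
    if d_min == 0:
        raise ValueError("query coincides with a data point")
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Synthetic uniform data; dimension and sample size are arbitrary choices.
    rng = random.Random(0)
    dim, n = 50, 500
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    for k in (0.5, 1, 2):
        print(k, relative_contrast(points, query, k))
```

Running the script prints the contrast for a fractional, Manhattan, and Euclidean setting of k on the same synthetic data, which gives a rough feel for the sensitivity to k that the abstract describes.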