计算机科学
聚类分析
语义学(计算机科学)
无监督学习
人工智能
水准点(测量)
机器学习
监督学习
数据挖掘
钥匙(锁)
人工神经网络
计算机安全
大地测量学
程序设计语言
地理
作者
Hanlei Zhang,Hui Xu,Xin Wang,Fei Long,Kai Gao
出处
期刊:Cornell University - arXiv
日期:2023-01-01
标识
DOI:10.48550/arxiv.2304.07699
摘要
New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when limited or no prior knowledge of labeled data is available. To tackle this problem, we propose a novel clustering framework, USNID, for unsupervised and semi-supervised new intent discovery, which has three key technologies. First, it fully utilizes unsupervised or semi-supervised data to mine shallow semantic similarity relations and provide well-initialized representations for clustering. Second, it designs a centroid-guided clustering mechanism to address the issue of cluster allocation inconsistency and provide high-quality self-supervised targets for representation learning. Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters by optimizing both cluster-level and instance-level objectives. We also propose an effective method for estimating the cluster number in open-world scenarios without knowing the number of new intents beforehand. USNID performs exceptionally well on several benchmark intent datasets, achieving new state-of-the-art results in unsupervised and semi-supervised new intent discovery and demonstrating robust performance with different cluster numbers.
科研通智能强力驱动
Strongly Powered by AbleSci AI