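Keywords
Computer science
Artificial intelligence
Ranking (information retrieval)
Cluster analysis
Bottleneck
Deep learning
Pattern recognition (psychology)
Data mining
Embedding
Information retrieval
Machine learning
Embedded system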
Authors
Jie Xiong, Yu Li, Xi Niu, Youfang Leng
Identifier
DOI:10.1016/j.ins.2022.11.158
Abstract
Extreme Multi-label Text Classification (XMTC) is the task of finding the most relevant labels for a document from an extremely large label set. Although some deep learning-based methods have achieved great success in XMTC, they still suffer from two drawbacks. First, several methods improve precision by clustering labels and training and combining several sub-models on a single dataset, but they are not ideal in terms of computational efficiency. Second, most of these methods need a low-dimensional bottleneck layer before the output layer to compress the feature representations so that the model fits in GPU memory, which causes a loss of information from the original features. In this paper, we propose XRR, a novel two-stage XMTC framework with candidate Retrieving and deep Ranking, to address these drawbacks. In the retrieving stage, we design two retrieval strategies, an aligning Point Mutual Information (aPMI) method and a Unified Label-Semantic Embedding (ULSE) method, to extract hundreds of candidates from the massive label set. In the ranking stage, we present a deep ranking model that uses a pre-trained transformer to distinguish the true labels from the candidates. Extensive experiments show that XRR outperforms state-of-the-art methods on five widely used multi-label datasets.
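The abstract does not describe how aPMI or ULSE actually work, so the following is only a rough, hypothetical sketch of the general retrieve-then-rank idea. It assumes plain document-level pointwise mutual information between tokens and labels as the retrieval signal (a simple stand-in, not the paper's aPMI), and it replaces the pre-trained transformer ranker with a caller-supplied scoring function. The names `build_pmi_index`, `retrieve_candidates`, and `rank_candidates` are illustrative and do not come from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical sketch: plain document-level PMI retrieval + pluggable ranker.
# This is NOT the paper's aPMI or its transformer ranking model.


def build_pmi_index(docs: Sequence[Sequence[str]],
                    doc_labels: Sequence[Set[str]]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each token to the labels it has positive PMI with.

    PMI(t, l) = log(P(t, l) / (P(t) * P(l))), with probabilities
    estimated from document frequencies over the training set.
    """
    n = len(docs)
    tok_df: Counter = Counter()   # documents containing token t
    lab_df: Counter = Counter()   # documents tagged with label l
    joint: Counter = Counter()    # documents containing t AND tagged with l
    for toks, labs in zip(docs, doc_labels):
        uniq = set(toks)
        tok_df.update(uniq)
        lab_df.update(labs)
        for t in uniq:
            for lab in labs:
                joint[(t, lab)] += 1
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (t, lab), c in joint.items():
        pmi = math.log(c * n / (tok_df[t] * lab_df[lab]))
        if pmi > 0:  # keep only positively associated pairs
            index[t].append((lab, pmi))
    return dict(index)


def retrieve_candidates(tokens: Sequence[str],
                        index: Dict[str, List[Tuple[str, float]]],
                        k: int) -> List[str]:
    """Stage 1: sum PMI over the document's tokens and keep the top-k labels."""
    scores: Dict[str, float] = defaultdict(float)
    for t in set(tokens):
        for lab, pmi in index.get(t, []):
            scores[lab] += pmi
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lab for lab, _ in ranked[:k]]


def rank_candidates(tokens: Sequence[str],
                    candidates: Sequence[str],
                    score_fn: Callable[[Sequence[str], str], float]) -> List[str]:
    """Stage 2: score only the retrieved candidates with an (expensive) model.

    `score_fn` is a stand-in for a learned ranker such as a transformer
    that scores a (document, label) pair.
    """
    return sorted(candidates, key=lambda lab: score_fn(tokens, lab), reverse=True)


if __name__ == "__main__":
    docs = [["neural", "network", "training"], ["neural", "network", "gpu"],
            ["stock", "market", "price"], ["market", "price", "inflation"]]
    labels = [{"ml"}, {"ml", "hardware"}, {"finance"}, {"finance", "economics"}]
    index = build_pmi_index(docs, labels)
    query = ["neural", "network", "gpu"]
    cands = retrieve_candidates(query, index, k=2)
    print(cands)  # ['hardware', 'ml']
    # Toy ranker: overlap with a hypothetical label description.
    desc = {"ml": {"neural", "network", "training"}, "hardware": {"gpu"}}
    print(rank_candidates(query, cands, lambda toks, lab: len(set(toks) & desc.get(lab, set()))))
    # ['ml', 'hardware']
```

In this sketch the expensive scorer is evaluated only on the `k` retrieved candidates rather than on every label, so the ranker never needs an output layer over the full label set. That is the general motivation the abstract gives for pairing retrieval with deep ranking instead of relying on a compressed bottleneck layer.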