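Embedding
Computer science
Hierarchy
Artificial intelligence
Word embedding
Encoding
Semantics (computer science)
Natural language processing
Graph
Word (group theory)
Representation (politics)
Pattern recognition (psychology)
Theoretical computer science
Mathematics
Law
Programming language
Chemistry
Gene
Economy
Geometry
Politics
Biochemistry
Market economy
Political science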
Authors
Yinglong Ma, Xiaofeng Liu, Lijiao Zhao, Yue Liang, Peng Zhang, Beihong Jin
Identifier
DOI: 10.1016/j.eswa.2021.115905
Abstract
Many real-world text classification tasks deal with a large number of closely related categories organized in a hierarchical structure or taxonomy. Hierarchical multi-label text classification (HMTC) becomes particularly challenging when it must handle large sets of closely related categories. The structural features of all categories in the entire hierarchy and the word semantics of their category labels are very helpful for improving classification accuracy over such large sets of closely related categories, yet most existing HMTC approaches neglect them. In this paper, we present a hybrid embedding-based text representation for high-accuracy HMTC. First, the hybrid embedding combines a graph embedding of the categories in the hierarchy with a word embedding of their category labels. A graph embedding model based on Structural Deep Network Embedding (SDNE) simultaneously encodes the global and local structural features of each category within the whole hierarchy, making categories structurally discriminable. We further use word embedding to encode the semantics of each category label in the hierarchy, making different categories semantically discriminable. Second, we present a level-by-level HMTC approach based on a bidirectional Gated Recurrent Unit (BiGRU) network, which uses the hybrid embedding to learn the text representation level by level. Finally, extensive experiments were conducted on five large-scale real-world datasets against state-of-the-art hierarchical and flat multi-label text classification approaches. The results show that our approach is highly competitive with the state of the art in classification accuracy and, in particular, achieves superior performance while keeping computational costs under control.
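The two ideas in the abstract (a per-category hybrid embedding, and top-down level-by-level prediction over the hierarchy) can be illustrated with the minimal sketch below. It is not the authors' implementation, and every name in it is hypothetical. Assumptions: the taxonomy and the 3-dimensional word vectors are toy data; each category's adjacency row stands in for the SDNE embedding (SDNE's autoencoder takes exactly this row as input and learns a compressed code from it, which is omitted here); the structural and semantic parts are joined by concatenation; and the trained BiGRU classifier is replaced by a cosine-similarity scorer between the text's mean word vector and each label's word vector, with a hypothetical threshold. The point of the decoding loop is that, at each level, only children of the categories chosen at the previous level are candidates.

```python
import math

# Hypothetical toy taxonomy: parent -> children ("ROOT" is a virtual root).
TAXONOMY = {
    "ROOT": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Football", "Tennis"],
}

# Hypothetical 3-d vectors standing in for pretrained word embeddings.
DIM = 3
WORD_VECS = {
    "science": [0.9, 0.1, 0.0],
    "physics": [0.8, 0.2, 0.1],
    "biology": [0.7, 0.0, 0.3],
    "sports": [0.0, 0.9, 0.1],
    "football": [0.1, 0.8, 0.2],
    "tennis": [0.0, 0.7, 0.4],
}


def all_nodes(taxonomy):
    """Every node, root included, in first-seen order."""
    nodes = ["ROOT"]
    for children in taxonomy.values():
        for child in children:
            if child not in nodes:
                nodes.append(child)
    return nodes


def structural_vector(node, nodes, taxonomy):
    """Undirected adjacency row of `node`: a crude stand-in for SDNE,
    which would compress this row with a deep autoencoder."""
    row = []
    for other in nodes:
        linked = other in taxonomy.get(node, []) or node in taxonomy.get(other, [])
        row.append(1.0 if linked else 0.0)
    return row


def mean_word_vector(text):
    """Mean vector of the known words in `text` (unknown words are skipped)."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def hybrid_embedding(node, nodes, taxonomy):
    """Structural part followed by semantic part (concatenation is an assumption)."""
    return structural_vector(node, nodes, taxonomy) + mean_word_vector(node)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_level_by_level(text, taxonomy, threshold=0.95):
    """Top-down decoding: candidates at each level are the children of the
    labels chosen at the previous level. The cosine scorer is a toy stand-in
    for a trained BiGRU classifier."""
    text_vec = mean_word_vector(text)
    frontier, levels = ["ROOT"], []
    while True:
        candidates = [c for p in frontier for c in taxonomy.get(p, [])]
        chosen = [c for c in candidates
                  if cosine(text_vec, mean_word_vector(c)) >= threshold]
        if not chosen:  # no children left, or nothing scores high enough
            break
        levels.append(chosen)
        frontier = chosen
    return levels


if __name__ == "__main__":
    nodes = all_nodes(TAXONOMY)
    print(hybrid_embedding("Science", nodes, TAXONOMY))
    print(predict_level_by_level("physics experiment science", TAXONOMY))
    # -> [['Science'], ['Physics']]
```

Because candidates are restricted to children of earlier predictions, the output is always a consistent path set through the taxonomy; a flat multi-label classifier offers no such guarantee.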