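Security token
Computer science
Embedding
Sentence
Automatic summarization
Artificial intelligence
Natural language processing
Pooling
Robustness (evolution)
Speech recognition
Biochemistry
Chemistry
Computer security
Gene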
Authors
Yuho Cha, Younghoon Lee
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2024-01-01
Volume: 564, article 126987
Identifier
DOI: 10.1016/j.neucom.2023.126987
Abstract
Although pre-trained language models achieve high performance on various natural language processing tasks, they still require further improvement on the sentence embedding task. Many studies have improved performance on this task using pre-trained language models and contrastive learning, but these approaches are limited because they rely on naive average pooling or CLS tokens. Therefore, we propose an advanced sentence-embedding method based on weighted pooling that considers token importance. Specifically, the token importance is calculated by combining an explainable artificial-intelligence module with a text summarization model, and the final sentence embedding is derived by weighted pooling of the token embeddings according to token importance. Thus, we derive a sentence embedding that considers both the local information of the token embeddings and the global information of the entire sentence. Experimental results reveal that our proposed sentence embedding outperforms other models on both text similarity and text classification tasks. Moreover, the proposed method’s robustness is verified through an ablation study.
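To make the contrast between naive average pooling and importance-weighted pooling concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the function names `mean_pool` and `weighted_pool` are hypothetical, the importance scores are taken as given inputs (the abstract derives them from an explainable-AI module combined with a summarization model, which is not reproduced here), and normalizing the weights by their sum is an assumption.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Naive average pooling: every token contributes equally."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(vec[j] for vec in token_embeddings) / n for j in range(dim)]


def weighted_pool(
    token_embeddings: Sequence[Sequence[float]],
    importance: Sequence[float],
) -> List[float]:
    """Importance-weighted pooling (illustrative sketch).

    `importance` holds one non-negative score per token. How the scores
    are obtained is outside this sketch; here they are plain inputs.
    Dividing by the sum of the scores is an assumption of this example.
    """
    if len(token_embeddings) != len(importance):
        raise ValueError("need exactly one importance score per token")
    total = sum(importance)
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")

    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    for vec, score in zip(token_embeddings, importance):
        weight = score / total  # normalized weight of this token
        for j, value in enumerate(vec):
            pooled[j] += weight * value
    return pooled


if __name__ == "__main__":
    # Two toy 2-dimensional token embeddings with made-up importance scores.
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pool(tokens))                   # equal contribution per token
    print(weighted_pool(tokens, [3.0, 1.0]))   # first token dominates
```

With uniform importance scores, `weighted_pool` reduces to `mean_pool`, so average pooling can be read as the special case in which every token is treated as equally important.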