DOI: 10.1007/978-981-99-8850-1_43
Abstract
Pre-trained models based on the Transformer architecture are currently the most widely used models in Natural Language Processing (NLP). Feature fusion is the process of aggregating features from different sources into an augmented representation that carries more information. It is a common technique in multi-modal or multi-branch NLP models, but it is difficult to apply to models with only a single feature source. This paper therefore proposes a new probabilistically controlled late-fusion encoder-decoder architecture, the Feature Fusion Gate (FFG), which builds on feature fusion and Mixup to aggregate the feature representations from the last two layers of an NLP pre-trained model and thereby better capture the semantic information in each sample. During aggregation, FFG injects controlled noise as a regularization technique, helping the model achieve better generalization. Experimental results on eight NLP benchmark datasets show that FFG outperforms three baseline methods and consistently achieves significant performance improvements across DistilBERT, BERT, and RoBERTa.
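The abstract does not include code, so the following is only a minimal sketch of what an FFG-style module might look like in PyTorch, under stated assumptions: the sigmoid gating formula, the Beta(0.4, 0.4) Mixup weight, and the `noise_prob` parameter are illustrative choices, not details taken from the paper. The sketch fuses the last two hidden layers of a Transformer encoder with a learned per-dimension gate, then, with some probability during training, perturbs the fused representation with Mixup-style blending across the batch as the controlled-noise regularizer.

```python
import torch
import torch.nn as nn


class FeatureFusionGate(nn.Module):
    """Hypothetical sketch of an FFG-style late-fusion module.

    Fuses the last two hidden layers of a Transformer encoder with a
    learned gate, then (with probability `noise_prob`, training only)
    applies a Mixup-style perturbation as controlled-noise regularization.
    All hyperparameters here are assumptions, not values from the paper.
    """

    def __init__(self, hidden_size: int, noise_prob: float = 0.1):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)
        self.noise_prob = noise_prob

    def forward(self, h_last: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # Gate in (0, 1) decides, per dimension, how much of each layer to keep.
        g = torch.sigmoid(self.gate(torch.cat([h_last, h_prev], dim=-1)))
        fused = g * h_last + (1.0 - g) * h_prev
        if self.training and torch.rand(1).item() < self.noise_prob:
            # Mixup-style noise: blend each sample with a randomly permuted
            # sample from the same batch using a Beta-distributed weight.
            lam = torch.distributions.Beta(0.4, 0.4).sample().item()
            perm = torch.randperm(fused.size(0), device=fused.device)
            fused = lam * fused + (1.0 - lam) * fused[perm]
        return fused


# Usage sketch: the last two layers come from any Hugging Face encoder
# (e.g. BERT) by requesting all hidden states.
#   outputs = model(**batch, output_hidden_states=True)
#   fused = ffg(outputs.hidden_states[-1], outputs.hidden_states[-2])
```

Gating on the concatenation of both layers lets the fusion weight vary per token and per dimension, which is one plausible reading of a "gate" over two single-source feature streams; the paper may use a different mixing rule.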