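Computer science
Synonym (taxonomy)
Vietnamese
Semantic similarity
Natural language processing
Task (project management)
Artificial intelligence
Similarity (geometry)
Exploit
Meaning (existential)
Information retrieval
Machine learning
Linguistics
Image (mathematics)
Philosophy
Biology
Genus
Economics
Botany
Management
Computer security
Psychotherapist
Psychology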
Authors
Nguyen Ngoc Tram Anh,Thanh-Binh Tran
Identifier
DOI:10.1109/rivf55975.2022.10013827
Abstract
Recently, data augmentation for low-resource languages has emerged as an obvious remedy for helping deep learning models achieve state-of-the-art performance. Yet few studies have been conducted for Vietnamese, which is considered a low-resource language. This paper proposes two new text augmentation methods, named Synonym Replacement and Definition Replacement, for detecting semantic similarity between a pair of questions. Both methods exploit the keywords shared between the two questions to expand the dataset while preserving the meaning of the original sentences. Experimental results show that both approaches improve the performance of BERT and PhoBERT models. Notably, BERT trained on the dataset enriched by Synonym Replacement gains 0.91 in accuracy and F1-score on the similarity detection task. This suggests that the proposed methods can be adapted to improve performance on other natural language processing tasks.
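The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one plausible reading of Synonym Replacement: find keywords that appear in both questions, substitute the same synonym in both, and keep the original label. The `SYNONYMS` dictionary, the function names, the whitespace tokenization, and the English toy sentences are all assumptions made for illustration; the authors' actual synonym resource and their Vietnamese word segmentation are not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy synonym dictionary (an assumption for illustration;
# the paper's real lexical resource is not given in the abstract).
SYNONYMS: Dict[str, List[str]] = {
    "price": ["cost"],
    "phone": ["handset"],
}


def shared_keywords(q1: str, q2: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return tokens present in both questions that have a known synonym.

    Tokens are produced by naive whitespace splitting, which is an
    assumption; Vietnamese text would need a proper word segmenter.
    """
    tokens2 = set(q2.lower().split())
    shared: List[str] = []
    for tok in q1.lower().split():
        if tok in tokens2 and tok in synonyms and tok not in shared:
            shared.append(tok)
    return shared


def replace_token(question: str, old: str, new: str) -> str:
    """Replace every whole-token occurrence of `old` with `new`."""
    return " ".join(new if tok.lower() == old else tok for tok in question.split())


def synonym_replacement(
    q1: str, q2: str, label: int, synonyms: Dict[str, List[str]] = SYNONYMS
) -> List[Tuple[str, str, int]]:
    """Create new question pairs by swapping each shared keyword for a synonym.

    The same synonym is applied to both questions so that the relation
    between them, and therefore the label, is left unchanged.
    """
    augmented: List[Tuple[str, str, int]] = []
    for keyword in shared_keywords(q1, q2, synonyms):
        for synonym in synonyms[keyword]:
            augmented.append(
                (replace_token(q1, keyword, synonym),
                 replace_token(q2, keyword, synonym),
                 label)
            )
    return augmented


if __name__ == "__main__":
    pairs = synonym_replacement(
        "what is the price of this phone",
        "is the price of the phone high",
        1,
    )
    for pair in pairs:
        print(pair)
```

Replacing the shared keyword identically in both questions is the design choice that keeps the pair's similarity label valid; a Definition Replacement variant could presumably follow the same shape with a keyword-to-definition dictionary, though that too is an assumption.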