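Automatic summarization
Computer science
Vietnamese
Task (project management)
Benchmark (surveying)
Information retrieval
Multi-document summarization
Artificial intelligence
Natural language processing
Process (computing)
Cluster analysis
Linguistics
Operating system
Philosophy
Economics
Management
Geography
Geodesy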
Authors
Quoc-An Nguyen, Duy-Cat Can, Quynh Hoang Le, Mai-Vu Tran
Source
Journal: International Journal of Asian Language Processing
[World Scientific]
Date: 2022-09-01
Volume (Issue): 32 (02n03)
Citations: 1
Identifiers
DOI:10.1142/s2717554523500030
Abstract
The performance of automatic summarization systems has improved significantly with the development of supervised approaches. However, for the Vietnamese abstractive multi-document summarization task, the available datasets are insufficient for training models. With this motivation, we contribute a new gold-standard Vietnamese abstractive multi-document summarization dataset, named Abmusu. After collecting and clustering articles, we built a hierarchical annotation process to generate summaries, with three roles: annotator, supervisor, and curator. As a result, the dataset contains 600 news clusters formed from 1,839 articles, together with the corresponding human-generated summaries. To the best of our knowledge, Abmusu is the largest freely available dataset for Vietnamese abstractive multi-document summarization research. Moreover, its summaries are more concise, which makes training summarization models more challenging. We also benchmark the Abmusu dataset with various summarization baselines.
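The abstract says summarization baselines were used to benchmark Abmusu but does not name the baselines or the metric. As a minimal sketch, assuming a Lead-style extractive baseline (the first sentence of each article in a cluster) scored with ROUGE-1 F1 over lowercase whitespace tokens, one benchmark step might look like the following. The function names `lead_baseline` and `rouge1_f1`, the data layout (a cluster as a list of articles, each a list of sentences), and the toy data are hypothetical illustrations, not the authors' code or results.

```python
from collections import Counter


def lead_baseline(cluster, n_sentences=1):
    """Hypothetical Lead-style baseline: concatenate the first n sentences of each article.

    `cluster` is assumed to be a list of articles, each a list of sentence strings.
    """
    summary = []
    for article in cluster:
        summary.extend(article[:n_sentences])
    return " ".join(summary)


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 (ROUGE-1) using lowercase whitespace tokenization.

    Assumption: no stemming or Vietnamese word segmentation is applied here.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy two-article cluster in English placeholders (not Abmusu data).
    cluster = [
        ["the storm hit the coast on monday", "many homes were damaged"],
        ["a storm reached the coast", "officials ordered evacuations"],
    ]
    reference = "a storm hit the coast and damaged many homes"
    candidate = lead_baseline(cluster)
    print(candidate)
    print(round(rouge1_f1(candidate, reference), 3))
```

Whitespace splitting treats each Vietnamese syllable as a separate token, so a real evaluation would more likely rely on a maintained ROUGE implementation and an appropriate tokenizer; the sketch only shows the shape of a baseline-plus-metric loop.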