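Automatic summarization
Computer science
Benchmark (surveying)
Publishing
Baseline (sea)
Fluency
Information retrieval
Quality (philosophy)
Artificial intelligence
Discriminant
Natural language processing
Geology
Geography
Business
Philosophy
Advertising
Epistemology
Oceanography
Linguistics
Geodesy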
Authors
Jiaan Wang, Zhixu Li, Qiang Yang, Jianfeng Qu, Zhigang Chen, Qingsheng Liu, Guoping Hu
Identifier
DOI: 10.1145/3459637.3482188
Abstract
Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset but also proposes a two-step framework. Despite its great contributions, the work has three main drawbacks: 1) noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) directly concatenating rewritten sentences to form news limits its practicability. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer that takes into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.