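Automatic summarization
Computer science
Natural language processing
Set (abstract data type)
Artificial intelligence
Information retrieval
Multi-document summarization
Programming language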
Authors
Xiaojun Liu, Chuang Zhang, Xiaojun Chen, Yanan Cao, Jinpeng Li
Identifier
DOI: 10.1007/978-3-030-60450-9_42
Abstract
We present CLTS, a Chinese long text summarization dataset, to address the scarcity of large-scale, high-quality datasets in automatic summarization, which limits further research. To the best of our knowledge, it is the first long text summarization dataset in Chinese. Extracted from the Chinese news website ThePaper.cn (https://www.thepaper.cn/), the corpus contains more than 180,000 long Chinese articles with corresponding summaries written by professional editors and authors. The dataset is available for download at https://github.com/lxj5957/CLTS-Dataset. We train and evaluate several existing methods on CLTS to verify the utility and challenges of the dataset; the results show that the corpus is useful for establishing baselines for further research on automatic text summarization.