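Automatic summarization
Computer science
Discriminant
Leverage (statistics)
Probabilistic logic
Artificial intelligence
Multi-document summarization
Information retrieval
Machine learning
Categorical variable
Natural language processing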
Authors
Gianluca Moro, Luca Ragazzi, Lorenzo Valgimigli, Davide Freddi
Identifier
DOI:10.18653/v1/2022.acl-long.15
Abstract
Although current state-of-the-art Transformer-based solutions have succeeded in a wide range of single-document NLP tasks, they still struggle to address multi-input tasks such as multi-document summarization. Many solutions truncate the inputs, thus ignoring potentially summary-relevant content, which is unacceptable in the medical domain, where every piece of information can be vital. Others leverage linear model approximations to apply multi-input concatenation, which worsens the results because all information is considered, even if it is conflicting or noisy with respect to a shared background. Despite the importance and social impact of medicine, there are no ad-hoc solutions for multi-document summarization in this domain. For this reason, we propose a novel discriminative marginalized probabilistic method (DAMEN) trained to discriminate critical information from a cluster of topic-related medical documents and generate a multi-document summary via token probability marginalization. Results show that we outperform the previous state of the art on a biomedical dataset for multi-document summarization of systematic literature reviews. Moreover, we perform extensive ablation studies to motivate the design choices and demonstrate the importance of each module of our method.
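The abstract does not spell out how token probability marginalization is computed. As a rough illustration only, the sketch below assumes a generic formulation in which each document in the cluster receives a relevance score, the scores are normalized with a softmax, and the next-token distribution is the relevance-weighted average of the per-document next-token distributions. The function names, the softmax weighting, and the toy numbers are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginalize_token_probs(doc_scores, per_doc_token_probs):
    """Mix per-document next-token distributions into a single one.

    doc_scores: one hypothetical relevance score per document.
    per_doc_token_probs: one next-token distribution per document,
        all defined over the same vocabulary.
    Returns sum_d weight(d) * p(token | d) for every vocabulary entry.
    """
    weights = softmax(doc_scores)
    vocab_size = len(per_doc_token_probs[0])
    marginal = [0.0] * vocab_size
    for weight, probs in zip(weights, per_doc_token_probs):
        for i, p in enumerate(probs):
            marginal[i] += weight * p
    return marginal


# Toy example: two documents and a three-token vocabulary (made-up values).
doc_scores = [2.0, 0.0]
per_doc_token_probs = [
    [0.7, 0.2, 0.1],  # distribution conditioned on document 0
    [0.1, 0.1, 0.8],  # distribution conditioned on document 1
]
print(marginalize_token_probs(doc_scores, per_doc_token_probs))
```

Under this assumed weighting, a document with a low relevance score contributes little to the mixed distribution, which is one way noisy or conflicting inputs could be down-weighted rather than concatenated at full strength.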