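Computer science
Question answering
Security token
Information retrieval
Context (archaeology)
Encoder
Natural language processing
Artificial intelligence
Conversation
Ambiguity
Generative grammar
Identifier
Word (group theory)
Decoding methods
Programming language
Linguistics
Paleontology
Telecommunications
Philosophy
Computer security
Biology
Operating system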
Authors
Yongqi Li,Nan Yang,Liang Wang,Furu Wei,Wenjie Li
Identifier
DOI:10.1016/j.ipm.2023.103475
Abstract
Effective passage retrieval is crucial for conversational question answering (QA) but is challenging because questions are often ambiguous. Current methods rely on the dual-encoder architecture to embed questions in conversations as contextualized vectors. However, this architecture is limited by the embedding bottleneck and the dot-product operation. To alleviate these limitations, we propose generative retrieval for conversational QA (GCoQA). GCoQA assigns distinctive identifiers to passages and retrieves passages by generating their identifiers token by token with an encoder–decoder architecture. In this generative approach, GCoQA eliminates the need for a vector-style index and can attend to crucial tokens of the conversation context at every decoding step. We conduct experiments on three public datasets over a corpus containing about twenty million passages. The results show that GCoQA achieves relative improvements of +13.6% in passage retrieval and +42.9% in document retrieval. GCoQA is also efficient in memory usage and inference speed, consuming only 1/10 of the memory and taking less than 33% of the time. The code and data are released at https://github.com/liyongqi67/GCoQA.
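To make the idea of retrieving a passage by generating its identifier token by token more concrete, here is a minimal Python sketch. It is not the released GCoQA code. The prefix trie, the greedy search, the `<eos>` marker, the whitespace-tokenized identifiers, and the word-overlap scorer `toy_score` are all assumptions made for illustration. In a real system the scorer would be the encoder-decoder's next-token probability given the conversation context, and decoding would more likely use beam search.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-identifier marker (assumption)


def build_trie(identifiers: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized passage identifiers."""
    root: Dict = {}
    for tokens in identifiers:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    context: str,
    trie: Dict,
    score: Callable[[str, List[str], str], float],
) -> List[str]:
    """Generate one identifier token by token, following only trie branches.

    At each step the candidate set is restricted to tokens that continue
    some valid identifier, so the output always names an existing passage.
    """
    prefix: List[str] = []
    node = trie
    while True:
        allowed = sorted(node.keys())  # sorted for deterministic tie-breaking
        best = max(allowed, key=lambda tok: score(context, prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)
        node = node[best]


def toy_score(context: str, prefix: List[str], token: str) -> float:
    """Stand-in for a decoder's token score: 1.0 if the token occurs in the context."""
    words = set(context.lower().split())
    return 1.0 if token.lower() in words else 0.0


# Hypothetical identifiers for three passages.
identifiers = [
    ["paris", "history"],
    ["paris", "climate"],
    ["london", "history"],
]
trie = build_trie(identifiers)
result = constrained_greedy_decode("what about the climate of paris", trie, toy_score)
print(result)
```

Because every step can consult the full conversation context through the scorer, no single fixed-size query vector or vector index is needed in this sketch; the only retrieval-side structure is the trie of valid identifiers.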