Keywords
Computer science
Security token
Transformer
Language model
Encoder
Question answering
Natural language processing
Artificial intelligence
Scratch
Programming language
Operating system
Physics
Quantum mechanics
Voltage
Authors
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock
Source
Journal: Cornell University - arXiv
Date: 2021-01-01
Citations: 176
Identifier
DOI: 10.48550/arxiv.2112.04426
Abstract
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
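The abstract compresses the whole architecture into a few sentences, so a small worked sketch of the two mechanisms it names may help: nearest-neighbour retrieval over a chunk database and cross-attention from each input chunk to the encodings of its retrieved neighbours. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation; the function names (`retrieve_neighbours`, `chunked_cross_attention`), the use of cosine similarity, and all shapes are illustrative, and it omits RETRO's causal offset between chunks, multi-head attention, and the actual frozen Bert embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_neighbours(chunk_emb, db_emb, k=2):
    # Stand-in for the frozen Bert retriever: rank database chunks by
    # cosine similarity to the query chunk embedding, return the top-k indices.
    q = chunk_emb / np.linalg.norm(chunk_emb)
    db = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:k]

def chunked_cross_attention(hidden, neighbour_enc, w_q, w_k, w_v):
    # Simplified chunked cross-attention: each chunk of decoder hidden states
    # attends only to the encodings of its own retrieved neighbours.
    #   hidden:        (n_chunks, chunk_len, d)
    #   neighbour_enc: (n_chunks, k * retr_len, d)
    out = np.empty_like(hidden)
    d_k = w_q.shape[1]
    for c in range(hidden.shape[0]):
        q = hidden[c] @ w_q                        # (chunk_len, d_k)
        key = neighbour_enc[c] @ w_k               # (k*retr_len, d_k)
        val = neighbour_enc[c] @ w_v               # (k*retr_len, d)
        attn = softmax(q @ key.T / np.sqrt(d_k))   # (chunk_len, k*retr_len)
        out[c] = attn @ val
    return out

# Toy, randomly initialised shapes, purely illustrative.
rng = np.random.default_rng(0)
d, n_chunks, chunk_len, retr_len, k = 16, 3, 4, 4, 2
hidden = rng.normal(size=(n_chunks, chunk_len, d))
db_emb = rng.normal(size=(100, d))   # hypothetical embeddings of database chunks
neighbour_ids = [retrieve_neighbours(h.mean(axis=0), db_emb, k) for h in hidden]
# In the real model the retrieved neighbour text would be run through the
# differentiable encoder; here random encodings stand in for that step.
neighbour_enc = rng.normal(size=(n_chunks, k * retr_len, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(neighbour_ids)
print(chunked_cross_attention(hidden, neighbour_enc, w_q, w_k, w_v).shape)
```

Running the sketch prints the retrieved neighbour indices for each input chunk and an output of shape (3, 4, 16), the same shape as the decoder hidden states to which the cross-attention output would be added.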