Transformer
Computer science
Information retrieval
Ranking (information retrieval)
Artificial intelligence
Machine learning
Data mining
Engineering
Voltage
Electrical engineering
Authors
Jyun‐Yu Jiang,Chenyan Xiong,Chia‐Jung Lee,Wei Wang
Identifier
DOI:10.18653/v1/2020.findings-emnlp.412
Abstract
The computing cost of transformer self-attention often necessitates splitting long documents to fit into pretrained models for document ranking tasks. In this paper, we design Query-Directed Sparse attention, which induces IR-axiomatic structures in transformer self-attention. Our model, QDS-Transformer, enforces the principal properties desired in ranking: local contextualization, hierarchical representation, and query-oriented proximity matching, while also enjoying the efficiency that comes from sparsity. Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer over previous approaches, which either retrofit long documents into BERT or use sparse attention without emphasizing IR principles. We further quantify the computing complexity and demonstrate that our sparse attention with a TVM implementation is twice as efficient as fully-connected self-attention. All source code, trained models, and predictions of this work are available at https://github.com/hallogameboy/QDS-Transformer.
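The query-directed sparsity described in the abstract can be pictured as an attention mask that combines a local sliding window (local contextualization) with global attention for query tokens (query-oriented proximity matching). The sketch below is a minimal illustration under assumptions, not the authors' TVM kernel or their exact masking scheme: the function name `qds_attention_mask`, the window size, and the convention that query tokens are prepended to the document tokens are all hypothetical.

```python
import torch

def qds_attention_mask(query_len: int, doc_len: int, window: int = 64) -> torch.Tensor:
    """Illustrative query-directed sparse attention mask (True = attention allowed).

    - Every token attends to neighbors within a local window (local contextualization).
    - Query tokens (assumed to occupy the first `query_len` positions) attend to,
      and are attended by, all tokens (query-oriented proximity matching).
    """
    n = query_len + doc_len
    idx = torch.arange(n)
    # Local sliding window: position i may attend to j when |i - j| <= window.
    local = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
    # Global attention in both directions for the query positions.
    is_query = idx < query_len
    global_rows = is_query.unsqueeze(1).expand(n, n)
    global_cols = is_query.unsqueeze(0).expand(n, n)
    return local | global_rows | global_cols

# Example: 8 query tokens, 512 document tokens, window of 64.
mask = qds_attention_mask(8, 512, window=64)
print(mask.shape, mask.float().mean())  # fraction of allowed attention pairs
```

The printed density shows why this pattern is cheaper than full self-attention: the number of allowed pairs grows roughly linearly with document length (window plus query fan-out) rather than quadratically.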