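Security token
Bottleneck
Computer science
Theoretical computer science
Context (archaeology)
Domain (mathematical analysis)
Artificial intelligence
Information retrieval
Machine learning
Mathematics
Computer security
Biology
Mathematical analysis
Embedded system
Paleontology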
Authors
Wei Zhong, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
Identifier
DOI: 10.1145/3539618.3591746
Abstract
Neural retrievers have been shown to be effective for math-aware search. Their ability to cope with math symbol mismatches, to represent highly contextualized semantics, and to learn effective representations is critical to improving math information retrieval. However, the most effective retriever for math remains impractical: it depends on token-level dense representations for every math token, which leads to prohibitive storage demands, especially since math content generally consumes more tokens. In this work, we try to alleviate this efficiency bottleneck while boosting math information retrieval effectiveness via hybrid search. To this end, we propose MABOWDOR, a Math-Aware Best-of-Worlds Domain Optimized Retriever. It has an unsupervised structure search component, a dense retriever, and optionally a sparse retriever on top of a domain-adapted backbone learned by context-enhanced pretraining; each component addresses a different need in retrieving heterogeneous data from math documents. Our hybrid search outperforms the previous state-of-the-art math IR system while eliminating these efficiency bottlenecks. Our system is available at https://github.com/approach0/pya0.
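The abstract does not say how MABOWDOR merges the rankings of its components. One common way to realize hybrid search is linear interpolation of per-retriever scores after normalization. The Python sketch below assumes that scheme (min-max normalization plus hand-picked weights). The function names `min_max_normalize` and `hybrid_fuse`, the run names, the toy scores, and the weights are all hypothetical and are not taken from the paper or from the pya0 repository.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores into [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tied: give each the top normalized score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(
    runs: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical fusion: weighted sum of normalized scores across runs.

    A document absent from a run contributes 0 for that run (a simplifying
    assumption). Results are sorted by fused score, ties broken by doc id.
    """
    fused: Dict[str, float] = {}
    for name, scores in runs.items():
        w = weights.get(name, 0.0)
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    # Toy scores standing in for structure search, dense, and sparse runs.
    runs = {
        "structure": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
        "dense": {"d2": 0.91, "d3": 0.85, "d4": 0.60},
        "sparse": {"d1": 20.0, "d4": 15.0},
    }
    weights = {"structure": 0.5, "dense": 0.3, "sparse": 0.2}
    for doc, score in hybrid_fuse(runs, weights, k=3):
        print(doc, round(score, 4))
```

Because the sparse retriever is optional in the described system, dropping the `"sparse"` entry from `runs` simply removes its contribution. Normalizing first matters because structure-search, dense, and sparse scores live on unrelated scales.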