Keywords: Duality (order theory), Transformer, Algorithm, State space, Mathematics, Computer science, Mathematical optimization, Field (algebra), Pure mathematics, Electrical engineering, Engineering, Voltage, Statistics
Source
Journal: Cornell University - arXiv
Date: 2024-05-31
Citations: 7
Identifiers
DOI: 10.48550/arxiv.2405.21060
Abstract
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
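The duality described in the abstract can be illustrated with a small numerical check: a selective SSM with a scalar decay per timestep can be evaluated either as a linear recurrence or, equivalently, by multiplying the input with a lower-triangular semiseparable matrix, which has the shape of a masked attention computation. The sketch below is a minimal illustration under these assumptions; the parameter names (a, B, C, x) and toy values are hypothetical and do not reproduce the paper's implementation, only the agreement of the two computational forms.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                      # sequence length, state dimension

# Hypothetical per-timestep SSM parameters for one scalar input channel:
# a_t is a scalar decay, B_t an input projection, C_t an output projection.
a = rng.uniform(0.5, 1.0, size=T)
B = rng.standard_normal((T, N))
C = rng.standard_normal((T, N))
x = rng.standard_normal(T)

# Linear (recurrent) form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t
h = np.zeros(N)
y_recurrent = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_recurrent[t] = C[t] @ h

# Quadratic (attention-like) form: y = M x, where
#   M[t, s] = (C_t . B_s) * a_{s+1} * ... * a_t  for s <= t, and 0 otherwise.
# M is a lower-triangular semiseparable matrix (a decay-masked attention matrix).
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = (C[t] @ B[s]) * np.prod(a[s + 1 : t + 1])
y_quadratic = M @ x

print(np.allclose(y_recurrent, y_quadratic))   # True: the two forms agree
```

The recurrent pass costs O(T) time per channel, while the matrix form costs O(T^2); the SSD framing exploits the fact that both compute the same semiseparable matrix-vector product.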