Patches Are All You Need?

Authors
Trockman, Asher; Kolter, J. Zico
Source
Journal: Cornell University - arXiv  Cited by: 1
Identifier
DOI: 10.48550/arxiv.2201.09792
Abstract

Although convolutional networks have been the dominant architecture for vision tasks for many years, recent experiments have shown that Transformer-based models, most notably the Vision Transformer (ViT), may exceed their performance in some settings. However, due to the quadratic runtime of the self-attention layers in Transformers, ViTs require the use of patch embeddings, which group together small regions of the image into single input features, in order to be applied to larger image sizes. This raises a question: Is the performance of ViTs due to the inherently-more-powerful Transformer architecture, or is it at least partly due to using patches as the input representation? In this paper, we present some evidence for the latter: specifically, we propose the ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network. In contrast, however, the ConvMixer uses only standard convolutions to achieve the mixing steps. Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants for similar parameter counts and data set sizes, in addition to outperforming classical vision models such as the ResNet. Our code is available at https://github.com/locuslab/convmixer.
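
The abstract's point about quadratic self-attention cost and patch embeddings can be made concrete with a little arithmetic. The sketch below is illustrative only: the 224-pixel image side and 16-pixel patch side are assumed example values, not figures from the abstract, and the helper names `num_tokens` and `attention_pairs` are hypothetical. It counts how many input features result from splitting a square image into non-overlapping square patches, and how many token pairs a self-attention layer would then compare.

```python
# Illustrative arithmetic only. The image and patch sizes below are assumed
# example values, and the helper names are hypothetical.


def num_tokens(image_size: int, patch_size: int) -> int:
    """Count input features when a square image is split into
    non-overlapping square patches of side `patch_size`."""
    if image_size % patch_size != 0:
        raise ValueError("patch_size must divide image_size evenly")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_pairs(tokens: int) -> int:
    """Self-attention compares every token with every token,
    so the number of pairs grows as tokens squared."""
    return tokens * tokens


image_size = 224  # assumed example resolution
patch_size = 16   # assumed example patch size

pixel_tokens = num_tokens(image_size, 1)           # one token per pixel
patch_tokens = num_tokens(image_size, patch_size)  # one token per patch

print(pixel_tokens, attention_pairs(pixel_tokens))  # 50176 2517630976
print(patch_tokens, attention_pairs(patch_tokens))  # 196 38416
print(attention_pairs(pixel_tokens) // attention_pairs(patch_tokens))  # 65536
```

Under these assumed sizes, grouping pixels into patches cuts the token count by a factor of 256 and the number of attention pairs by a factor of 65536, which is why patch embeddings make larger images tractable for Transformer-style models.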
