Patches Are All You Need?

Authors
Trockman, Asher; Kolter, J. Zico
Source
Journal: Cornell University - arXiv; cited by: 1
Identifier
DOI:10.48550/arxiv.2201.09792
Abstract

Although convolutional networks have been the dominant architecture for vision tasks for many years, recent experiments have shown that Transformer-based models, most notably the Vision Transformer (ViT), may exceed their performance in some settings. However, due to the quadratic runtime of the self-attention layers in Transformers, ViTs require the use of patch embeddings, which group together small regions of the image into single input features, in order to be applied to larger image sizes. This raises a question: Is the performance of ViTs due to the inherently-more-powerful Transformer architecture, or is it at least partly due to using patches as the input representation? In this paper, we present some evidence for the latter: specifically, we propose the ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network. In contrast, however, the ConvMixer uses only standard convolutions to achieve the mixing steps. Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants for similar parameter counts and data set sizes, in addition to outperforming classical vision models such as the ResNet. Our code is available at https://github.com/locuslab/convmixer.
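The abstract describes the ConvMixer only at a high level. First, a patch embedding turns each p x p region into one feature vector. Then repeated blocks mix spatial locations and channels in separate convolutional steps, keeping the resolution fixed. The sketch below traces that data flow in dependency-free Python on nested lists.

It is not the authors' code; the reference implementation is in the repository linked above. Several details here are assumptions for illustration:

- the GELU activations;
- the residual connection around the depthwise (spatial) step;
- the random weights;
- the small default sizes.

BatchNorm is omitted. The parameter names `h` (channels), `depth` (number of blocks), `k` (spatial kernel size) and `p` (patch size) are illustrative choices.

```python
import math
import random

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def patch_embed(img, w, b, p):
    """Convolution with kernel size p and stride p: each p x p patch becomes one
    h-dimensional feature. img: [C_in][H][W], w: [h][C_in][p][p], b: [h]."""
    c_in, H, W = len(img), len(img[0]), len(img[0][0])
    out = []
    for o in range(len(w)):
        fmap = []
        for i in range(H // p):
            row = []
            for j in range(W // p):
                s = b[o]
                for c in range(c_in):
                    for u in range(p):
                        for v in range(p):
                            s += w[o][c][u][v] * img[c][i * p + u][j * p + v]
                row.append(gelu(s))
            fmap.append(row)
        out.append(fmap)
    return out  # shape [h][H // p][W // p]

def depthwise_mix(x, w, k):
    """Spatial mixing: one k x k filter per channel (k odd), zero 'same' padding,
    then GELU and a residual connection. The shape [C][H][W] is preserved."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    r = k // 2
    out = []
    for c in range(C):
        fmap = []
        for i in range(H):
            row = []
            for j in range(W):
                s = 0.0
                for u in range(k):
                    for v in range(k):
                        ii, jj = i + u - r, j + v - r
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding at borders
                            s += w[c][u][v] * x[c][ii][jj]
                row.append(x[c][i][j] + gelu(s))
            fmap.append(row)
        out.append(fmap)
    return out

def pointwise_mix(x, w, b):
    """Channel mixing: a 1x1 convolution, i.e. the same linear map over channels
    applied independently at every spatial location, followed by GELU."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[gelu(b[o] + sum(w[o][c] * x[c][i][j] for c in range(C)))
              for j in range(W)] for i in range(H)] for o in range(len(w))]

def classify(x, w, b):
    # Global average pooling over space, then a linear layer to class scores
    pooled = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0])) for fmap in x]
    return [b[n] + sum(w[n][c] * pooled[c] for c in range(len(pooled))) for n in range(len(w))]

def rand_tensor(*shape):
    # Hypothetical random initialization, for illustration only
    if len(shape) == 1:
        return [random.gauss(0.0, 0.1) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]

def convmixer(img, h=8, depth=2, k=3, p=2, n_classes=10, seed=0):
    random.seed(seed)
    x = patch_embed(img, rand_tensor(h, len(img), p, p), rand_tensor(h), p)
    for _ in range(depth):
        x = depthwise_mix(x, rand_tensor(h, k, k), k)               # mix spatial locations
        x = pointwise_mix(x, rand_tensor(h, h), rand_tensor(h))    # mix channels
    return x, classify(x, rand_tensor(n_classes, h), rand_tensor(n_classes))

# A toy 3-channel 8 x 8 "image"
img = [[[((c + i + j) % 5) / 5.0 for j in range(8)] for i in range(8)] for c in range(3)]
features, logits = convmixer(img)
# features keeps the 8 x 4 x 4 shape set by the patch embedding through every block;
# logits holds one score per class (10 here)
```

Patches enter the network once, at the embedding. After that, every block maps an `[h][H/p][W/p]` array to one of the same shape. This reflects the abstract's point that the ConvMixer "maintains equal size and resolution throughout the network" and separates spatial from channel mixing.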