GFNet: Global Filter Networks for Visual Recognition

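Computer science, Artificial intelligence, Pattern recognition (psychology), Filter (signal processing), Algorithm, Machine learning, Computer vision
Authors
Yongming Rao, Wenliang Zhao, Zheng Zhu, Jie Zhou, Jiwen Lu
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
Volume (issue): 45 (9): 10960-10973 Citations: 27
Identifier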
DOI:10.1109/tpami.2023.3263824
Abstract

Recent advances in self-attention and pure multi-layer perceptron (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases. These models are generally based on learning interactions among spatial locations from raw data. The complexity of self-attention and MLP grows quadratically as the image size increases, which makes these models hard to scale up when high-resolution features are required. In this paper, we present the Global Filter Network (GFNet), a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our architecture replaces the self-attention layer in vision Transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between frequency-domain features and learnable global filters, and a 2D inverse Fourier transform. Based on this basic design, we develop a series of isotropic models with a Transformer-style simple architecture and CNN-style hierarchical models with better performance. Isotropic GFNet models exhibit favorable accuracy/complexity trade-offs compared to recent vision Transformers and pure MLP models. Hierarchical GFNet models can inherit successful designs in CNNs and be easily scaled up with larger model sizes and more training data, showing strong performance on both image classification (e.g., 85.0% top-1 accuracy on ImageNet-1K without any extra data or supervision, and 87.4% accuracy with ImageNet-21K pre-training) and dense prediction tasks (e.g., 54.3 mIoU on ADE20K val). Our results demonstrate that GFNet can be a very competitive alternative to Transformer-based models and CNNs in terms of efficiency, generalization ability and robustness. Code is available at https://github.com/raoyongming/GFNet.
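
The three operations named in the abstract (2D discrete Fourier transform, element-wise multiplication with a global filter, 2D inverse Fourier transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the released code at the URL above is the reference. The function name `global_filter`, the channels-last `(H, W, C)` layout, the use of the real-input FFT, and the orthonormal scaling are all assumptions made for illustration, and the filter here is a fixed array rather than a trained parameter.

```python
import numpy as np


def global_filter(x, filt):
    """Apply a frequency-domain filter to spatial features (illustrative sketch).

    x    : real array of shape (H, W, C), spatial feature map (assumed layout).
    filt : complex array of shape (H, W // 2 + 1, C), one filter per channel.
           In a trained model this would be a learnable parameter.
    """
    h, w, _ = x.shape
    # Step 1: 2D discrete Fourier transform over the two spatial axes.
    # rfft2 keeps only the non-redundant half of the spectrum of a real input.
    freq = np.fft.rfft2(x, axes=(0, 1), norm="ortho")
    # Step 2: element-wise multiplication with the global filter.
    freq = freq * filt
    # Step 3: 2D inverse Fourier transform back to the spatial domain.
    return np.fft.irfft2(freq, s=(h, w), axes=(0, 1), norm="ortho")


rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 8))

# An all-ones filter leaves the features unchanged.
identity = np.ones((14, 14 // 2 + 1, 8), dtype=np.complex128)
y_identity = global_filter(x, identity)

# A filter that keeps only the zero-frequency term replaces every location
# with the per-channel spatial mean, i.e. a fully global interaction.
dc_only = np.zeros_like(identity)
dc_only[0, 0, :] = 1.0
y_mean = global_filter(x, dc_only)
```

Because each frequency coefficient depends on every spatial location, a single element-wise product mixes information across the whole feature map, and the cost is dominated by the FFT, which is consistent with the log-linear complexity stated in the abstract.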