Keywords
Computer science, Kernel (algebra), Convolutional neural network, Parameterized complexity, Tree kernel, Artificial intelligence, Scalability, Scaling, Transformer, Pattern recognition (psychology), Contrast (vision), Kernel method, Support vector machine, Algorithm, Kernel embedding of distributions, Mathematics, Physics, Geometry, Combinatorics, Database, Voltage, Quantum mechanics
Authors
Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding
Identifier
DOI: 10.1109/cvpr52688.2022.01166
Abstract
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), we demonstrate that using a few large convolutional kernels instead of a stack of small kernels can be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depthwise convolutions, for designing efficient, high-performance large-kernel CNNs. Following these guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to the commonly used 3×3. RepLKNet greatly closes the performance gap between CNNs and ViTs, e.g., achieving results comparable or superior to the Swin Transformer on ImageNet and several typical downstream tasks, with lower latency. RepLKNet also scales well to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K, which is highly competitive among state-of-the-art models of similar size. Our study further reveals that, in contrast to small-kernel CNNs, large-kernel CNNs have much larger effective receptive fields and higher shape bias rather than texture bias. Code & models at https://github.com/megvii-research/RepLKNet.
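The central mechanism the abstract names, a re-parameterized large depthwise convolution (a very large depthwise kernel trained alongside a parallel small-kernel branch that is later folded into it), can be illustrated with a short sketch. The following is a minimal PyTorch illustration under simplifying assumptions, not the authors' implementation (see the linked repository for that): the class and method names are hypothetical, and the BatchNorm fusion used in the paper is replaced by plain biased convolutions for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReparamLargeKernelDWConv(nn.Module):
    """Hypothetical sketch: a large depthwise conv (e.g. 31x31) trained with a
    parallel small depthwise branch (e.g. 5x5) that can be merged for inference."""

    def __init__(self, channels: int, large_kernel: int = 31, small_kernel: int = 5):
        super().__init__()
        self.large_kernel = large_kernel
        self.small_kernel = small_kernel
        # groups=channels makes both branches depthwise ("same" padding, stride 1).
        self.large = nn.Conv2d(channels, channels, large_kernel,
                               padding=large_kernel // 2, groups=channels, bias=True)
        self.small = nn.Conv2d(channels, channels, small_kernel,
                               padding=small_kernel // 2, groups=channels, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.large(x)
        if self.small is not None:  # the small branch is dropped after merging
            out = out + self.small(x)
        return out

    @torch.no_grad()
    def merge_branches(self) -> None:
        """Fold the small branch into the large kernel for inference:
        zero-pad the small kernel to the large size and add weights and biases."""
        pad = (self.large_kernel - self.small_kernel) // 2
        self.large.weight.add_(F.pad(self.small.weight, [pad, pad, pad, pad]))
        self.large.bias.add_(self.small.bias)
        self.small = None

# Usage: after merging, a single large-kernel conv reproduces the two-branch
# output (up to floating-point error), which is the point of re-parameterization.
block = ReparamLargeKernelDWConv(channels=64)
x = torch.randn(1, 64, 56, 56)
y_train = block(x)
block.merge_branches()
y_infer = block(x)
print(torch.allclose(y_train, y_infer, atol=1e-5))
```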