Computer science
Discriminant
Bottleneck
Kernel (algebra)
Artificial intelligence
Representation (politics)
Scalability
Coding (set theory)
Machine learning
Failure
Segmentation
Deep neural networks
Source code
Object detection
Pattern recognition (psychology)
Deep learning
Parallel computing
Law
Combinatorics
Programming language
Set (abstract data type)
Embedded system
Operating system
Politics
Database
Mathematics
Political science
Authors
Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 14
Identifier
DOI: 10.48550/arxiv.2211.03295
Abstract
By contextualizing the kernel to be as global as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals a representation bottleneck in modern ConvNets: expressive interactions are not effectively encoded as the kernel size increases. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D/3D human pose estimation, and video prediction. Notably, MogaNet reaches 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
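The "gated aggregation" the abstract mentions can be illustrated with a minimal sketch. This is not the authors' implementation: the context branch below is a simple global average standing in for MogaNet's multi-order convolutions, and all names (`gated_aggregation`, `context`) are hypothetical. The core idea shown is only that a sigmoid gate, computed from the features themselves, modulates a contextualized branch elementwise.

```python
import numpy as np

def sigmoid(x):
    # Elementwise logistic function: squashes values into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def gated_aggregation(features, context):
    # Gate branch: per-position, per-channel weights in (0, 1).
    gate = sigmoid(features)
    # Aggregation: the gate modulates the contextualized branch elementwise.
    return gate * context

# Toy feature map: batch=1, channels=2, 4x4 spatial grid.
x = np.random.default_rng(0).standard_normal((1, 2, 4, 4))

# Stand-in "context" branch: the channelwise global average, broadcast back
# to the spatial grid (a crude proxy for multi-order convolutional context).
ctx = np.broadcast_to(x.mean(axis=(2, 3), keepdims=True), x.shape)

y = gated_aggregation(x, ctx)
print(y.shape)  # → (1, 2, 4, 4), same shape as the input
```

Because the gate lies in (0, 1), the output is a per-position attenuation of the context signal rather than an unbounded mixture, which is what keeps the module compact and cheap relative to attention.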