Contrastive Masked Autoencoders are Stronger Vision Learners

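Artificial intelligence, Computer science, Pattern recognition (psychology), Computer vision, Natural language processing, Speech recognition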
Authors
Zhicheng Huang, Xiaojie Jin, Cheng-Ze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [Institute of Electrical and Electronics Engineers]
Volume (issue): 46 (4): 2506-2517 Cited by: 129
Identifier
DOI:10.1109/tpami.2023.3336525
Abstract

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representations shows that there is still considerable room for building a stronger vision learner. Toward this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By carefully unifying contrastive learning (CL) and masked image modeling (MIM) through novel designs, CMAE leverages their respective advantages and learns representations with both strong instance discriminability and local perceptibility. Specifically, CMAE consists of two branches: the online branch is an asymmetric encoder-decoder, and the momentum branch is a momentum-updated encoder. During training, the online encoder reconstructs original images from latent representations of masked images to learn holistic features. The momentum encoder, fed with the full images, enhances feature discriminability via contrastive learning with its online counterpart. To make CL compatible with MIM, CMAE introduces two new components: pixel shifting, for generating plausible positive views, and a feature decoder, for complementing the features of contrastive pairs. Thanks to these designs, CMAE effectively improves representation quality and transfer performance over its MIM counterpart. CMAE achieves state-of-the-art performance on highly competitive benchmarks for image classification, semantic segmentation, and object detection. Notably, CMAE-Base achieves 85.3% top-1 accuracy on ImageNet and 52.5% mIoU on ADE20K, surpassing the previous best results by 0.7% and 1.8%, respectively.
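
The three mechanisms named in the abstract (a momentum-updated encoder, a contrastive objective between the two branches, and pixel shifting for positive views) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss form, the momentum coefficient, the temperature, or the shift range, so the InfoNCE-style loss, `m=0.996`, `temperature=0.07`, and the helper names `ema_update`, `info_nce`, and `pixel_shifted_views` are all hypothetical choices made for this example.

```python
import numpy as np


def ema_update(momentum_params, online_params, m=0.996):
    """Momentum (EMA) update: target <- m * target + (1 - m) * online.

    The coefficient m=0.996 is an assumed default, not taken from the abstract.
    """
    return {
        name: m * momentum_params[name] + (1.0 - m) * online_params[name]
        for name in momentum_params
    }


def info_nce(online_feats, momentum_feats, temperature=0.07):
    """InfoNCE-style contrastive loss (assumed form).

    Row i of both arrays is treated as a positive pair; all other rows in
    the batch act as negatives.
    """
    q = online_feats / np.linalg.norm(online_feats, axis=1, keepdims=True)
    k = momentum_feats / np.linalg.norm(momentum_feats, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def pixel_shifted_views(image, crop, max_shift, rng):
    """Return two equally sized crops whose origins differ by a small shift.

    Keeping the offset within max_shift pixels per axis makes the two views
    overlap heavily, which is the assumed reading of "pixel shifting".
    Requires image height and width >= crop + max_shift.
    """
    h, w = image.shape[:2]
    y0 = rng.integers(0, h - crop - max_shift + 1)
    x0 = rng.integers(0, w - crop - max_shift + 1)
    dy, dx = rng.integers(0, max_shift + 1, size=2)
    view_a = image[y0:y0 + crop, x0:x0 + crop]
    view_b = image[y0 + dy:y0 + dy + crop, x0 + dx:x0 + dx + crop]
    return view_a, view_b
```

In a training loop built on this sketch, `view_a` would be masked and passed to the online encoder-decoder, `view_b` would go unmasked to the momentum encoder, and `ema_update` would be called after each optimizer step. How the reconstruction and contrastive terms are weighted is not stated in the abstract and is left out here.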