Computer science
Artificial intelligence
Feature learning
Machine learning
Robustness (evolution)
Generative grammar
Deep learning
Representation (politics)
Generative model
Generalization
Supervised learning
Domain (mathematical analysis)
Artificial neural network
Biochemistry
Chemistry
Mathematics
Politics
Political science
Law
Gene
Mathematical analysis
Authors
Madeline C. Schiappa, Yogesh Singh Rawat, Mubarak Shah
Source
Journal: ACM Computing Surveys [Association for Computing Machinery]
Date: 2022-12-21
Volume/Issue: 55 (13s): 1-37
Citations: 68
Abstract
The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, obtaining annotations is expensive and requires great effort, which is especially challenging for videos. Moreover, the use of human-generated annotations leads to models with biased learning, poor domain generalization, and weak robustness. As an alternative, self-supervised learning provides a way to learn representations without annotations and has shown promise in both the image and video domains. In contrast to the image domain, learning video representations is more challenging due to the temporal dimension, which brings in motion and other environmental dynamics. This also provides opportunities for video-exclusive ideas that advance self-supervised learning in the video and multimodal domains. In this survey, we review existing approaches to self-supervised learning, focusing on the video domain. We summarize these methods into four categories based on their learning objectives: (1) pretext tasks, (2) generative learning, (3) contrastive learning, and (4) cross-modal agreement. We further introduce the commonly used datasets, downstream evaluation tasks, insights into the limitations of existing works, and potential future directions in this area.
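As a concrete illustration of the contrastive-learning category mentioned above, the sketch below shows a minimal InfoNCE-style objective of the kind many video self-supervised methods build on: two differently augmented clips of the same video form a positive pair, while clips from other videos in the batch serve as negatives. This is an illustrative sketch only; the function and variable names (info_nce_loss, clip_embeddings_a, clip_embeddings_b, temperature) are hypothetical and not taken from the surveyed works.

```python
# Minimal InfoNCE-style contrastive loss sketch (illustrative, not from the survey).
import numpy as np

def info_nce_loss(clip_embeddings_a: np.ndarray,
                  clip_embeddings_b: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Row i of `a` and row i of `b` are two views (e.g., augmented clips) of the
    same video; all other rows in the batch act as negatives."""
    # L2-normalize so the dot product is cosine similarity.
    a = clip_embeddings_a / np.linalg.norm(clip_embeddings_a, axis=1, keepdims=True)
    b = clip_embeddings_b / np.linalg.norm(clip_embeddings_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    # Softmax cross-entropy where the diagonal entries are the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy usage: 4 videos, 128-d embeddings from two correlated "views" of each video.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(4, 128))
emb_b = emb_a + 0.1 * rng.normal(size=(4, 128))
print(info_nce_loss(emb_a, emb_b))
```

In practice, the embeddings would come from a video encoder (e.g., a 3D CNN or video transformer), and the temporal dimension is what distinguishes video contrastive setups from image ones: positives can be sampled across time as well as across spatial augmentations.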