Computer Science
Artificial Intelligence
Computer Vision
Computer Graphics (Images)
Multimedia
Authors
Ruihai Wu, Yourong Zhang, Yu Qi, A. Chen, Hao Dong
Identifier
DOI:10.1145/3652583.3658010
Abstract
With the development of Embodied AI, Robotics, and Augmented Reality, videos captured from the first-person point of view, also known as egocentric videos, are attracting growing interest in the Computer Vision and Robotics communities. Learning a proper representation of egocentric videos can benefit diverse downstream tasks such as action forecasting and human-object interaction, which in turn supports robotic planning. However, current works mostly focus on learning temporal or topological information for egocentric video representations, while activity patterns, which reveal behavioral regularities and the intentions of people or robots more explicitly, are not carefully considered. In this paper, we propose a novel framework, Pattern4Ego, that learns representations of egocentric videos using cross-video activity patterns. The framework achieves state-of-the-art performance on two representative egocentric video tasks: long-term action anticipation and context-based environment affordance.
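To make the core idea concrete, the following is a minimal, hypothetical sketch of learning clip embeddings with a cross-video pattern objective. It is not the authors' Pattern4Ego implementation (the abstract does not specify the architecture or loss); it assumes PyTorch, pre-extracted clip features, and already-mined activity-pattern labels, and all names (ClipEncoder, pattern_contrastive_loss, pattern_ids) are illustrative stand-ins.

```python
# Hypothetical sketch: contrastive learning over activity patterns shared
# across videos. NOT the paper's actual method; for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipEncoder(nn.Module):
    """Maps a pre-extracted clip feature to a normalized embedding."""
    def __init__(self, feat_dim=512, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def pattern_contrastive_loss(emb, pattern_ids, temperature=0.1):
    """Pull together clips (possibly from different videos) that share an
    activity-pattern label; push apart clips with different patterns.
    A standard supervised-contrastive objective, used here as a stand-in
    for whatever cross-video pattern objective the paper defines."""
    sim = emb @ emb.t() / temperature                   # (N, N) similarities
    n = emb.size(0)
    mask_self = torch.eye(n, dtype=torch.bool, device=emb.device)
    pos = (pattern_ids[:, None] == pattern_ids[None, :]) & ~mask_self
    # Log-softmax over each row, excluding the clip's similarity to itself.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(mask_self, -1e9), dim=1, keepdim=True
    )
    pos_counts = pos.sum(1).clamp(min=1)
    loss = -(log_prob * pos).sum(1) / pos_counts
    return loss[pos.sum(1) > 0].mean()  # average over anchors with positives

# Toy usage: 8 clips drawn from several videos, 3 shared activity patterns.
enc = ClipEncoder()
clip_feats = torch.randn(8, 512)                       # e.g., backbone features
pattern_ids = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])   # mined pattern labels
loss = pattern_contrastive_loss(enc(clip_feats), pattern_ids)
loss.backward()
```

The design intuition matches the abstract: because positives are defined by shared activity patterns rather than by membership in the same video, the embedding is encouraged to capture behavioral regularities that recur across videos, which is what tasks like long-term action anticipation rely on.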