Computer science
Joint (building)
Joint attention
Artificial intelligence
Representation (politics)
Matching (statistics)
Set (abstract data type)
Feature learning
Wearable computer
Deep learning
Machine learning
Psychology
Autism
Law
Programming language
Architectural engineering
Political science
Embedded system
Developmental psychology
Mathematics
Engineering
Statistics
Politics
Authors
Huangyue Yu,Minjie Cai,Yunfei Liu,Feng Lu
Identifier
DOI:10.1109/tpami.2020.3030048
Abstract
Recent years have witnessed a tremendous increase in first-person videos captured by wearable devices. Such videos record information from a different perspective than the traditional third-person view, and thus offer a wide range of potential uses. However, techniques for analyzing videos from different views can be fundamentally different, not to mention co-analyzing both views to explore the shared information. In this paper, we take on the challenge of cross-view video co-analysis and deliver a novel learning-based method. At the core of our method is the notion of "joint attention", indicating the shared attention regions that link the corresponding views, and eventually guide the shared representation learning across views. To this end, we propose a multi-branch deep network, which extracts cross-view joint attention and shared representation from static frames with spatial constraints, in a self-supervised and simultaneous manner. In addition, by incorporating the temporal transition model of the joint attention, we obtain spatial-temporal joint attention that can robustly capture the essential information extending through time. Our method outperforms the state-of-the-art on standard cross-view video matching tasks on public datasets. Furthermore, we demonstrate how the learnt joint information can benefit various applications through a set of qualitative and quantitative experiments.
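The following is a minimal, hypothetical sketch of the general idea described in the abstract, not the authors' actual network or released code: two view-specific branches encode paired first-person and third-person frames into spatial feature maps, a cross-view affinity yields joint-attention maps, and attention-pooled embeddings are trained with a self-supervised matching (InfoNCE-style) loss. All layer sizes, the backbone, and the loss choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BranchEncoder(nn.Module):
    """Small convolutional branch mapping a frame to a spatial feature map."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)  # (B, dim, H', W')


class CrossViewJointAttention(nn.Module):
    """Two-branch network with cross-view joint-attention pooling (illustrative)."""
    def __init__(self, dim=64):
        super().__init__()
        self.ego_branch = BranchEncoder(dim)  # first-person (wearable) view
        self.exo_branch = BranchEncoder(dim)  # third-person view

    def forward(self, ego_frame, exo_frame):
        fe = self.ego_branch(ego_frame)       # (B, C, H, W)
        fx = self.exo_branch(exo_frame)       # (B, C, H, W)
        b, c, h, w = fe.shape
        fe_flat = F.normalize(fe.flatten(2), dim=1)  # (B, C, HW)
        fx_flat = F.normalize(fx.flatten(2), dim=1)
        # Cross-view affinity between every pair of spatial locations.
        affinity = torch.einsum("bci,bcj->bij", fe_flat, fx_flat)  # (B, HW, HW)
        # Joint-attention maps: how strongly each location links to the other view.
        att_ego = affinity.max(dim=2).values.softmax(dim=1).view(b, 1, h, w)
        att_exo = affinity.max(dim=1).values.softmax(dim=1).view(b, 1, h, w)
        # Attention-weighted pooling gives a shared representation per view.
        z_ego = (fe * att_ego).flatten(2).sum(dim=2)
        z_exo = (fx * att_exo).flatten(2).sum(dim=2)
        return F.normalize(z_ego, dim=1), F.normalize(z_exo, dim=1)


def matching_loss(z_ego, z_exo, temperature=0.1):
    """Self-supervised contrastive loss: paired frames (same batch index) should match."""
    logits = z_ego @ z_exo.t() / temperature
    labels = torch.arange(z_ego.size(0))
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    model = CrossViewJointAttention()
    ego = torch.randn(4, 3, 64, 64)  # batch of first-person frames
    exo = torch.randn(4, 3, 64, 64)  # batch of paired third-person frames
    z_e, z_x = model(ego, exo)
    print(matching_loss(z_e, z_x).item())
```

The temporal transition model over joint attention mentioned in the abstract is omitted here; extending the sketch would require propagating the attention maps across consecutive frames.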