Computer science
Event (particle physics)
Artificial intelligence
Benchmark (surveying)
Frame (networking)
Computer vision
Frame rate
Representative (politics)
Low latency (capital markets)
Pattern recognition (psychology)
Real-time computing
Politics
Geography
Law
Physics
Telecommunications
Quantum mechanics
Computer network
Political science
Geodesy
Authors
Bochen Xie, Yongjian Deng, Zhanpeng Shao, Hai Liu, Qingsong Xu, Youfu Li
Identifier
DOI: 10.1109/crc55853.2022.10041200
Abstract
Event cameras asynchronously capture pixel-level intensity changes in scenes and output a stream of events. Compared with traditional frame-based cameras, they offer competitive imaging characteristics: low latency, high dynamic range, and low power consumption. This makes event cameras ideal for vision tasks in dynamic scenarios, such as human action recognition. The best-performing event-based algorithms convert events into frame-based representations and feed them into existing learning models. However, generating informative frames for long-duration event streams remains a challenge, since event cameras work asynchronously without a fixed frame rate. In this work, we propose a novel frame-based representation named Compact Event Image (CEI) for action recognition. This representation is generated in a learnable way by a self-attention based module named Event Tubelet Compressor (EVTC). The EVTC module adaptively summarizes the long-term dynamics and temporal patterns of events into a CEI frame set. EVTC can be combined with conventional video backbones for end-to-end event-based action recognition. We evaluate our approach on three benchmark datasets, and experimental results show it outperforms state-of-the-art methods by a large margin.
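To make the pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the two ideas the abstract describes: binning an asynchronous event stream into a long sequence of count frames, and using a small set of learnable attention queries to compress that sequence into a compact frame set. This is not the authors' code; all names (`events_to_frames`, `TubeletCompressor`, `num_output_frames`) are illustrative assumptions, and the paper's actual EVTC module and CEI representation may differ in detail.

```python
# Sketch only: event-stream binning + attention-based temporal compression.
# Loosely analogous to the EVTC/CEI idea; not the paper's implementation.

import torch
import torch.nn as nn


def events_to_frames(events, num_bins, height, width):
    """Accumulate events (t, x, y, polarity) into num_bins count frames."""
    t = events[:, 0]
    # Normalize timestamps to [0, 1) and assign each event to a temporal bin.
    t_norm = (t - t.min()) / (t.max() - t.min() + 1e-9)
    bin_idx = torch.clamp((t_norm * num_bins).long(), max=num_bins - 1)
    frames = torch.zeros(num_bins, 2, height, width)  # 2 polarity channels
    x = events[:, 1].long()
    y = events[:, 2].long()
    p = events[:, 3].long()  # polarity: 0 (off) or 1 (on)
    frames.index_put_((bin_idx, p, y, x), torch.ones(len(events)), accumulate=True)
    return frames  # (num_bins, 2, H, W)


class TubeletCompressor(nn.Module):
    """Hypothetical compressor: a small set of learnable queries attends over
    the long frame sequence and summarizes it into num_output_frames tokens."""

    def __init__(self, embed_dim, num_output_frames, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_output_frames, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, tokens):  # tokens: (B, T, embed_dim), T = long sequence
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        compact, _ = self.attn(q, tokens, tokens)  # (B, num_output_frames, D)
        return compact


if __name__ == "__main__":
    # Toy event stream: 1000 random events of (t, x, y, polarity).
    events = torch.rand(1000, 4)
    events[:, 1] *= 64  # x in [0, 64)
    events[:, 2] *= 48  # y in [0, 48)
    events[:, 3] = (events[:, 3] > 0.5).float()
    frames = events_to_frames(events, num_bins=32, height=48, width=64)

    # Flatten each frame into a token and compress 32 frames down to 8.
    tokens = frames.flatten(1).unsqueeze(0)  # (1, 32, 2*48*64)
    proj = nn.Linear(2 * 48 * 64, 128)       # toy token embedding
    compressor = TubeletCompressor(embed_dim=128, num_output_frames=8)
    compact = compressor(proj(tokens))
    print(compact.shape)  # torch.Size([1, 8, 128])
```

The choice of learnable queries (rather than fixed pooling over time) mirrors the abstract's claim that the compression is done "in a learnable way": the attention weights decide which temporal bins contribute to each compact frame, so long-term dynamics can be summarized adaptively before a conventional video backbone consumes the result.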