Computer science
Artificial intelligence
Feature
Frame
Context
Object detection
Computer vision
Object
Video tracking
Feature extraction
Pattern recognition
Telecommunications
Authors
Jinsheng Xiao,Yuanxu Wu,Yunhua Chen,Shurui Wang,Zhongyuan Wang,Jiayi Ma
Identifier
DOI:10.1109/cvpr52729.2023.01404
Abstract
Video small object detection is a difficult task due to the lack of object information. Recent methods focus on adding more temporal information to obtain more potent high-level features, but they often fail to identify the information most vital for small objects, resulting in insufficient or inappropriate features. Since frames at different temporal positions contribute differently to small objects, it is not ideal to assume that one universal method will extract proper features. We find that context information from long-term frames and temporal information from short-term frames are two useful cues for video small object detection. To fully exploit these two cues, we propose a long short-term feature enhancement network (LSTFE-Net) for video small object detection. First, we develop a plug-and-play spatiotemporal feature alignment module to create temporal correspondences between the short-term and current frames. Then, we propose a frame selection module to select the long-term frame that provides the most additional context information. Finally, we propose a long short-term feature aggregation module to fuse the long- and short-term features. Compared to other state-of-the-art methods, our LSTFE-Net achieves an absolute gain of 4.4% AP on the FL-Drones dataset. More details can be found at https://github.com/xiaojs18/LSTFE-Net.
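The abstract describes a three-stage pipeline: select a complementary long-term frame, align short-term features to the current frame, and fuse the long- and short-term features. The paper does not give implementation details here, so the following NumPy sketch is purely illustrative; the selection criterion (cosine dissimilarity) and the weighted-sum fusion with hypothetical weights `alpha` and `beta` are assumptions, not the authors' actual modules.

```python
import numpy as np

def select_long_term_frame(current_feat, candidate_feats):
    """Hypothetical frame selection: pick the candidate frame whose
    feature map is most complementary to the current frame, here
    approximated as the lowest cosine similarity."""
    cur = current_feat.ravel()
    sims = []
    for cand in candidate_feats:
        c = cand.ravel()
        denom = np.linalg.norm(cur) * np.linalg.norm(c) + 1e-8
        sims.append(float(np.dot(cur, c) / denom))
    return int(np.argmin(sims))  # index of the selected long-term frame

def aggregate_long_short_term(current_feat, short_feat, long_feat,
                              alpha=0.5, beta=0.3):
    """Hypothetical aggregation: fuse current, short-term, and long-term
    features as a fixed weighted sum (the paper learns this fusion)."""
    return current_feat + alpha * short_feat + beta * long_feat

# Toy usage: two candidate frames, one identical to and one
# decorrelated from the current frame.
cur = np.ones((2, 2))
candidates = [np.ones((2, 2)), np.array([[1.0, -1.0], [-1.0, 1.0]])]
idx = select_long_term_frame(cur, candidates)      # picks the dissimilar frame
fused = aggregate_long_short_term(cur, cur, candidates[idx])
```

In the actual network, the fusion weights would be learned and the alignment step would establish pixel-level correspondences before aggregation; this sketch only conveys the data flow.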