A BERT-Based Joint Channel–Temporal Modeling for Action Recognition

Authors
Man Yang,Lipeng Gan,Runze Cao,Xiaochao Li
Source
Journal: IEEE Sensors Journal [IEEE Sensors Council]
Volume/Issue: 23 (19): 23765-23779 · Cited by: 1
Identifier
DOI:10.1109/jsen.2023.3303912
Abstract

Action recognition classifies human actions in datasets captured by various camera sensors. However, capturing the key semantic features and subtle differences needed for fine-grained action recognition from redundant motion sequences remains challenging. To address this issue, we propose a novel bidirectional encoder representations from transformers (BERT)-based joint channel–temporal module that explores channel interaction correlations through a channel–temporal embedding module and a self-attention mechanism. The channel-view branch captures key channel semantic features through the interactive correlation between sub-channel feature sequences across frames. Our studies reveal that channel interaction is crucial for discovering the discriminative features that separate fine-grained action categories. Furthermore, the channel-view branch works collaboratively with the temporal-view branch, exploiting both channel interactions and channel–temporal dependencies through joint learning with a weight-sharing strategy. The proposed BERT-based joint channel–temporal module is plug-and-play and can be integrated with 2-D backbones such as the temporal shift module (TSM), multiview fusion network (MVFNet), MotionSqueeze network (MSNet), and temporal difference network (TDN). Extensive experiments are carried out on the HMDB51, MiniKinetics, fine-grained Something-Something V1 & V2, and multimodal N-UCLA datasets, and the results demonstrate the effectiveness of our joint channel–temporal module. Our method achieves 83.8%, 83.6%, 57.1%, and 68.2% top-1 accuracy on these single-modal datasets, respectively. The multimodal experiments on the N-UCLA dataset achieve 98.7% and 98.9% accuracy with RGB + skeleton and RGB + depth fusion.
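The abstract describes two attention branches over the same clip features: a temporal view attending over frames and a channel view attending over sub-channel feature sequences, with the two views coupled by weight sharing. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the class name JointChannelTemporal, the num_groups sub-channel split, the additive fusion of the two views, and all hyperparameters are assumptions made for illustration only.

    # Hedged sketch of a joint channel-temporal self-attention module.
    # Assumptions (not from the paper): token layout, additive fusion,
    # and every name/hyperparameter below.
    import torch
    import torch.nn as nn

    class JointChannelTemporal(nn.Module):
        def __init__(self, dim=256, num_frames=8, num_groups=4, num_heads=4, depth=1):
            super().__init__()
            assert dim % num_groups == 0
            self.num_groups = num_groups
            # One transformer encoder reused by both views (weight-sharing strategy).
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=num_heads, dim_feedforward=dim * 2,
                dropout=0.1, batch_first=True)
            self.shared_encoder = nn.TransformerEncoder(layer, num_layers=depth)
            # Project sub-channel tokens to the shared model width and back.
            sub_dim = dim // num_groups
            self.chan_in = nn.Linear(sub_dim, dim)
            self.chan_out = nn.Linear(dim, sub_dim)
            self.temporal_pos = nn.Parameter(torch.zeros(1, num_frames, dim))
            self.channel_pos = nn.Parameter(torch.zeros(1, num_frames * num_groups, dim))

        def forward(self, x):
            # x: (batch, frames, dim) clip features from a 2-D backbone such as TSM.
            b, t, c = x.shape
            # Temporal view: frames are the attention tokens.
            temporal = self.shared_encoder(x + self.temporal_pos[:, :t])
            # Channel view: split each frame feature into sub-channel groups and
            # treat every (frame, group) pair as a token, so attention can model
            # channel interactions across frames.
            sub = x.reshape(b, t, self.num_groups, c // self.num_groups)
            tokens = self.chan_in(sub.reshape(b, t * self.num_groups, -1))
            tokens = self.shared_encoder(tokens + self.channel_pos[:, :t * self.num_groups])
            channel = self.chan_out(tokens).reshape(b, t, c)
            # Fuse the two views; the paper's exact fusion may differ.
            return temporal + channel

    if __name__ == "__main__":
        feats = torch.randn(2, 8, 256)   # 2 clips, 8 frames, 256-d features
        module = JointChannelTemporal()
        print(module(feats).shape)       # torch.Size([2, 8, 256])

Because the module only consumes and returns clip-level feature tensors, a sketch like this can sit on top of any 2-D backbone's per-frame features, which matches the plug-and-play integration with TSM, MVFNet, MSNet, and TDN claimed in the abstract.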
