Computer science
Pose
Estimation
Artificial intelligence
Information retrieval
Engineering
Systems engineering
Authors
Tien-Dat Tran,Xuan-Thuy Vo,Duy-Linh Nguyen,Kang-Hyun Jo
Source
Journal: Communications in computer and information science
Date: 2021-01-01
Pages: 242-250
Identifier
DOI:10.1007/978-3-030-81638-4_20
Abstract
Convolutional neural networks (CNNs) currently achieve the highest performance not only in human pose estimation but also in other machine vision tasks (e.g. object recognition, semantic segmentation, image classification). Attention modules (AMs) have further improved their performance over conventional networks. Hence, this paper focuses on a feed-forward AM for CNNs. First, the feature map is fed into the attention module after a stage of the backbone network, where it is processed along two different dimensions, channel and spatial. The AM then combines the two resulting feature maps by multiplication and passes the result to the next stage of the backbone. By modeling both long-range dependencies (channel) and spatial data, the network captures more information, which yields better precision. Our experimental results also show the difference between using the attention module and existing methods. With the modification that improves the spatial branch, the predicted joint heatmaps retain their accuracy while the number of parameters decreases. The proposed architecture outperforms the baseline by 1.3 points in AP. The proposed network was trained on the COCO 2017 benchmark, which is an open dataset.
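The data flow described in the abstract (a channel branch and a spatial branch whose outputs are combined by multiplication) can be illustrated with a minimal sketch. This is not the authors' module: the abstract does not specify the layers inside each branch, so the sketch assumes parameter-free stand-ins (global average pooling plus a sigmoid for the channel branch, a cross-channel mean plus a sigmoid for the spatial branch). The function names `channel_weights`, `spatial_weights`, and `attention_module` are hypothetical.

```python
import math


def sigmoid(v):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def channel_weights(x):
    """One weight per channel (assumed: global average pooling + sigmoid).

    x is a feature map stored as nested lists of shape C x H x W.
    """
    weights = []
    for channel in x:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        weights.append(sigmoid(total / count))
    return weights


def spatial_weights(x):
    """One weight per location (assumed: mean over channels + sigmoid)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def attention_module(x):
    """Combine both branches by multiplication, as the abstract describes.

    The output has the same C x H x W shape as the input, so it can be
    handed to the next backbone stage unchanged in size.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    cw = channel_weights(x)
    sw = spatial_weights(x)
    return [
        [[x[k][i][j] * cw[k] * sw[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


if __name__ == "__main__":
    # Toy feature map: 2 channels, 2 x 2 spatial grid.
    feature_map = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.5, 0.5], [0.5, 0.5]],
    ]
    refined = attention_module(feature_map)
    print(len(refined), len(refined[0]), len(refined[0][0]))
```

In a trained network each branch would contain learned layers; the sketch only shows that both branches produce weights broadcast over the same feature map and that their product rescales every activation.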