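Computer science
Artificial intelligence
Transformer
Feature extraction
Pattern recognition (psychology)
Fusion mechanism
Dual (grammatical number)
Fusion
Data mining
Linguistics
Philosophy
Physics
Literature
Quantum mechanics
Voltage
Lipid bilayer fusion
Art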
Authors
Yongjie Wang,Feng Wang,Dongyang Huang
Identifier
DOI:10.1016/j.eswa.2023.121272
Abstract
This paper proposes a dense crowd counting method based on a self-attention mechanism with a dual-branch fusion network. The method addresses two problems in dense crowd images: large variations in head scale and complex backgrounds. It combines the CNN and Transformer frameworks and consists of a shallow feature extraction network, a dual-branch fusion network, and a deep feature extraction network. The shallow feature extraction network uses VGG16 to extract low-level features. The dual-branch fusion network comprises a multi-scale CNN branch and a Transformer branch built on an improved self-attention module, which capture local and global information on crowd areas, respectively. The deep feature extraction network uses a Transformer based on a mixed attention module to further separate complicated backgrounds and concentrate on crowd areas. The experiments use both counting-level weakly supervised and location-level fully supervised training. On four widely used datasets, the proposed method outperforms the most recent methods. Compared to existing weakly supervised methods, it achieves higher counting accuracy with a low parameter count, and a counting accuracy of 89.1% under full supervision. The experimental results show that the method delivers excellent crowd counting performance and counts accurately in high-density and high-occlusion scenes.
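The dual-branch idea in the abstract (a local multi-scale branch and a global self-attention branch whose outputs are fused) can be illustrated with a toy sketch. This is not the authors' network: it works on a short 1-D token sequence instead of image feature maps, uses plain scaled dot-product self-attention with identity projections rather than the paper's improved self-attention module, replaces the multi-scale CNN with neighbourhood averaging, and assumes fusion by concatenation. The function names `self_attention`, `multi_scale_local`, and `fuse` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Global branch stand-in: scaled dot-product self-attention.

    Assumption: Q = K = V = tokens (no learned projections), so each
    output token is a weighted average of all input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def multi_scale_local(tokens, windows=(1, 3)):
    """Local branch stand-in: average each token with its neighbours
    at several window sizes and concatenate the per-scale results."""
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        feats = []
        for w in windows:
            half = w // 2
            span = tokens[max(0, i - half):min(n, i + half + 1)]
            feats.extend(sum(t[j] for t in span) / len(span) for j in range(d))
        out.append(feats)
    return out


def fuse(local_feats, global_feats):
    """Assumed fusion rule: concatenate local and global features per token."""
    return [loc + glo for loc, glo in zip(local_feats, global_feats)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse(multi_scale_local(tokens), self_attention(tokens))
```

Each fused token here carries the features from both window sizes followed by the attention output. In a real model, both branches would have learned weights and would operate on the VGG16 feature maps described in the abstract.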