Journal: IEEE Sensors Journal [Institute of Electrical and Electronics Engineers] · Date: 2022-12-08 · Volume/Issue: 23 (4): 3747-3755 · Citations: 2
Identifier
DOI:10.1109/jsen.2022.3226466
Abstract
Field moving-target classification is the task of distinguishing ground moving targets such as personnel, wheeled vehicles, and tracked vehicles. Common methods use a single signal, acoustic or seismic, as input, which extracts less in-depth information than a multimodal approach. To improve the accuracy of field target classification and recognition, an acoustic–seismic multimodal fusion network with two streams, i.e., WaveNet operating on the raw audio and LMNet operating on the log-mel spectrogram, is proposed. This article first introduces asymmetric convolution to extract time and frequency information separately and designs a temporal attention module that lets the network fully exploit the relevant temporal information in different channels. The proposed network then extracts and fuses deep features from the four-channel acoustic signal and the single-channel seismic signal to obtain the final target classification results. Meanwhile, a data-augmentation scheme is explored to avoid the overfitting that the limited training data could cause. Notably, the model achieves 86.07%, 92.56%, and 98.08% classification accuracy on the seismic, acoustic, and acoustic–seismic datasets, respectively, and the classification accuracy of the multimodal model is significantly higher than that of either unimodal model. An ablation study shows that the framework improves classification accuracy by 0.62–6.83% over the plain networks. Compared with existing deep-learning models, the best multimodal result demonstrates a relative improvement of 2.05%.
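The two architectural ideas the abstract names, asymmetric convolution that treats the time and frequency axes separately, and temporal attention that pools features over time, can be sketched as below. This is a minimal NumPy illustration under assumed shapes (64 mel bins × 128 frames), not the authors' implementation; the kernel values and the energy-based attention scores are hypothetical stand-ins.

```python
import numpy as np

def asymmetric_conv(spec, k_time, k_freq):
    """Factorised (asymmetric) convolution on a spectrogram-like array.

    spec: (freq_bins, time_steps) log-mel spectrogram (assumed layout).
    A 1xk pass along time followed by a kx1 pass along frequency stands in
    for the abstract's "extract time-frequency information separately".
    """
    out = np.apply_along_axis(lambda row: np.convolve(row, k_time, mode="same"), 1, spec)
    out = np.apply_along_axis(lambda col: np.convolve(col, k_freq, mode="same"), 0, out)
    return out

def temporal_attention(features):
    """Softmax-weight time steps by their mean channel energy, then pool.

    features: (channels, time_steps). Returns a (channels,) vector.
    A minimal stand-in for the paper's temporal attention module.
    """
    scores = features.mean(axis=0)              # one score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over time
    return features @ weights                   # attention-weighted pooling

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 128))           # 64 mel bins x 128 frames (assumed)
smooth = np.array([1.0, 2.0, 1.0]) / 4.0        # illustrative separable kernel
out = asymmetric_conv(spec, smooth, smooth)
pooled = temporal_attention(out)
print(out.shape, pooled.shape)                  # (64, 128) (64,)
```

In the paper the fused acoustic and seismic streams would each produce such pooled feature vectors before the final classifier; here the pooling simply collapses the time axis while keeping per-channel information.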