Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]; Date: 2025-01-01; Volume/Issue: n/a; Pages: 1-1
Identifier
DOI:10.1109/tcsvt.2025.3525734
Abstract
The continuous development of Earth observation (EO) technology has significantly increased the availability of multi-sensor remote sensing (RS) data, and the fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has become a research hotspot. Current mainstream convolutional neural networks (CNNs) excel at extracting local features from images but have limitations in modeling global information, which can limit classification performance. In contrast, graph convolutional networks (GCNs) excel at capturing global information, with a particular advantage when processing RS images with irregular topological structures. By integrating these two frameworks, features can be fused from multiple perspectives, enabling a more comprehensive capture of multimodal data attributes and improving classification performance. This paper proposes a spatial-spectral-structural feature fusion network (S3F2Net) for HSI and LiDAR data classification. S3F2Net utilizes multiple architectures to extract rich features of multimodal data from different perspectives. On one hand, local spatial and spectral features of the multimodal data are extracted using a CNN, with shared-weight convolution enhancing interactions among the heterogeneous data to achieve detailed representations of land cover. On the other hand, the global topological structure is learned using a GCN, which models the spatial relationships between land cover types through a graph structure constructed from the LiDAR data, thereby enhancing the model's understanding of scene content. Furthermore, a dynamic node updating strategy within the GCN improves the model's ability to identify representative nodes for specific land cover types and facilitates information aggregation among remote nodes, strengthening adaptability to complex topological structures.
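The two branches described above can be illustrated with a minimal NumPy sketch. All shapes, kernel sizes, and the k-NN graph construction over pixel heights are illustrative assumptions for exposition, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs (sizes are assumptions): an HSI patch with 8 spectral bands
# and a single-band LiDAR elevation patch over the same 16x16 area.
hsi = rng.standard_normal((8, 16, 16))    # (bands, H, W)
lidar = rng.standard_normal((1, 16, 16))  # (1, H, W)

def conv2d(x, w):
    """Naive 'valid' 2-D convolution: x is (C_in, H, W), w is (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(w[o] * x[:, i:i + k, j:j + k])
    return out

# Modality-specific 1x1 convs project both inputs to a common channel width,
# then ONE shared 3x3 kernel processes both branches (the shared-weight idea).
w_hsi = rng.standard_normal((4, 8, 1, 1)) * 0.1
w_lidar = rng.standard_normal((4, 1, 1, 1)) * 0.1
w_shared = rng.standard_normal((4, 4, 3, 3)) * 0.1

f_hsi = np.maximum(conv2d(conv2d(hsi, w_hsi), w_shared), 0)      # ReLU
f_lidar = np.maximum(conv2d(conv2d(lidar, w_lidar), w_shared), 0)

# GCN side: build a graph from the LiDAR elevation. Here a simple k-NN graph
# over pixel heights stands in for the paper's LiDAR-based graph construction.
heights = lidar[0].reshape(-1)                    # one node per pixel
n = heights.size
d = np.abs(heights[:, None] - heights[None, :])   # pairwise height distance
knn = np.argsort(d, axis=1)[:, 1:6]               # 5 nearest neighbours each
A = np.zeros((n, n))
for i, nbrs in enumerate(knn):
    A[i, nbrs] = 1.0
A = np.maximum(A, A.T)                            # symmetrise
A_hat = A + np.eye(n)                             # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
L = D_inv_sqrt @ A_hat @ D_inv_sqrt               # normalised adjacency

# One graph-convolution layer: H' = ReLU(L @ H @ W).
H0 = heights[:, None]                             # initial node features
W0 = rng.standard_normal((1, 4)) * 0.1
H1 = np.maximum(L @ H0 @ W0, 0)

print(f_hsi.shape, f_lidar.shape, H1.shape)
```

The key point of the sketch is that `w_shared` is applied to both modality streams, so the two branches interact through a common filter bank, while the graph branch derives its topology purely from the LiDAR channel.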
A multi-level information fusion strategy integrates data representations from both global and local perspectives, ensuring accurate and reliable results. The framework's effectiveness is verified on three real multimodal RS datasets through comparisons with state-of-the-art (SOTA) methods. The source code will be available at https://github.com/slylnnu/S3F2Net.
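The multi-level fusion idea can be sketched as follows. This is a hypothetical two-level scheme (feature-level concatenation plus decision-level averaging) with assumed shapes; the paper's actual fusion design may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins for the two branches' outputs: per-pixel local CNN
# features and per-node global GCN features for the same 196 pixels, with a
# toy 6-class land-cover problem. All dimensions are assumptions.
n_pix, n_classes = 196, 6
f_local = rng.standard_normal((n_pix, 8))    # CNN spatial-spectral features
f_global = rng.standard_normal((n_pix, 4))   # GCN structural features

# Level 1 (feature level): concatenate local and global descriptors per pixel.
# Level 2 (decision level): average class scores of branch-wise classifiers.
w_fused = rng.standard_normal((12, n_classes)) * 0.1
w_local = rng.standard_normal((8, n_classes)) * 0.1
w_global = rng.standard_normal((4, n_classes)) * 0.1

logits_fused = np.concatenate([f_local, f_global], axis=1) @ w_fused
logits_avg = 0.5 * (f_local @ w_local + f_global @ w_global)
logits = logits_fused + logits_avg           # combine both fusion levels

pred = logits.argmax(axis=1)                 # final per-pixel class labels
print(pred.shape)
```

Combining the two levels lets the classifier exploit both joint feature interactions and each branch's independent evidence, which is the general motivation for multi-level rather than single-stage fusion.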