Extraction (chemistry)
Computer science
Artificial intelligence
Feature extraction
Remote sensing
Computer vision
Geology
Chromatography
Chemistry
Authors
Xiaofeng Shi, Junyu Gao, Yuan Yuan
Identifier
DOI:10.1109/tgrs.2024.3392631
Abstract
In recent years, deep learning and multi-modal data have substantially propelled the development of building extraction models. However, prevailing multi-modal methods struggle to cope with two challenges: 1) modal laziness: the training error is minimized before the model has learned extensive uni-modal patterns; 2) modal imbalance: the backpropagation process is easily dominated by a certain modality. As a result, uni-modal feature learning is insufficient, limiting the model's performance on the intricate foreground and background contexts surrounding buildings. In this paper, we address this problem from the perspectives of the algorithm and model evaluation. At the algorithmic level, we propose a Uni-modal Feature Enhancement (UFE) framework. Specifically, UFE is model-agnostic, comprising two distinct components: Adaptive Gradient Enhancement (AGE) for modal laziness and Consistency Constraint Loss (CCL) for modal imbalance. AGE dynamically modulates the original gradient by monitoring the representation effects of uni-modal features and multi-modal fusion features. CCL imposes mutual constraints on the different modal branches at the semantic level to reconcile the optimization process. At the model evaluation level, a new metric, named Uni-modal Utilization Ratio (UUR), is presented to assess models through the learning efficacy of uni-modal features. Experimental results, including UUR variants, on two building extraction datasets demonstrate a substantial performance improvement from UFE. Moreover, UFE also exhibits adaptability when integrated with various model components and generalizes to other multi-modal image-related tasks.
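The AGE idea described in the abstract, scaling each modality's gradient according to how well its uni-modal branch is learning relative to the fusion branch, can be sketched as follows. This is a minimal illustration only: the function name, the loss-ratio monitoring signal, and the `alpha` parameter are assumptions for exposition, not the paper's actual formulation.

```python
def age_coefficients(unimodal_losses, fusion_loss, alpha=1.0):
    """Hypothetical per-modality gradient-scaling factors (AGE-style sketch).

    A modality whose uni-modal loss lags behind the fusion loss is treated
    as under-trained and gets an amplified gradient; a modality that is
    already ahead of the fusion branch is damped, countering dominance.
    """
    coeffs = {}
    for name, loss in unimodal_losses.items():
        # ratio > 1 means this branch is worse than the fused model
        ratio = loss / max(fusion_loss, 1e-8)
        # interpolate around 1.0; alpha controls modulation strength
        coeffs[name] = max(1.0 + alpha * (ratio - 1.0), 0.0)
    return coeffs

# Example: the depth/height branch ("dsm") lags the fusion loss,
# so its gradient is boosted; the "rgb" branch is ahead and is damped.
scales = age_coefficients({"rgb": 0.4, "dsm": 0.9}, fusion_loss=0.5)
```

In a training loop, each factor would multiply the gradients flowing into the corresponding uni-modal encoder before the optimizer step, which is one plausible way to realize the "dynamic gradient modulation" the abstract describes.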