Authors
S Thejaswin, Ashwin Prakash, Athira Nambiar, Alexandre Bernardino
Source
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-1
Abstract
Biometrics such as human gait and face play a significant role in vision-based surveillance applications. However, multimodal fusion of biometric features is a challenging task in non-controlled environments, because the reliability of features from different modalities varies with the context, such as viewpoint, illumination, occlusion, background clutter, and clothing. For instance, in person identification in the wild, facial and gait features play complementary roles: in principle, the face provides more discriminative features than gait when the person is frontal to the camera, while gait features are more discriminative in lateral views. Classical fusion techniques typically address this problem by explicitly determining the context in which the data is obtained (e.g., frontal or lateral view) and designing custom data fusion strategies for each context. However, this requires an initial enumeration of all possible contexts and the design of context "detectors", which bring their own challenges. Hence, how to effectively utilize both facial and gait information in arbitrary conditions is still an open problem. In this paper, we present a context-adaptive multi-biometric fusion strategy that does not require the prior determination of context features; instead, the context is implicitly encoded in the fusion process by a set of attentional weights that capture the relevance of the different modalities for each particular data sample. The key contributions of the paper are threefold. First, we propose a novel framework for the dynamic fusion of multiple biometric modalities leveraging attention techniques, denoted 'Adapt-FuseNet'. Second, we perform an extensive evaluation of the proposed method against various other fusion techniques such as Bilinear Pooling, Parallel Co-attention, Keyless Attention, Multi-modal Factorized High-order Pooling, and Multimodal Tucker Fusion. Third, an Explainable Artificial Intelligence-based interpretation tool is used to analyse how the attention mechanism of 'Adapt-FuseNet' implicitly captures context and weights the different modalities appropriately for the task at hand. This makes the results interpretable in a more human-understandable way, thereby boosting confidence in the operation of AI systems in the wild. Extensive experiments are carried out on two public gait datasets (CASIA-A and CASIA-B), showing that 'Adapt-FuseNet' significantly outperforms the state-of-the-art.
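To make the core idea of per-sample attentional weighting concrete, below is a minimal sketch (not the authors' Adapt-FuseNet implementation) of attention-based fusion of two modality embeddings. It assumes face and gait features have already been extracted as fixed-length vectors; all module names, dimensions, and the use of PyTorch are illustrative assumptions.

```python
# Hypothetical sketch of attention-weighted face/gait fusion.
# Not the paper's architecture; names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveFusion(nn.Module):
    def __init__(self, face_dim=256, gait_dim=256, fused_dim=256):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.face_proj = nn.Linear(face_dim, fused_dim)
        self.gait_proj = nn.Linear(gait_dim, fused_dim)
        # Scores one scalar relevance per modality per sample;
        # this is where "context" is implicitly encoded.
        self.score = nn.Linear(fused_dim, 1)

    def forward(self, face_feat, gait_feat):
        # (B, fused_dim) projections of each modality
        f = torch.tanh(self.face_proj(face_feat))
        g = torch.tanh(self.gait_proj(gait_feat))
        # (B, 2) per-sample attention weights over the two modalities
        scores = torch.cat([self.score(f), self.score(g)], dim=1)
        weights = F.softmax(scores, dim=1)
        # Convex combination: a frontal sample can lean on face,
        # a lateral one on gait, without an explicit context detector.
        fused = weights[:, 0:1] * f + weights[:, 1:2] * g
        return fused, weights

# Toy usage with random tensors standing in for real face/gait embeddings.
if __name__ == "__main__":
    model = AttentiveFusion()
    face = torch.randn(4, 256)
    gait = torch.randn(4, 256)
    fused, w = model(face, gait)
    print(fused.shape, w.shape)  # torch.Size([4, 256]) torch.Size([4, 2])
```

The returned per-sample weights are also what an XAI-style analysis, as described in the abstract, would inspect to check whether the model relies more on face for frontal views and more on gait for lateral views.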