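Invariant (physics)
Artificial intelligence
Computer science
Zero (linguistics)
Pattern recognition (psychology)
Mathematics
Mathematical physics
Linguistics
Philosophy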
Authors
Tian Zhang,Kongming Liang,Ruoyi Du,Wei Chen,Zhanyu Ma
Identifier
DOI:10.1109/tpami.2024.3487222
Abstract
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set. Previous works mainly project an image and its corresponding composition into a common embedding space to measure their compatibility score. However, attributes and objects then share the same learned visual representations, which leads the model to exploit spurious correlations and to be biased towards seen compositions. Instead, we reconsider CZSL as an out-of-distribution generalization problem. If an object is treated as a domain, we can learn object-invariant features to recognize attributes attached to any object reliably, and vice versa. Specifically, we propose an invariant feature learning framework that aligns different domains at the representation and gradient levels to capture the intrinsic characteristics associated with the tasks. To further encourage the disentanglement of attributes and objects, we propose an "encoding-reshuffling-decoding" process that helps the model avoid spurious correlations by randomly regrouping the disentangled features into synthetic features. Ultimately, our method improves generalization by learning to disentangle features that represent two independent factors: attributes and objects. Experiments demonstrate that the proposed method achieves state-of-the-art or competitive performance in both closed-world and open-world scenarios. Code is available at https://github.com/PRIS-CV/Disentangling-before-Composing.
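The "reshuffling" step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reshuffle`, the list-of-floats feature type, and the use of plain concatenation as a stand-in for a learned decoder are all assumptions made for illustration. The sketch only shows the idea of pairing each attribute feature with an object feature taken from a randomly chosen sample in the same batch, so that the resulting synthetic features carry recombined (attribute, object) labels.

```python
import random
from typing import List, Sequence, Tuple

Feature = List[float]


def reshuffle(
    attr_feats: Sequence[Feature],
    obj_feats: Sequence[Feature],
    attr_labels: Sequence[str],
    obj_labels: Sequence[str],
    rng: random.Random,
) -> Tuple[List[Feature], List[Tuple[str, str]]]:
    """Regroup disentangled features into synthetic compositions.

    Hypothetical helper: sample i keeps its attribute feature but receives
    the object feature of sample perm[i], where perm is a random
    permutation of the batch indices.
    """
    n = len(attr_feats)
    if not (len(obj_feats) == len(attr_labels) == len(obj_labels) == n):
        raise ValueError("all inputs must have the same batch size")

    perm = list(range(n))
    rng.shuffle(perm)

    # Assumption: concatenation stands in for a learned decoder that would
    # map the regrouped pair back into the visual feature space.
    synthetic = [list(attr_feats[i]) + list(obj_feats[j]) for i, j in enumerate(perm)]
    labels = [(attr_labels[i], obj_labels[j]) for i, j in enumerate(perm)]
    return synthetic, labels


if __name__ == "__main__":
    feats, labels = reshuffle(
        attr_feats=[[0.1], [0.2], [0.3]],
        obj_feats=[[1.0], [2.0], [3.0]],
        attr_labels=["wet", "dry", "old"],
        obj_labels=["dog", "cat", "car"],
        rng=random.Random(0),
    )
    for feat, label in zip(feats, labels):
        print(label, feat)
```

In a training setup along these lines, the synthetic features and their recombined labels would be fed to the attribute and object classifiers as additional examples, which is one way to discourage a model from tying an attribute to the objects it happened to co-occur with.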