Artificial intelligence
Computer science
Outlier
Pattern recognition (psychology)
Normalization (sociology)
Pose
Graph
Computer vision
Scale-invariant feature transform
Feature extraction
Theoretical computer science
Anthropology
Sociology
Authors
Xingyu Jiang, Yang Wang, Aoxiang Fan, Jiayi Ma
Source
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Date: 2022-08-01
Volume/Pages: 190: 181-195
Cited by: 12
Identifier
DOI: 10.1016/j.isprsjprs.2022.06.009
Abstract
Recovering camera pose from two-view images is a critical problem in photogrammetry and computer vision. In complex scenarios, point correspondences constructed by off-the-shelf feature matchers such as SIFT are corrupted by heavy outliers. In this case, traditional methods based on sampling consensus or motion/geometric coherence struggle because their underlying assumptions no longer hold. To this end, we propose a deep technique to better extract underlying geometric information from a high-dimensional feature space for two-view geometry estimation. Unlike existing deep methods that use distribution-based normalization or explicitly aggregate neighboring correspondences, we propose a graph attention operation with a multi-head mechanism, termed GANet, to latently capture fine-grained contextual/geometric relations among corrupted correspondences. This encourages the network to learn informative representations that ensure high graph similarity, thereby focusing on inliers and suppressing outliers. On this basis, the network can more easily infer the inliers that best recover camera pose. Moreover, we observe that the calculation of graph similarity for each node is supported by only a subset of node features. Accordingly, we further propose a lightweight implementation of graph attention, namely Sparse GANet, which learns a sparse attention map via block-wise operations and Sinkhorn normalization. This sparse strategy largely reduces memory and computational requirements while maintaining performance. Extensive experiments on pose estimation, outlier rejection, and image registration across challenging datasets, together with combined tests using different descriptor matchers and robust estimators, demonstrate the superiority and strong generalization of our method against the state of the art.
In particular, we achieve at least 1.5% and 0.6% mAP@5° improvement on the YFCC and SUN3D datasets for pose estimation, respectively. Our Sparse GANet reduces the model size to only 0.28 MB and the inference time to 16 ms, significantly better than SuperGlue, which requires 12.02 MB and 68 ms. (Source code is available at https://github.com/StaRainJ/Code-of-GANet.)
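For context, the Sinkhorn normalization step mentioned in the abstract alternates row and column normalization so that an attention/score map approaches a doubly-stochastic matrix, which tends to concentrate mass on a sparse set of strong matches. Below is a minimal NumPy sketch of this idea under simplifying assumptions: it operates on a dense score matrix (the paper's block-wise sparse implementation is not reproduced), and the function name is illustrative, not from the paper's code.

```python
import numpy as np

def sinkhorn_normalize(scores, n_iters=10, eps=1e-8):
    """Sketch of Sinkhorn normalization on a dense score matrix.

    Alternately normalizes rows and columns of exp(scores) so the
    result approaches a doubly-stochastic attention map.
    """
    # Exponentiate with max-subtraction for numerical stability.
    P = np.exp(scores - scores.max())
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True) + eps  # normalize rows
        P /= P.sum(axis=0, keepdims=True) + eps  # normalize columns
    return P

# Example: a random 8x8 score matrix; after enough iterations every
# row and column of the result sums to (approximately) 1.
scores = np.random.RandomState(0).rand(8, 8)
P = sinkhorn_normalize(scores, n_iters=50)
```

In a learned setting, `scores` would be the raw attention logits between correspondence features; the doubly-stochastic structure discourages any single node from dominating and supports the sparse attention used by Sparse GANet.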