Artificial intelligence
Computer science
Outlier
Pattern recognition (psychology)
Normalization (sociology)
Pose
Graph
Computer vision
Scale-invariant feature transform
Feature extraction
Theoretical computer science
Anthropology
Sociology
Authors
Xingyu Jiang, Yang Wang, Aoxiang Fan, Jiayi Ma
Source
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Date: 2022-08-01
Volume/Pages: 190: 181-195
Cited by: 12
Identifier
DOI: 10.1016/j.isprsjprs.2022.06.009
Abstract
Recovering camera pose from two-view images is a critical problem in photogrammetry and computer vision. In complex scenarios, point correspondences constructed by off-the-shelf feature matchers such as SIFT are corrupted by heavy outliers. In this case, traditional methods based on sampling consensus or motion/geometric coherence struggle because their underlying assumptions no longer hold. To this end, we propose a deep technique to better extract underlying geometric information from a high-dimensional feature space for two-view geometry estimation. Unlike existing deep methods that use distribution-based normalization or explicitly aggregate neighboring correspondences, we propose a graph attention operation with a multi-head mechanism, termed GANet, to latently capture fine-grained contextual/geometric relations among corrupted correspondences. This encourages the network to learn informative representations that ensure high graph similarity, thereby focusing on inliers and suppressing outliers. On this basis, the network can more easily infer the inliers that best recover camera pose. Moreover, we observe that the calculation of graph similarity for each node is supported by only a subset of node features. Accordingly, we further propose a lightweight implementation of graph attention, namely Sparse GANet, which learns a sparse attention map via block-wise operations and Sinkhorn normalization. This sparse strategy largely reduces memory and computational requirements while maintaining performance. Extensive experiments on pose estimation, outlier rejection, and image registration across challenging datasets, together with combined tests using different descriptor matchers and robust estimators, demonstrate the superiority and strong generalization of our method against the state of the art.
In particular, we achieve at least 1.5% and 0.6% mAP@5° improvement on the YFCC and SUN3D datasets for pose estimation, respectively. Our Sparse GANet reduces the model size to only 0.28 MB and the inference time to 16 ms, significantly better than SuperGlue, which requires 12.02 MB and 68 ms. (Source code is available at https://github.com/StaRainJ/Code-of-GANet.)
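For context, the Sinkhorn normalization step mentioned in the abstract alternates row and column normalization so that an attention/score map approaches a doubly-stochastic matrix, which tends to concentrate mass on a sparse set of strong matches. Below is a minimal NumPy sketch of this idea under simplifying assumptions: it operates on a dense score matrix (the paper's block-wise sparse implementation is not reproduced), and the function name is illustrative, not from the paper's code.

```python
import numpy as np

def sinkhorn_normalize(scores, n_iters=10, eps=1e-8):
    """Sketch of Sinkhorn normalization on a dense score matrix.

    Alternately normalizes rows and columns of exp(scores) so the
    result approaches a doubly-stochastic attention map.
    """
    # Exponentiate with max-subtraction for numerical stability.
    P = np.exp(scores - scores.max())
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True) + eps  # normalize rows
        P /= P.sum(axis=0, keepdims=True) + eps  # normalize columns
    return P

# Example: a random 8x8 score matrix; after enough iterations every
# row and column of the result sums to (approximately) 1.
scores = np.random.RandomState(0).rand(8, 8)
P = sinkhorn_normalize(scores, n_iters=50)
```

In a learned setting, `scores` would be the raw attention logits between correspondence features; the doubly-stochastic structure discourages any single node from dominating and supports the sparse attention used by Sparse GANet.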