Authors
Dmytro Mishkin, Jiří Matas, Michal Perďoch, Karel Lenc
Abstract
• WxBS: a generalization of the baseline two-view matching problem, where x stands for different subsets of "wide baselines" in acquisition conditions.
• Novel dataset of ground-truthed image pairs which include multiple wide baselines.
• We show that state-of-the-art matchers fail on almost all image pairs.
• WxBS-M, a novel matching algorithm for the WxBS problem, is introduced. We show experimentally that the WxBS-M matcher dominates the state-of-the-art methods on both the new and existing datasets.

Take away
• The SIFT family is still the best local descriptor; it outperforms novel CNN approaches [SiamNet15].
• (Adaptive) Hessian-Affine is the best detector with broad applicability.
• Affine view synthesis greatly helps for non-geometrical problems.
• Datasets and WxBS-Matcher are available at http://cmp.felk.cvut.cz/wbs/
• We need more diverse datasets for learning local descriptors than Yosemite and Liberty.

Descriptor results for each problem are shown with no photometric normalization and with photo normalization (mean 0.5, var 0.2).

WGBS (Wide Geometry Baseline Stereo) summary
• SIFT family dominates.
• Photo-L2 normalized pixel intensities are a strong descriptor.
• ConvNet [SiamNet15] is worse than SIFT (at least when not trained to handle large transformations).
• Other descriptors are not competitive.
* Images from the Extreme View (EVD) and Oxford-Affine (OxAff) datasets.

WLBS (Wide iLlumination Baseline Stereo) summary
• SIFT family dominates.
• ConvNet [SiamNet15] is worse than SIFT (at least when not trained to handle illumination transformations).
• Other descriptors are not competitive.
* Images from the SymBench, GDBootstrap, and EdgeFoci (EF) datasets.

WABS (Wide Appearance Baseline Stereo) summary
• SIFT family dominates.
• ConvNet [SiamNet15] performs poorly (not trained for photometric distortions).
• Other descriptors are not competitive.
* Images from the SymBench, VPRiCE 2015, and EdgeFoci (EF) datasets.

WSBS (Wide Sensor Baseline Stereo) summary
• No descriptor performs acceptably.
• Only gradient folding in HalfSIFT [HalfSIFT10] works (poorly).
• Note the recall range [0, 0.14], indicating high difficulty.
* Images from the GDBootstrap and MMS datasets.

Map2Photo: WABS special case
• A special (learned?) descriptor is needed for map-photo matching.
• Note the recall range [0, 0.06], indicating the extreme difficulty of map vs. photo matching.
* map2photo dataset introduced with this paper.

References
• [SiamNet15] S. Zagoruyko and N. Komodakis. Learning to Compare Image Patches via Convolutional Neural Networks. In CVPR, 2015.
• [HalfSIFT10] J. Chen, J. Tian, N. Lee, J. Zheng, R. Smith, and A. Laine. A partial intensity invariant feature descriptor for multimodal retinal image registration. IEEE Transactions on Biomedical Engineering, 2010.
• [MODS15] D. Mishkin, J. Matas, and M. Perdoch. MODS: Fast and Robust Method for Two-View Matching. Accepted to CVIU, 2015.
• [DEGENSAC05] O. Chum, T. Werner, and J. Matas. Two-view Geometry Estimation Unaffected by a Dominant Plane. In CVPR, 2005.

WxBS-Matcher
Input: I1, I2 (two images); Θm (minimum required number of matches); Smax (maximum number of iterations)
Output: fundamental or homography matrix F or H; a list of corresponding local features

while Nmatches < Θm and Iter < Smax do
    for I1 and I2 separately do
        1 Generate synthetic views according to the scale-tilt-rotation-detector setup for Iter
        2 Detect local features using adaptive thresholding
        3 Extract rotation-invariant descriptors with: 3a RootSIFT and 3b HalfRootSIFT
        4 Reproject local features to I1, I2
    end for
    5 Generate tentative correspondences (TC) based on the 1st geometrically inconsistent rule, for RootSIFT and HalfRootSIFT separately, using a kD-tree
    6 Filter duplicates
    7 Geometric verification of all TC with modified DEGENSAC [DEGENSAC05], estimating F or H
    8 Check geometric consistency of the local affine features with the estimated F or H
end while

Step details:
1. Affine view synthesis.
2. Adaptive thresholding: if #HesAffs < θHesAff, lower the detection threshold.
3. HalfRootSIFT (figure: HalfSIFT bin vs. SIFT bin).
5. 1st geometrically inconsistent rule: for the second-nearest distance ratio, use only patches that are geometrically inconsistent with the closest one (yellow, not red, in the figure).
6. Filter duplicates: discard redetections (red patches).

Detector and matcher comparison
• Best results among single detectors (AdHesAf) and view-synthesis-based matchers (WxBS-M).
• TILDE detector results were obtained after the camera-ready (CR) deadline.
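
The 1st geometrically inconsistent rule (step 5 of the WxBS-Matcher) can be illustrated with a minimal sketch. It assumes brute-force distances in place of the kD-tree, and treats a candidate as "inconsistent with the closest one" when its keypoint centre lies farther than `min_dist_px` from the best match. The function name, the `ratio` value, and the `min_dist_px` value are hypothetical choices made for illustration; the poster gives none of them.

```python
import numpy as np

def fginn_ratio_match(desc1, desc2, xy2, ratio=0.8, min_dist_px=10.0):
    """Ratio test whose 'second nearest' skips redetections of the best match.

    desc1: (N1, D) descriptors from image 1
    desc2: (N2, D) descriptors from image 2
    xy2:   (N2, 2) keypoint centres of the image-2 features
    ratio, min_dist_px: illustrative values, not taken from the poster.
    Returns a list of (index_in_image1, index_in_image2) tentative matches.
    """
    desc1 = np.asarray(desc1, dtype=np.float64)
    desc2 = np.asarray(desc2, dtype=np.float64)
    xy2 = np.asarray(xy2, dtype=np.float64)
    # Brute-force pairwise Euclidean descriptor distances, shape (N1, N2).
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i in range(d.shape[0]):
        order = np.argsort(d[i])          # candidates, nearest first
        best = order[0]
        # Spatial distance of every candidate to the best match in image 2.
        spatial = np.linalg.norm(xy2[order] - xy2[best], axis=1)
        # Keep only candidates that are spatially far from the best match,
        # i.e. geometrically inconsistent with it (not a redetection).
        inconsistent = order[spatial > min_dist_px]
        if inconsistent.size == 0:
            continue                      # no valid second neighbour
        second = inconsistent[0]
        if d[i, best] < ratio * d[i, second]:
            matches.append((i, int(best)))
    return matches
```

With a plain second-nearest ratio test, a feature detected twice at almost the same location (for example in two synthetic views) competes with itself and the correct match is rejected. Skipping such redetections when picking the second neighbour is what this sketch demonstrates.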
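
The descriptors in step 3 (RootSIFT and HalfRootSIFT) can be sketched as post-processing of an ordinary SIFT vector. This is a sketch under several assumptions. The descriptor is taken to store 8 orientation bins per spatial cell, covering 360 degrees, with orientation as the fastest-varying axis. "Gradient folding" is read as summing bins that are 180 degrees apart. HalfRootSIFT is read as RootSIFT normalization applied after folding. The poster does not spell out any of these details, and the function names are hypothetical.

```python
import numpy as np

def root_sift(desc, eps=1e-12):
    """RootSIFT: L1-normalize, then take the element-wise square root."""
    desc = np.asarray(desc, dtype=np.float64)
    desc = desc / (np.abs(desc).sum(axis=-1, keepdims=True) + eps)
    return np.sqrt(desc)

def half_sift(desc, n_orient=8):
    """Fold opposite gradient orientations together (8 -> 4 bins per cell).

    Assumes a layout of (..., cells * n_orient) with orientation as the
    fastest-varying axis, so bins k and k + n_orient/2 are 180 degrees apart.
    """
    desc = np.asarray(desc, dtype=np.float64)
    lead = desc.shape[:-1]
    cells = desc.reshape(*lead, -1, n_orient)
    half = n_orient // 2
    folded = cells[..., :half] + cells[..., half:]
    return folded.reshape(*lead, -1)

def half_root_sift(desc, n_orient=8):
    """Assumed composition: fold orientations first, then apply RootSIFT."""
    return root_sift(half_sift(desc, n_orient))
```

For a fixed patch orientation, reversing every gradient direction (as happens when contrast is inverted between sensors) shifts each cell's histogram by half a turn, and the folded descriptor is unchanged by that shift. The price is a shorter, less discriminative vector: 64 dimensions instead of 128.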