Journal: IEEE Transactions on Geoscience and Remote Sensing (Institute of Electrical and Electronics Engineers) | Date: 2021-01-29 | Pages: 1-13 | Citations: 2
Identifiers
DOI: 10.1109/TGRS.2021.3053062
Abstract
Capturing accurate multiscale semantic information from images is essential for high-quality semantic segmentation. Over the past years, many methods have attempted to improve the multiscale information-capturing ability of networks by various means. However, these methods often achieve unsatisfactory performance (in speed or accuracy) on images that contain many small-scale objects, such as aerial images. In this article, we propose a new network, named cross fusion net (CF-Net), for fast and effective extraction of multiscale semantic information, especially small-scale semantic information. The proposed CF-Net captures more accurate small-scale semantic information in two ways. On the one hand, we develop a channel attention refinement block to select informative features. On the other hand, we propose a cross fusion block to enlarge the receptive field of the low-level feature maps. As a result, the network encodes more accurate semantic information from small-scale objects, and their segmentation accuracy improves accordingly. We compare the proposed CF-Net with several state-of-the-art semantic segmentation methods on two popular aerial image segmentation data sets. Experimental results show that CF-Net improves the average F₁ score by about 0.43%, and the F₁ score of small-scale objects (e.g., cars) by about 2.61%. In addition, CF-Net has the fastest inference speed, which demonstrates its superiority in aerial scenes. Our code will be released at: https://github.com/pcl111/CF-Net.
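The abstract names the two components but does not specify their internals. As a rough, hypothetical sketch only (the function names, the SE-style bottleneck, the reduction ratio, and nearest-neighbour upsampling are our assumptions, not the paper's actual design), a channel-attention refinement and a simple cross fusion of feature maps could look like:

```python
import numpy as np

def channel_attention_refine(feat, reduction=4, rng=None):
    """Hypothetical SE-style channel attention: pool each channel to a
    descriptor, pass it through a small bottleneck, and rescale the
    channels by the resulting weights (selecting informative features)."""
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = feat.shape
    # Squeeze: global average pooling -> one scalar per channel
    z = feat.mean(axis=(1, 2))                       # shape (c,)
    # Excitation: two-layer bottleneck; weights are random placeholders
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)                      # ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ s)))              # sigmoid gate in (0, 1)
    # Refine: rescale each channel by its attention weight
    return feat * a[:, None, None]

def cross_fuse(low, high):
    """Hypothetical cross fusion: upsample the coarser high-level map
    (nearest neighbour) to the low-level resolution and add it, so the
    low-level map inherits the larger receptive field."""
    c, h, w = low.shape
    _, hh, hw = high.shape
    up = np.repeat(np.repeat(high, h // hh, axis=1), w // hw, axis=2)
    return low + up

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 16, 16))   # low-level features (C, H, W)
high = rng.standard_normal((8, 8, 8))     # coarser high-level features
out = channel_attention_refine(feat)
fused = cross_fuse(out, high)
print(fused.shape)  # (8, 16, 16)
```

Because the sigmoid gate lies strictly in (0, 1), the refinement can only attenuate channels, never amplify them; in the real network the learned bottleneck weights would determine which channels are kept.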