Most existing fusion algorithms are not robust to unregistered input images. Even after image registration, nonlinear misregistration may persist in local areas of the images, degrading the quality of the fused image. To tackle these challenges, this article proposes a progressive remote sensing image registration and fusion network, named PRF-Net, which is particularly useful when the two images come from different platforms. First, a registration network is designed to register the input image patches; it comprises a global spatial transform network (GSTN) and a local spatial warp network (LSWN). The GSTN is primarily used for coarse registration, applying a rigid transformation to globally align the input images. After coarse registration, the preliminarily registered moving image is input into the LSWN for local fine-tuning to maximize the correlation between the input image patches. Subsequently, the finely registered images are degraded and input into the fusion network to generate the fused image. To preserve sufficient spectral and spatial information in the fused image, a multiscale feature extraction (MSFE) block with a highly interpretable spatial details attention (SDA) block is designed, which enhances the ability of the fusion network to extract and preserve spatial details and spectral information. Three groups of experiments conducted on four types of remote sensing images demonstrate that the proposed PRF-Net performs excellently at both reduced and full resolutions, showcasing its outstanding registration and fusion quality.
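The coarse-to-fine registration idea described above can be illustrated with a minimal, pure-Python sketch. It is not the PRF-Net implementation: in the article the transformation parameters are predicted by the GSTN and the LSWN, whereas here the rigid parameters and the displacement field are set by hand, and the names `bilinear`, `rigid_warp`, `local_warp`, `correlation`, and `shifted_row`, as well as the toy 8x8 images, are hypothetical. The sketch only shows why a single global rigid transform can leave a residual local misalignment that a per-pixel warp then removes.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y); zero outside the image."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for yy, wy in ((y0, 1 - fy), (y0 + 1, fy)):
        for xx, wx in ((x0, 1 - fx), (x0 + 1, fx)):
            if 0 <= xx < w and 0 <= yy < h:
                total += wx * wy * img[yy][xx]
    return total

def rigid_warp(img, theta, tx, ty):
    """Coarse step (assumed form): rotation about the centre plus translation,
    applied as an inverse mapping from output pixels to source coordinates."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            u, v = x - cx, y - cy
            row.append(bilinear(img, c * u - s * v + cx + tx,
                                s * u + c * v + cy + ty))
        out.append(row)
    return out

def local_warp(img, flow):
    """Fine step (assumed form): resample with a per-pixel displacement
    field, where flow[y][x] = (dx, dy)."""
    return [[bilinear(img, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(len(img[0]))] for y in range(len(img))]

def correlation(a, b):
    """Pearson correlation between two images of the same size."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(fa, fb))
    var_a = sum((p - ma) ** 2 for p in fa)
    var_b = sum((q - mb) ** 2 for q in fb)
    return cov / math.sqrt(var_a * var_b)

def shifted_row(row, k):
    """Shift a row k pixels to the right, padding with zeros."""
    return [0.0] * k + row[:len(row) - k]

# Toy fixed image: a small intensity pattern in columns 1..3.
fixed = [[float((y + 1) * x) if 1 <= x <= 3 else 0.0 for x in range(8)]
         for y in range(8)]
# Toy moving image: top half shifted by 2 pixels, bottom half by 3,
# so no single rigid transform aligns both halves.
moving = [shifted_row(row, 2 if y < 4 else 3) for y, row in enumerate(fixed)]

# Coarse registration: one global translation (hand-set, not predicted).
coarse = rigid_warp(moving, 0.0, 2.0, 0.0)
# Fine registration: a displacement field that corrects only the bottom half.
flow = [[(1.0, 0.0) if y >= 4 else (0.0, 0.0) for _ in range(8)]
        for y in range(8)]
fine = local_warp(coarse, flow)

print(correlation(fixed, coarse), correlation(fixed, fine))
```

In this toy case the global translation aligns only the top half of the image, and the local displacement field recovers the bottom half, which raises the correlation with the fixed image. A learned version would typically replace the hand-set parameters with network outputs and use a differentiable sampler.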