Fine-grained urban flow inference aims to infer fine-grained urban flows solely from coarse-grained observations, which is essential for city management and transportation services. However, most existing methods assume that coarse-grained urban flows are fully observable, whereas in practice the flows in some coarse-grained regions may not be observed. In this study, we propose a multi-task framework called UrbanSTA, with space-time attraction learning, that simultaneously estimates the missing values in coarse-grained urban flow maps and infers fine-grained urban flows. Specifically, UrbanSTA consists of two parts: the flow completion network STA and the fine-grained flow inference network FIN. STA captures space-time features with a separable space-time attention encoder and recovers the missing flow features with a decoder. FIN takes the completed coarse-grained flow features directly as input for further decoding and reconstructs fine-grained flows by modeling the complex associations between coarse- and fine-grained urban flows under upsampling constraints. Extensive experiments on two real-world datasets demonstrate that UrbanSTA outperforms other state-of-the-art methods. The source code is available at https://github.com/Wangzheaos/UrbanSTA.
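The abstract does not spell out the upsampling constraint used by FIN. One form that is common in fine-grained flow inference is flow conservation, where the fine-grained flows inside each coarse cell must sum to that cell's coarse-grained flow. The sketch below illustrates that idea only as an assumption, not as UrbanSTA's implementation: the helper `n2_normalize`, the upscaling factor `n`, and the use of non-negative per-subcell scores are all hypothetical.

```python
from typing import List

Grid = List[List[float]]


def n2_normalize(fine_scores: Grid, coarse: Grid, n: int) -> Grid:
    """Hypothetical flow-conservation upsampling (assumed form, not UrbanSTA's code).

    Distributes each coarse cell's flow over its n x n fine-grained subcells in
    proportion to non-negative scores, so every block sums to the coarse flow.
    """
    rows, cols = len(coarse), len(coarse[0])
    out = [[0.0] * (cols * n) for _ in range(rows * n)]
    for i in range(rows):
        for j in range(cols):
            # Fine-grained cells covered by coarse cell (i, j)
            block = [(i * n + di, j * n + dj) for di in range(n) for dj in range(n)]
            total = sum(fine_scores[r][c] for r, c in block)
            for r, c in block:
                # Share of the coarse flow assigned to this subcell;
                # fall back to a uniform split if all scores in the block are zero
                share = fine_scores[r][c] / total if total > 0 else 1.0 / (n * n)
                out[r][c] = share * coarse[i][j]
    return out


if __name__ == "__main__":
    coarse_map = [[8.0, 4.0]]  # 1 x 2 coarse grid
    scores = [[1.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 0.0, 0.0]]  # 2 x 4 fine-grained scores (n = 2)
    print(n2_normalize(scores, coarse_map, 2))
```

Under this assumed form, the model only has to predict relative weights within each coarse cell, and the coarse-to-fine consistency holds by construction rather than being learned.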