Accurate and timely estimation of nitrogen oxide (NOx) emissions is a prerequisite for designing effective strategies to reduce O3 and PM2.5 pollution. The satellite-based top-down method can provide near-real-time constraints on emissions; however, its efficiency is largely limited by the effort required to handle the complex emission–concentration response. Here, we propose a novel machine-learning-based method that uses a physically informed variational autoencoder (VAE) emission predictor to infer NOx emissions from satellite-retrieved surface NO2 concentrations. A neural network trained with a chemical transport model significantly reduces the computational burden, allowing the VAE emission predictor to provide timely estimates of posterior emissions from the satellite-retrieved surface NO2 concentrations. The VAE emission predictor corrected the underestimation of NOx emissions in rural areas and the overestimation in urban areas, bringing the normalized mean bias closer to zero (from −0.8 to −0.4) and increasing R2 (from 0.4 to 0.7). The interpretability of the VAE emission predictor was investigated with a sensitivity analysis that modulates each input feature in turn; the analysis indicates that NO2 concentration and planetary boundary layer (PBL) height are important for estimating NOx emissions, which is consistent with established understanding. The advantages of the VAE emission predictor in efficiency, flexibility, and accuracy demonstrate its potential for estimating up-to-date emissions and evaluating the effectiveness of emission controls from observations.
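
For readers unfamiliar with the two evaluation statistics quoted above, the following minimal sketch shows one common way to compute them. It is an illustration under stated assumptions, not the code used in this study: the normalized mean bias (NMB) is taken as the sum of (predicted minus observed) divided by the sum of observed values, and R2 is taken as the squared Pearson correlation coefficient, which may differ from the exact definitions adopted in the paper. The function names and the sample values are hypothetical.

```python
import numpy as np


def normalized_mean_bias(pred, obs):
    """NMB = sum(pred - obs) / sum(obs); negative values mean underestimation.

    Assumed definition; the paper's exact formula may differ.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sum(pred - obs) / np.sum(obs))


def r_squared(pred, obs):
    """R2 taken here as the squared Pearson correlation (an assumption)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(pred, obs)[0, 1]
    return float(r ** 2)


if __name__ == "__main__":
    # Synthetic values for illustration only, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [0.5, 1.0, 1.5, 2.0]  # exactly half of each observed value
    print("NMB:", normalized_mean_bias(simulated, observed))
    print("R2 :", r_squared(simulated, observed))
```

In this synthetic case the simulated values are exactly half of the observed ones, so the NMB is −0.5 while R2 is 1: the two statistics capture different aspects of performance (systematic bias versus linear agreement), which is why both are reported.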