Air pollution is one of the most serious environmental problems facing humanity and a central concern in the development of sustainable cities. Accurate PM2.5 prediction supports urban governance, urban planning, and government decision-making. Consequently, air quality sensing and prediction systems based on artificial intelligence play an increasingly important role in the governance of sustainable cities. In this paper, we propose a deep learning model driven by the wavelet packet transform (WPT) to predict hourly PM2.5 concentrations, and we verify its effectiveness when applied to Qingdao, China. The WPT is first applied to decompose the meteorological data into sub-time series with different frequencies at different resolutions (STSs-DFDR). A multi-dimensional LSTM that considers both spatial and temporal information is then developed to extract key features from the STSs-DFDR and produce the PM2.5 prediction. To the best of our knowledge, this is the first attempt to simultaneously predict PM2.5 concentrations in different regions with a single model. Moreover, we find that multi-scale analysis of the time series greatly improves the cross-regional generalization of deep learning models. Finally, experimental comparisons with various other methods show that the proposed method achieves state-of-the-art PM2.5 prediction performance.
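To make the decomposition step concrete, the following is a minimal sketch of a wavelet packet decomposition in plain Python. It is illustrative only and is not the implementation used in this work: the Haar wavelet, the depth of two levels, the function names `haar_step` and `wavelet_packet`, and the sample values are all assumptions chosen for brevity. The sketch shows the property that distinguishes the WPT from the ordinary wavelet transform, namely that both the low-frequency and the high-frequency branch are split again at every level, which yields sub-series covering all frequency bands at the chosen resolution.

```python
import math


def haar_step(signal):
    """One Haar analysis step: split an even-length signal into a
    low-pass (approximation) half and a high-pass (detail) half."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Full wavelet packet tree of the given depth.

    Both the low and the high band are split again at every level.
    Returns a dict mapping a path such as 'ad' (approximation, then
    detail) to that node's coefficients. Paths are in natural filter
    order, not sorted by frequency.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = {"": list(signal)}
    for _ in range(levels):
        next_nodes = {}
        for path, coeffs in nodes.items():
            low, high = haar_step(coeffs)
            next_nodes[path + "a"] = low
            next_nodes[path + "d"] = high
        nodes = next_nodes
    return nodes


# Hypothetical hourly readings (made-up values, not data from the paper).
hourly = [12.0, 14.0, 13.0, 15.0, 20.0, 22.0, 21.0, 19.0]
bands = wavelet_packet(hourly, levels=2)
```

With eight samples and two levels, `bands` holds four sub-series (`aa`, `ad`, `da`, `dd`) of two coefficients each. In a pipeline of the kind described above, sub-series like these would be the inputs from which the recurrent model extracts features.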