With the rapid development of computer vision, 3D reconstruction has recently received extensive attention and research. Nonetheless, learning-based 3D reconstruction methods are not robust enough, so the reconstructed models contain many occlusions and outliers. In this study, we aim to improve feature-matching correlation, aggregate global contextual information, and enhance the robustness of depth estimation, thereby improving reconstruction quality. We propose an attention-based deep sparse prior cascade multi-view stereo network, ADS-MVSNet. First, we propose a feature extraction module based on an attention mechanism that focuses on the regions of interest in the input scene. Second, we propose a depth sparse prior strategy module that estimates the depth map of the input scene more accurately. The initial depth map is then refined in a coarse-to-fine manner to improve the accuracy of point-cloud reconstruction. Both modules are lightweight and effective, and they improve the robustness of depth-map estimation. We conduct experiments on three common datasets (DTU, ETH3D, Tanks & Temples) and a real-scene dataset that we created. The experimental results show that ADS-MVSNet achieves better reconstruction quality than classical methods.