Autofocus plays an important role in microscopic imaging. As an extension of image-based methods, learning-based methods make real-time autofocus possible. Recently proposed learning-based autofocus methods have achieved promising results in estimating the defocus distance. However, focusing accuracy depends partly on the feature extraction ability of the network model, and which features the network extracts to achieve this accuracy remains unclear. In this paper, we propose a single-shot microscopic autofocus method that predicts the defocus distance from a single natural image and is designed to improve the model's ability to extract image detail features. Furthermore, we verify that the neural network model predicts the defocus distance mainly by attending to the sharpness of texture and edge features, and we visualize the weights behind the prediction results. A realistic dataset of sufficient size was built to train all models. Experiments show that the proposed network model achieves better focusing accuracy than other models, with a mean focusing error of 0.44 μm, and pays more attention to texture and edge features.