To accurately acquire the peak tire–road friction coefficient, a fusion estimation framework combining vision and vehicle dynamic information is established. First, information for the road ahead is collected in advance from an image captured by a camera, and the road type with its typical range of tire–road friction coefficients is identified with a lightweight convolutional neural network. Then, an unscented Kalman filter (UKF) method is established to estimate the tire–road friction coefficient value directly according to the dynamic vehicle states. Next, the results from the road-type recognition and dynamic estimation methods are spatiotemporally synchronized. Finally, a confidence-based vision and vehicle dynamic fusion strategy is proposed to obtain an accurate peak tire–road friction coefficient. The virtual and real vehicle test results suggest that the proposed fusion estimation strategy can accurately determine the peak tire–road friction coefficient. The proposed strategy can more precisely acquire the tire–road friction coefficient than can the general vision-based estimation method and is superior to the dynamic-based estimation method in that it eliminates the need for sufficient tire excitation to some extent.