Binocular vision simulates the principle of human vision, using a computer to passively perceive distance: from the depth-of-field information of an object, the actual distance between the object and the camera can be calculated. Based on an improved SURF (Speeded-Up Robust Features) algorithm, this paper implements image feature extraction from different perspectives, and an improved Sobel-based method is used for image fusion and feature-point matching. The triangulation principle is applied to the pixel offset (disparity) between the two views to obtain the three-dimensional information of the object, reconstruct its 3D coordinates, and analyze the actual depth; a bad-point culling rule is established based on the numerical relationship of the image sequence, and finally the visual depth information is used to construct an unstructured 3D (three-dimensional) real-time scene. Experimental results show that, for targets in an actual unstructured scene, the average error is 2.99% in the 0.1 m to 3 m range and 5.81% in the 3 m to 10 m range; the system achieves high measurement accuracy and a good 3D reconstruction effect.
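The triangulation step the abstract refers to reduces, for a rectified binocular rig, to the standard relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the two cameras, and d the pixel offset (disparity) of a matched feature point. A minimal sketch, with illustrative parameter values that are assumptions rather than the paper's actual calibration:

```python
# Minimal sketch of depth recovery by triangulation on a rectified stereo
# pair. Parameter names and values are illustrative assumptions, not the
# paper's calibration data.

def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified binocular camera rig."""
    if disparity_px <= 0:
        # Zero or negative disparity has no valid depth in front of the rig;
        # such matches would be rejected by a bad-point culling rule.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: focal length 700 px, baseline 0.12 m, disparity 28 px
# places the matched point at 3.0 m from the cameras.
print(depth_from_disparity(28.0, 700.0, 0.12))  # → 3.0
```

Note that depth is inversely proportional to disparity, so a fixed one-pixel matching error costs more accuracy at long range, which is consistent with the larger average error reported for the 3 m to 10 m band.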