Vision sensors offer low cost and low structural complexity, but in challenging scenarios, such as high dynamic range scenes and high-speed motion, they are prone to tracking failures and inaccurate positioning. Because a single sensor excels only in specific aspects of SLAM, integrating multiple sources of information has become crucial in vision-based SLAM. To address the limitations of purely visual sensors, a visual-inertial fusion strategy based on multi-sensor fusion is proposed. Incorporating an inertial measurement unit (IMU) provides the SLAM system with absolute scale information, and temporally aligning IMU measurements with camera frames improves the estimation of motion between frames. This approach allows the angular velocity and acceleration data from the IMU and the measurements from the visual camera to complement each other, improving the accuracy and robustness of the mobile robot's positioning and mapping system. Experimental results show significant improvements over the ORB-SLAM2 algorithm, with absolute errors reduced by approximately 25% to 60% and relative errors reduced by 70% to 90%. Moreover, the proposed method estimates the attitude change between adjacent frames more accurately.
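To make the fusion idea concrete, the sketch below shows one common way of accumulating IMU angular velocity and acceleration samples between two consecutive camera frames into relative rotation, velocity, and position increments. This is only an illustrative minimal implementation, not the exact procedure used in the paper; the function names, the simple Euler integration scheme, and the omission of gravity compensation and noise propagation are assumptions made here for brevity.

```python
import numpy as np

def so3_exp(phi):
    """Map a rotation vector (axis-angle) to a rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3)
    a = phi / theta
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def preintegrate_imu(gyro, accel, dts, gyro_bias, accel_bias):
    """
    Accumulate IMU samples taken between two consecutive camera frames into
    a relative rotation dR, velocity increment dv, and position increment dp,
    expressed in the body frame at the first camera timestamp.
    Gravity compensation and noise/covariance propagation, which a full
    estimator would handle, are deliberately left out of this sketch.
    """
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a, dt in zip(gyro, accel, dts):
        a_corr = np.asarray(a) - accel_bias            # bias-corrected acceleration
        dp += dv * dt + 0.5 * (dR @ a_corr) * dt**2    # position increment
        dv += (dR @ a_corr) * dt                       # velocity increment
        dR = dR @ so3_exp((np.asarray(w) - gyro_bias) * dt)  # bias-corrected rotation update
    return dR, dv, dp
```

In a visual-inertial pipeline, increments of this kind are typically combined with the visual pose estimate between the same pair of frames, which is how the IMU supplies metric scale and short-term motion constraints while the camera corrects long-term drift.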