Perovskite materials have broad application prospects in many fields because their band gaps are tunable and can be designed. Machine learning offers a fast and effective route to discovering new materials. However, noise in data sets often prevents conventional prediction and evaluation techniques from meeting practical requirements. This study introduces an outlier removal strategy, examines how different degrees of outlier exclusion affect the generalization performance of the learning model, and then determines the optimal configuration. The gradient boosting regression tree (GBRT) algorithm yielded a mean absolute error (MAE) of 0.0287, a mean squared error (MSE) of 0.0014, a root mean squared error (RMSE) of 0.0377, and a coefficient of determination (R²) of 0.979, giving the smallest prediction error among the algorithms compared. Moreover, the Shapley Additive Explanation (SHAP) method was used to elucidate the impact of the chemical composition on the target band gap, revealing that the iodine (I) ratio exerts the most significant influence, followed by the Pb, Br, and Sn ratios. We further investigated the effect of different chemical composition ratios on the band gap, and the results show that individual elements maintain stability within particular ranges of composition ratio, thereby offering critical data to underpin band gap control strategies. This study provides valuable new insights for the accurate prediction and effective control of band gaps.
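The four error metrics reported above follow their standard definitions, and the sketch below shows one way they could be computed. It is a minimal illustration, not the code used in the study: the function name `regression_metrics` and the sample band gap values are hypothetical and chosen only for demonstration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    ss_res = sum(r * r for r in residuals)     # residual sum of squares
    mse = ss_res / n                           # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when all true values are identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical band gap values (illustrative only, not data from the study)
y_true = [1.55, 1.60, 2.30, 1.25, 1.80]
y_pred = [1.58, 1.57, 2.25, 1.30, 1.82]
metrics = regression_metrics(y_true, y_pred)
```

Because RMSE is the square root of MSE, the two reported values can be cross-checked against each other: 0.0377 squared is about 0.0014, which matches the reported MSE up to rounding.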