Advanced Characterization of Industrial Smoke: Particle Composition and Size Analysis with Single Particle Aerosol Mass Spectrometry and Optimized Machine Learning
As industrialization accelerates, industrial smoke particles with complex chemical compositions and widely varying sizes pose a serious threat to the environment and human health. Mass spectrometry is a powerful tool for aerosol measurement and can effectively analyze particulate matter. However, because mass spectrometry data are high-dimensional and complex, research on the relationship between particle size and composition remains very limited. To address this gap, this study combines single particle aerosol mass spectrometry (SPAMS) with optimized machine learning, achieving for the first time the precise prediction of smoke particle size from mass spectrometry data. Kernel principal component analysis (KPCA) was used for nonlinear dimensionality reduction of the mass spectrometry data to extract key features, and a random forest (RF) model was used for prediction; after optimization, the coefficient of determination (R²) on the test set reached 0.843. To address the imbalanced sample distribution, a systematic stratified random sampling algorithm (SSRSA) was developed, which significantly enhanced the model's generalization ability and stability during training and testing. The study also simulated a soldering scenario to analyze lead (Pb) isotope abundances and particle size distributions in smoke at different soldering temperatures. The results indicate a significant correlation between the abundance of lead isotopes and the soldering temperature, and the proportion of smaller particles increases noticeably as the soldering temperature rises. This research provides an innovative approach for precise analysis of industrial smoke particle composition and size, offering critical scientific insights for health risk assessment and the development of pollution control strategies.
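The stratified sampling idea mentioned above can be illustrated with a minimal sketch. The abstract does not describe the internals of SSRSA, so the code below is not the authors' algorithm: it is a generic stratified random split, assuming equal-width particle size bins and a fixed test fraction. The function name `stratified_split`, its parameters, and the synthetic sizes are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(sizes, n_bins=5, test_fraction=0.2, seed=0):
    """Split sample indices into train/test so that every size bin is represented.

    Hypothetical helper (not the SSRSA from the study): equal-width bins over
    the observed size range, then an independent random draw inside each bin.
    """
    rng = random.Random(seed)
    lo, hi = min(sizes), max(sizes)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all sizes are equal

    # Group sample indices by the size bin they fall into.
    bins = defaultdict(list)
    for idx, size in enumerate(sizes):
        b = min(int((size - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        bins[b].append(idx)

    train, test = [], []
    for b in sorted(bins):
        members = bins[b][:]
        rng.shuffle(members)
        # Hold out at least one sample per bin when the bin has two or more members.
        n_test = max(1, round(len(members) * test_fraction)) if len(members) > 1 else 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Synthetic, deliberately imbalanced particle sizes (micrometres); not data from the study.
# 900 small particles crowd the low end, 100 larger ones are spread over the rest.
sizes = [0.2 + 0.4 * i / 900 for i in range(900)] + [0.6 + 1.4 * i / 100 for i in range(100)]

train_idx, test_idx = stratified_split(sizes)
print(len(train_idx), len(test_idx))
```

With a plain random split, the sparse large-particle range could end up almost absent from the test set. Drawing the held-out samples bin by bin keeps both subsets spread across the whole size range, which is the general motivation for stratified sampling on imbalanced targets.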