Machine learning techniques (MLTs) are powerful tools for analyzing complex data sets, but they have not previously been applied to non-occupational pollutant exposure. MLT models that predict personal exposure to benzene were developed and compared with a standard model based on linear regression (GLM). The models were tested against independent data sets obtained from three personal exposure measurement campaigns. A correlation-based feature subset (CFS) selection algorithm identified a reduced attribute set suitable for predicting personal exposure to benzene, with the common attributes falling into four groups: use of paints in homes, upholstery materials, space heating, and environmental tobacco smoke. Personal exposure was categorized as low, medium, or high. For large data sets, both the GLM and the MLTs show high variability in their ability to correctly classify concentrations above the 90th percentile, but the MLT models have a higher score when accounting for divergence of incorrec...