Graph-Based Deep Learning Models for Thermodynamic Property Prediction: The Interplay between Target Definition, Data Distribution, Featurization, and Model Architecture
In this contribution, we examine the interplay between target definition, data distribution, featurization approach, and model architecture in graph-based deep learning models for thermodynamic property prediction. Using five curated data sets that differ in elemental composition, multiplicity, charge state, and size, we assess the impact of each of these factors on model accuracy. We find that target definition, i.e., predicting formation rather than atomization energy/enthalpy, is a decisive factor, as is a careful selection of the featurization approach. Direct modifications of the model architectures yield more modest, though not negligible, accuracy gains. Remarkably, molecule-level predictions tend to outperform atom-level increment predictions, in contrast to previous findings. Overall, this work paves the way toward robust graph-based thermodynamic model architectures with more universal capabilities, i.e., architectures that achieve excellent accuracy across data sets and compound domains.
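For reference, the two target definitions contrasted above can be written using the standard thermochemical relations (the symbols M for the molecule, A_i for its constituent atoms, and E_j for the constituent elements in their reference states are introduced here only for illustration and are not taken from the data sets themselves):
\[
\Delta E_{\mathrm{at}}(\mathrm{M}) \;=\; \sum_i E(\mathrm{A}_i) \;-\; E(\mathrm{M}),
\qquad
\Delta H_{f}(\mathrm{M}) \;=\; H(\mathrm{M}) \;-\; \sum_j n_j\, H(\mathrm{E}_j^{\mathrm{ref}}),
\]
where the atomization energy measures the energy required to separate the molecule into isolated atoms, while the formation enthalpy references the elements in their standard states with stoichiometric coefficients $n_j$.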