Under the dual thrust of decarbonisation and digitalisation, data-driven enabling technologies have become the most promising solutions for reducing the time, cost, and effort required to develop modern internal combustion engines (ICEs), whose modelling problems are complex, nonlinear, and high-dimensional, and for which data are costly to acquire. This paper presents a perspective on data-driven enabling technologies for ICE soft sensors, focusing on reducing experimental effort and model complexity in order to accelerate ICE decarbonisation. Current progress in data-driven modelling of ICEs is briefly outlined from four aspects: data acquisition, data processing, machine learning, and model validation. The challenges of establishing ICE models with high accuracy, fast response, and strong robustness for real-time control are then structured and analysed. Based on these challenges, perspectives are presented on three aspects: versatility, practicality, and autonomy. Finally, physics/data-enhanced machine learning and digital twin technology are suggested as ways to empower soft sensors for modern ICEs.