Siemen Herremans, Jens de Hoog, Simon Vanneste, Dieter Balemans, Ali Anwar, Siegfried Mercelis, Peter Hellinckx
Source
Journal: Lecture Notes in Networks and Systems; Date: 2022-10-17; Pages: 268-277
Identifier
DOI:10.1007/978-3-031-19945-5_27
Abstract
Autonomous driving does not yet have an industry-standard approach. One of the currently promising approaches is reinforcement learning. MuZero, a novel model-based deep reinforcement learning algorithm, performs well in observation spaces of higher complexity than its predecessors could handle. As a step towards autonomous driving, this paper employs MuZero for racing on unseen race tracks based on LIDAR observations. Since autonomous driving is inherently a continuous control problem, we furthermore propose a modification to the algorithm that supports a continuous action space, and we compare this continuous version of MuZero with the original, discrete variant as well as with a current benchmark reinforcement learning algorithm: Proximal Policy Optimization (PPO). We also propose and verify a method that progressively generates race tracks, which enables the MuZero agent to achieve high rewards on race tracks it has never seen. Results show that PPO and discrete MuZero achieve similar peak performance, with the latter doing so with much higher data efficiency, while continuous MuZero is able to improve its policy but stagnates at a lower peak performance.
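The continuous-action modification mentioned in the abstract implies replacing a fixed discrete action set with actions drawn from a learned continuous policy. Below is a minimal sketch of one common way to do this in a MuZero-style tree search: sample K candidate actions from a Gaussian policy head at each node and run the usual PUCT selection over the sampled set. All names and hyperparameters here (Node, expand_continuous, K_SAMPLES, C_PUCT) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: continuous actions in a MuZero-style search by sampling
# candidates from a Gaussian policy head, then applying standard PUCT.
import numpy as np

K_SAMPLES = 8   # candidate actions sampled per node (assumed hyperparameter)
C_PUCT = 1.25   # exploration constant in the standard PUCT formula

class Node:
    def __init__(self, prior: float):
        self.prior = prior        # policy probability assigned to this action
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # maps sampled action (tuple) -> Node

    def value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def expand_continuous(node: Node, mean: np.ndarray, log_std: np.ndarray,
                      rng: np.random.Generator) -> None:
    """Sample K candidate actions from the Gaussian policy head and attach
    them as children. Each child gets a uniform prior over the sampled set,
    a common simplification when sampling from the policy itself."""
    std = np.exp(log_std)
    for _ in range(K_SAMPLES):
        action = rng.normal(mean, std)            # continuous action vector
        node.children[tuple(action)] = Node(prior=1.0 / K_SAMPLES)

def select_child(node: Node):
    """Standard PUCT selection, unchanged: only the action set differs."""
    total_visits = sum(c.visit_count for c in node.children.values())
    def puct(child: Node) -> float:
        u = C_PUCT * child.prior * np.sqrt(total_visits + 1) / (1 + child.visit_count)
        return child.value() + u
    return max(node.children.items(), key=lambda kv: puct(kv[1]))

# Usage with a hypothetical 2-D action (e.g. steering, throttle):
rng = np.random.default_rng(0)
root = Node(prior=1.0)
expand_continuous(root, mean=np.array([0.0, 0.5]),
                  log_std=np.array([-1.0, -1.0]), rng=rng)
action, child = select_child(root)
```

Sampling from the policy keeps the branching factor bounded while still covering the continuous action space; the rest of the search and training loop can stay as in discrete MuZero.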
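Likewise, the progressive race-track generation can be pictured as a reward-gated curriculum: track difficulty increases once the agent's recent performance clears a threshold. The sketch below uses an assumed curvature-based waypoint generator and illustrative thresholds to show the general idea, not the paper's actual procedure.

```python
# Minimal sketch of a reward-gated track curriculum: difficulty (here, a
# bound on per-waypoint heading change) grows once the agent's recent
# average reward clears a threshold. Generator and thresholds are assumed.
import numpy as np
from collections import deque

class TrackCurriculum:
    def __init__(self, reward_threshold: float = 0.8, window: int = 20):
        self.difficulty = 1                  # 1 = gentle curves; higher = sharper
        self.reward_threshold = reward_threshold
        self.recent_rewards = deque(maxlen=window)

    def sample_track(self, rng: np.random.Generator, n_points: int = 50) -> np.ndarray:
        """Generate track waypoints whose per-step heading change is
        bounded by the current difficulty level."""
        max_turn = 0.05 * self.difficulty    # radians per waypoint
        headings = np.cumsum(rng.uniform(-max_turn, max_turn, n_points))
        steps = np.stack([np.cos(headings), np.sin(headings)], axis=1)
        return np.cumsum(steps, axis=0)      # (n_points, 2) waypoint positions

    def report_episode(self, normalized_reward: float) -> None:
        """After each episode, raise difficulty once performance
        plateaus above the threshold."""
        self.recent_rewards.append(normalized_reward)
        if (len(self.recent_rewards) == self.recent_rewards.maxlen
                and np.mean(self.recent_rewards) >= self.reward_threshold):
            self.difficulty += 1
            self.recent_rewards.clear()
```

Because each sampled track is new, an agent trained this way is continually evaluated on tracks it has never seen, which is consistent with the generalization result the abstract reports.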