Authors
Jonas Degrave, F. Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, F. Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de Las Casas, Craig Donner, Leslie Fritz, C. Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, A. Merle, J.M. Moret, Seb Noury, Federico Pesamosca, David Pfau, O. Sauter, C. Sommariva, S. Coda, B.P. Duval, A. Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis, Martin Riedmiller
Abstract
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level while satisfying physical and operational constraints. The approach offers unprecedented flexibility and generality in problem specification and notably reduces the design effort needed to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable (TCV) [1,2], including conventional elongated shapes as well as advanced configurations such as negative triangularity and 'snowflake' configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. We also demonstrate sustained 'droplets' on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control and shows the potential of reinforcement learning to accelerate research in the fusion domain; tokamak magnetic control is also one of the most challenging real-world systems to which reinforcement learning has been applied.