Quantum thermodynamic relationships in emerging nanodevices are important but often difficult to analyze. Machine learning offers a new perspective on quantum thermodynamics. This study uses reinforcement learning to find the optimal cycle of a quantum heat engine. Specifically, the soft actor-critic algorithm is used to optimize the cycle of a three-level coherent quantum heat engine so as to maximize its average power. The results show that the average output power of the optimal cycle is 1.28 times greater than that of the original cycle (steady limit). The efficiency of the optimal cycle also exceeds both the Curzon-Ahlborn efficiency and the efficiencies reported by other researchers. Notably, the optimal cycle can be fitted by an Otto-like cycle, which demonstrates the effectiveness of the method.
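For reference, the Curzon-Ahlborn efficiency used as a benchmark above is conventionally written as 1 - sqrt(T_c / T_h), which always lies below the Carnot bound 1 - T_c / T_h. The following is a minimal sketch of that comparison; the function names and the bath temperatures are illustrative assumptions and are not parameters taken from this study.

```python
import math


def _check_temperatures(t_cold: float, t_hot: float) -> None:
    """Require 0 < t_cold < t_hot (any consistent temperature unit)."""
    if not 0.0 < t_cold < t_hot:
        raise ValueError("expected 0 < t_cold < t_hot")


def carnot_efficiency(t_cold: float, t_hot: float) -> float:
    """Carnot bound: 1 - T_c / T_h."""
    _check_temperatures(t_cold, t_hot)
    return 1.0 - t_cold / t_hot


def curzon_ahlborn_efficiency(t_cold: float, t_hot: float) -> float:
    """Curzon-Ahlborn efficiency at maximum power: 1 - sqrt(T_c / T_h)."""
    _check_temperatures(t_cold, t_hot)
    return 1.0 - math.sqrt(t_cold / t_hot)


# Hypothetical bath temperatures, chosen only to make the arithmetic easy.
T_COLD, T_HOT = 1.0, 4.0

eta_ca = curzon_ahlborn_efficiency(T_COLD, T_HOT)
eta_carnot = carnot_efficiency(T_COLD, T_HOT)
print(f"Curzon-Ahlborn: {eta_ca:.3f}, Carnot: {eta_carnot:.3f}")
```

An efficiency reported for an optimized cycle can be compared against `eta_ca` in the same way, provided it is evaluated at the same pair of bath temperatures.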