With the rapid advent of quantum computing, hybrid quantum-classical machine learning has shown promising computational advantages in many key fields. Quantum reinforcement learning, as one of the most challenging tasks, has recently demonstrated its ability to solve standard benchmark environments with formally provable theoretical advantages over classical counterparts. However, despite the progress of quantum processors and the emergence of quantum computing clouds in the noisy intermediate-scale quantum (NISQ) era, algorithms based on parameterized quantum circuits (PQCs) are rarely conducted on NISQ devices. In this work, we take the first step towards executing benchmark quantum reinforcement problems on various real devices equipped with at most 136 qubits on BAQIS Quafu quantum computing cloud. The experimental results demonstrate that the Reinforcement Learning (RL) agents are capable of achieving goals that are slightly relaxed both during the training and inference stages. Moreover, we meticulously design hardware-efficient PQC architectures in the quantum model using a multi-objective evolutionary algorithm and develop a learning algorithm that is adaptable to Quafu. We hope that the Quafu-RL be a guiding example to show how to realize machine learning task by taking advantage of quantum computers on the quantum cloud platform.