As global populations grow and environmental constraints intensify, improving agricultural water management is essential for sustainable food production. Traditional irrigation methods often lack adaptability, leading to inefficient water use. Reinforcement learning (RL) offers a promising way to develop dynamic irrigation strategies that balance productivity and resource conservation. However, agricultural RL tasks are characterized by sparse actions (irrigating only when necessary) and by delayed rewards that are realized only at the end of the growing season. This study integrates RL with AquaCrop-OSPy crop simulations within the Gymnasium framework to develop adaptive irrigation policies for maize. We introduce a reward mechanism that penalizes water use incrementally, as each irrigation is applied, while rewarding end-of-season yield, thereby encouraging resource-efficient decisions. Using the Proximal Policy Optimization (PPO) algorithm, our RL-driven approach outperforms fixed-threshold irrigation strategies, reducing water use by 29% and increasing profitability by 9%. It achieves a water use efficiency of 76.76 kg/ha/mm, a 40% improvement over optimized soil moisture threshold methods. These findings highlight RL's potential to address the challenges of sparse actions and delayed rewards in agricultural management, delivering significant environmental and economic benefits.
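To make the sparse-action, delayed-reward structure concrete, the following is a minimal sketch of a reward of the kind described: every step pays an immediate penalty proportional to the irrigation depth applied, and the harvested yield is credited only once, at the terminal step of the season. This is an illustration under stated assumptions, not the study's implementation. The names `RewardConfig`, `step_reward`, `episode_return`, and `water_use_efficiency` are hypothetical, the weights are placeholders rather than the values used in the study, and no AquaCrop-OSPy API is called. In a Gymnasium environment, a function like `step_reward` would typically supply the `reward` value returned by `Env.step`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardConfig:
    # Hypothetical placeholder weights for illustration only; not the study's values.
    water_penalty_per_mm: float = 0.1    # reward units lost per mm of irrigation applied
    yield_reward_per_t_ha: float = 10.0  # reward units gained per t/ha of final yield


def step_reward(irrigation_mm: float, is_terminal: bool,
                final_yield_t_ha: float, cfg: RewardConfig) -> float:
    """Immediate water penalty plus a delayed yield bonus paid only at season end."""
    if irrigation_mm < 0:
        raise ValueError("irrigation depth must be non-negative")
    # Incremental cost: charged at the step the water is applied.
    reward = -cfg.water_penalty_per_mm * irrigation_mm
    if is_terminal:
        # Yield is only known at harvest, so it enters the reward once, at the last step.
        reward += cfg.yield_reward_per_t_ha * final_yield_t_ha
    return reward


def episode_return(irrigation_schedule_mm: Sequence[float],
                   final_yield_t_ha: float, cfg: RewardConfig) -> float:
    """Undiscounted sum of step rewards over one simulated growing season."""
    n = len(irrigation_schedule_mm)
    return sum(
        step_reward(mm, i == n - 1, final_yield_t_ha, cfg)
        for i, mm in enumerate(irrigation_schedule_mm)
    )


def water_use_efficiency(yield_kg_ha: float, water_mm: float) -> float:
    """Yield per unit depth of water, in kg/ha/mm (the water basis is an assumption)."""
    if water_mm <= 0:
        raise ValueError("water depth must be positive")
    return yield_kg_ha / water_mm


if __name__ == "__main__":
    cfg = RewardConfig()
    # A sparse 120-day schedule: irrigate 25 mm on only two days.
    schedule = [0.0] * 120
    schedule[20] = 25.0
    schedule[45] = 25.0
    # Water penalty 2 * 25 * 0.1 = 5.0; yield bonus 12 * 10.0 = 120.0.
    print(episode_return(schedule, final_yield_t_ha=12.0, cfg=cfg))
    print(water_use_efficiency(12000.0, 50.0))
```

Because most steps carry zero irrigation and zero yield signal, nearly all of the learning signal arrives at the final step, which is exactly the credit-assignment difficulty the abstract attributes to agricultural RL tasks. Note that reported water use efficiency figures depend on whether the denominator counts irrigation only or total water input; the sketch leaves that choice to the caller.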