Continuous high-frequency wood drying, when integrated with a traditional wood finishing line, makes it possible to correct the moisture content of lumber one piece at a time, thereby improving its value. However, integrating this precision drying process complicates sawmill logistics: the high stochasticity of lumber properties, combined with suboptimal lumber routing decisions, may cause bottlenecks and reduce productivity. To counteract this problem and fully exploit the technology, we propose using reinforcement learning (RL) to learn continuous drying operation policies. An RL agent learns its policy by interacting with a simulated model of the finishing line. Our results, based on multiple simulations, show that the learned policies outperform the heuristic currently used in industry and are robust to the sudden disturbances that frequently occur in real industrial settings.
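To make the agent-environment interaction described above concrete, the following minimal sketch shows a generic RL training loop against a toy stand-in for the simulated finishing line. The `FinishingLineEnv` class, its discretized moisture state, the drying-intensity actions, and the tabular epsilon-greedy agent are all hypothetical simplifications for illustration, not the simulator or the learning algorithm used in this work; each episode is reduced to a single drying decision for one piece of lumber.

```python
import random
from collections import defaultdict

# Hypothetical stand-in for the simulated finishing line: each step, one
# piece of lumber arrives with a random moisture content (%), the agent
# chooses a drying intensity, and the reward reflects how close the
# resulting moisture lands to a 12% target.
class FinishingLineEnv:
    ACTIONS = [0.0, 2.0, 4.0, 6.0]  # moisture (%) removed by each drying setting

    def reset(self):
        # Stochastic lumber property: incoming moisture content varies piece to piece.
        self.moisture = random.uniform(10.0, 22.0)
        return self._obs()

    def _obs(self):
        # Discretize moisture to the nearest percent to serve as the state.
        return int(self.moisture)

    def step(self, action_idx):
        # Apply the chosen drying intensity, with small process noise.
        self.moisture -= self.ACTIONS[action_idx] + random.gauss(0.0, 0.3)
        reward = -abs(self.moisture - 12.0)  # penalty grows with distance from target
        return self._obs(), reward

# Minimal tabular Q-learning agent with epsilon-greedy exploration;
# one episode = one drying decision for one piece of lumber.
def train(episodes=20_000, alpha=0.1, epsilon=0.1):
    env = FinishingLineEnv()
    q = defaultdict(lambda: [0.0] * len(env.ACTIONS))
    for _ in range(episodes):
        state = env.reset()
        if random.random() < epsilon:
            action = random.randrange(len(env.ACTIONS))
        else:
            action = max(range(len(env.ACTIONS)), key=lambda a: q[state][a])
        _, reward = env.step(action)
        # Single-step episodes: the update target is just the immediate reward.
        q[state][action] += alpha * (reward - q[state][action])
    return q

if __name__ == "__main__":
    q = train()
    for state in sorted(q):
        best = max(range(len(FinishingLineEnv.ACTIONS)), key=lambda a: q[state][a])
        print(f"moisture ~{state}%: dry by {FinishingLineEnv.ACTIONS[best]}%")
```

In the setting studied here, the environment would instead be the detailed finishing-line simulator and the agent a full RL algorithm; the sketch only fixes the interaction pattern of simulate, act, observe reward, and update the policy.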