There has been growing interest in applying reinforcement learning (RL) to financial trading problems. One particularly interesting problem is pairs trading, a market-neutral strategy that attempts to profit from temporary price divergences between a pair of historically correlated securities. Traditionally, predetermined thresholds are used to issue trading signals for opening and closing positions. However, it is well documented that the performance of such conventional pairs trading strategies has declined over the last two decades. In this study, we investigate whether deep reinforcement learning can enhance pairs trading performance. To accelerate and stabilize the learning process, we propose a simple yet effective reward shaping method that takes a baseline policy as input. We show that, upon convergence, the learned policy is guaranteed to be at least as good as the baseline. Empirical experiments are conducted on the NASDAQ Nordic markets for three training-testing periods using intraday data. The results demonstrate that (i) RL models can achieve higher returns and Sharpe ratios than traditional strategies, and (ii) the proposed reward shaping method leads to more efficient and robust trading strategies than RL models trained without reward shaping.
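As an illustrative aside only, and not the exact construction proposed in this study, one standard way to feed a baseline policy into the reward signal is potential-based reward shaping with the baseline policy's value estimate as the potential function (Ng et al., 1999). The minimal Python sketch below shows that general idea; the function name, the `gamma` parameter, and the value arguments are illustrative assumptions rather than notation taken from this paper.

```python
def potential_based_shaping(reward: float,
                            baseline_value_curr: float,
                            baseline_value_next: float,
                            gamma: float = 0.99) -> float:
    """Add a potential-based shaping term to the raw per-step trading reward.

    The potential Phi is a value estimate under a baseline policy, which is
    one well-known way to inject a baseline policy into the reward signal;
    shaping of the form gamma * Phi(s') - Phi(s) leaves the set of optimal
    policies unchanged (Ng et al., 1999).
    """
    return reward + gamma * baseline_value_next - baseline_value_curr


# Toy usage: a raw trading reward of 0.5 with baseline value estimates of
# 1.0 at the current state and 1.2 at the next state.
print(potential_based_shaping(0.5, 1.0, 1.2))  # 0.5 + 0.99 * 1.2 - 1.0 = 0.688
```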