Abstract Underwater networks are an active research area because of their wide range of applications. An underwater network is a delay-tolerant network [1][2] because its connectivity is intermittent. Underwater acoustic communication enables communication undersea. Wireless sensor nodes are sparsely deployed underwater, owing to environmental characteristics [3], to gather information. Undersea communication is difficult because of noise and a varying environment. Because the underwater environment is highly unpredictable, no constant path or route exists between wireless sensor nodes. The battery of a sensor node is also a major concern, since batteries cannot be replaced frequently. It is therefore necessary to design an algorithm that establishes a path to the destination dynamically, based on the environmental conditions and each node's battery level. In this paper, the authors propose a Reinforcement Learning (RL) approach to evaluate sensor nodes' performance. Many machine learning algorithms use only the epsilon-greedy action selection method; here, four different action selection methods are used for routing, and the appropriate method is chosen based on a threshold level. The proposed approach is validated by comparing the RL algorithm with baseline algorithms. Experimental results show that the RL algorithm outperforms the baseline algorithms.
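As an illustration of threshold-based action selection, the following minimal Python sketch switches between four selection rules according to a node's residual battery fraction. It is not the authors' implementation. The abstract does not name the four methods or say what the threshold is measured on, so the choice of greedy, epsilon-greedy, softmax, and UCB, the battery-based thresholds (0.25, 0.50, 0.75), and every function and parameter name below are assumptions made for this example only.

```python
import math
import random

# Each rule picks a next-hop index from q, a list of estimated values
# (one per candidate neighbour). counts[a] is how often neighbour a was
# chosen so far and t is the current step. All names are hypothetical.

def greedy(q, counts, t, rng):
    # Always exploit: pick the neighbour with the highest estimate.
    return max(range(len(q)), key=lambda a: q[a])

def epsilon_greedy(q, counts, t, rng, epsilon=0.1):
    # Explore uniformly with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy(q, counts, t, rng)

def softmax(q, counts, t, rng, temperature=0.5):
    # Sample in proportion to exp(q / temperature); subtracting the
    # maximum keeps the exponentials numerically stable.
    m = max(q)
    weights = [math.exp((v - m) / temperature) for v in q]
    return rng.choices(range(len(q)), weights=weights, k=1)[0]

def ucb(q, counts, t, rng, c=1.0):
    # Upper confidence bound: try every neighbour once, then add an
    # exploration bonus that shrinks as a neighbour is used more often.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    log_t = math.log(max(t, 1))
    return max(range(len(q)),
               key=lambda a: q[a] + c * math.sqrt(log_t / counts[a]))

def pick_method(battery_fraction):
    # Assumed thresholds: the lower the battery, the less the node explores.
    if battery_fraction < 0.25:
        return greedy
    if battery_fraction < 0.50:
        return epsilon_greedy
    if battery_fraction < 0.75:
        return softmax
    return ucb

def select_next_hop(q, counts, t, battery_fraction, rng):
    # Choose the rule from the battery level, then apply it.
    return pick_method(battery_fraction)(q, counts, t, rng)
```

In this sketch a node with little remaining energy only exploits its best-known neighbour, while a well-charged node can afford more exploration. This mapping is one plausible design choice and is not taken from the paper.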