The primary objective of this study is to incorporate deep reinforcement learning (DRL) into conflict detection and resolution (CD&R) control strategies, generating optimised trajectories that air traffic controllers can use as a reference, with the aim of improving efficiency and reducing the amount of heading angle change. A DRL environment suitable for training CD&R agents is developed. The agent receives the current state of multiple aircraft in a sector and generates an action that changes the heading angle of an aircraft to avoid conflict. A K-Control Actor-Critic algorithm is proposed to limit the number of control actions, and a two-dimensional continuous action selection policy is utilised. The simulation results show that applying DRL to CD&R is feasible and offers a clear advantage in computational efficiency.
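
To make the control loop described above concrete, the following is a minimal sketch of an episode in which at most K heading changes may be issued. It is an illustration under stated assumptions, not the environment or algorithm developed in this study: the names `Aircraft`, `KControlEpisode` and `SEPARATION_NM`, the 5 NM separation minimum, the flat two-dimensional geometry, and the reading of an action as an aircraft index paired with a heading change are all hypothetical choices made for the example. No learning is performed; the sketch only shows how a control budget constrains the actions an agent can take.

```python
import math
from dataclasses import dataclass

SEPARATION_NM = 5.0  # assumed horizontal separation minimum (nautical miles)


@dataclass
class Aircraft:
    x: float        # position east, NM
    y: float        # position north, NM
    heading: float  # radians, counter-clockwise from the x-axis
    speed: float    # NM per time step


def advance(ac: Aircraft) -> None:
    """Move one aircraft forward by one time step along its heading."""
    ac.x += ac.speed * math.cos(ac.heading)
    ac.y += ac.speed * math.sin(ac.heading)


def in_conflict(a: Aircraft, b: Aircraft) -> bool:
    """True when two aircraft are closer than the assumed separation minimum."""
    return math.hypot(a.x - b.x, a.y - b.y) < SEPARATION_NM


class KControlEpisode:
    """Hypothetical episode wrapper that allows at most k heading changes."""

    def __init__(self, fleet, k):
        self.fleet = fleet
        self.controls_left = k

    def step(self, index=None, delta_heading=0.0):
        """Optionally apply one heading change, then advance all aircraft.

        A requested change is ignored once the control budget is spent.
        Returns True if any pair of aircraft is in conflict after the step.
        """
        if index is not None and self.controls_left > 0:
            self.fleet[index].heading += delta_heading
            self.controls_left -= 1
        for ac in self.fleet:
            advance(ac)
        return any(
            in_conflict(a, b)
            for i, a in enumerate(self.fleet)
            for b in self.fleet[i + 1:]
        )
```

In a head-on encounter, for example, a single early heading change can be enough to keep the pair separated, and any further requests within the same episode are ignored once the budget of K controls is used up.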