The paper proposes a new M2ACD (Multi-Actor-Critic Deep Deterministic Policy Gradient) algorithm for trajectory planning of robotic manipulators in complex environments. First, the paper presents a general inverse kinematics algorithm that recasts the inverse kinematics problem as a general Newton-MP iteration. The M2ACD algorithm is then constructed from multiple actor and critic networks. The dual-actor network reduces the overestimation of action values, minimizes the correlation between the actor and value networks, and mitigates the instability in the actor's action selection caused by excessively high Q-values. The dual-critic network reduces the estimation bias of Q-values, ensuring more reliable action selection and enhancing the stability of Q-value estimation. Second, a TSR (two-stage reward) strategy is designed for the robotic manipulator and divided into an approach phase and a close phase: rewards in the approach phase focus on approaching the target safely and efficiently, while rewards in the close phase govern the final adjustments before contact with the target. Third, to solve the position-hopping jitter problem of traditional reinforcement-learning trajectory planning, a NURBS (Non-Uniform Rational B-Spline) curve is used to smooth the jittery trajectory generated by M2ACD. Finally, the correctness of M2ACD and the kinematics algorithm is verified experimentally: M2ACD demonstrates superior curve smoothness, convergence stability, and convergence speed compared with the TD3, DARC, and DDPG algorithms. The M2ACD algorithm can be effectively applied to trajectory planning for collaborative robots, laying a foundation for subsequent research.
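
To make the kinematics step concrete, the following is a minimal Python sketch of a Newton iteration that uses the Moore-Penrose pseudoinverse of the manipulator Jacobian (reading "Newton-MP" as Moore-Penrose is an assumption here); `fk`, `jacobian`, and the tolerances are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def newton_mp_ik(q0, x_target, fk, jacobian, tol=1e-6, max_iter=100):
    """Hypothetical Newton iteration with the Moore-Penrose pseudoinverse:
    q_{k+1} = q_k + J(q_k)^+ (x_target - fk(q_k)).
    fk(q) maps joint angles to end-effector pose; jacobian(q) returns J(q)."""
    q = np.asarray(q0, dtype=float)
    for _ in range(max_iter):
        err = x_target - fk(q)              # task-space residual
        if np.linalg.norm(err) < tol:       # converged to the target pose
            break
        q = q + np.linalg.pinv(jacobian(q)) @ err
    return q
```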
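The overestimation-damping role of the extra actors and critics can be illustrated with a TD3-style clipped double-Q target extended to several actor-critic pairs. This is a hedged sketch only: the network interfaces, the target-smoothing noise, and the min-over-all-pairs aggregation are assumptions, not the paper's exact M2ACD update rule.

```python
import torch

def td_target(reward, next_state, done, target_actors, target_critics,
              gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Hypothetical multi-actor-critic target: each target actor proposes
    a (noise-smoothed) action, every target critic scores it, and the
    minimum over all pairs damps the overestimation bias that a single
    actor-critic pair accumulates."""
    with torch.no_grad():
        q_estimates = []
        for actor in target_actors:                 # dual (or more) actors
            a = actor(next_state)
            noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
            a = (a + noise).clamp(-1.0, 1.0)        # bounded, smoothed action
            for critic in target_critics:           # dual (or more) critics
                q_estimates.append(critic(next_state, a))
        q_min = torch.min(torch.stack(q_estimates), dim=0).values
        return reward + gamma * (1.0 - done) * q_min
```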
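A two-stage reward of the kind the TSR strategy describes might be shaped as follows; the switching radius, scaling constants, and collision penalty are purely illustrative assumptions rather than the paper's reward design.

```python
def tsr_reward(dist_to_target, prev_dist, collision, switch_radius=0.05):
    """Hypothetical two-stage reward: far from the target ('approach'
    phase), reward progress toward it and penalize collisions; inside
    switch_radius ('close' phase), reward fine positioning before contact."""
    if collision:
        return -10.0                                # safety penalty
    if dist_to_target > switch_radius:              # approach phase
        return (prev_dist - dist_to_target) * 10.0  # reward progress
    return 1.0 - dist_to_target / switch_radius     # close phase
```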
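For the smoothing step, a non-rational B-spline (the unit-weight special case of a NURBS) fitted with SciPy conveys the idea of replacing the policy's jittery waypoints with a smooth curve; `smoothing`, `num_samples`, and the cubic degree are assumed values, and the paper's actual NURBS construction may differ.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def smooth_waypoints(waypoints, smoothing=0.01, num_samples=200):
    """Fit a smoothing cubic B-spline through the jittery waypoints
    produced by the RL policy and resample it densely, removing the
    position hops between consecutive planning steps."""
    coords = np.asarray(waypoints).T               # shape (dims, n_points)
    tck, _ = splprep(coords, s=smoothing, k=3)     # non-uniform cubic B-spline
    u = np.linspace(0.0, 1.0, num_samples)
    return np.asarray(splev(u, tck)).T             # (num_samples, dims)
```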