Intelligent reflecting surface (IRS)-assisted multiple-input multiple-output (MIMO) systems are foreseen as key enablers of beyond-5G (B5G) and 6G wireless communications. Properly designing the MIMO precoding matrices and the IRS phase-shift matrix can significantly improve system performance in terms of higher transmission rates, lower power consumption and delay, and improved communication security. To overcome the high dimensionality of the joint optimization of the precoders and the IRS phase-shift matrix, we propose an innovative deep reinforcement learning (DRL)-based approach. We aim to maximize the system sum-rate using twin delayed DDPG (TD3), an extension of the deep deterministic policy gradient (DDPG) framework. Accordingly, the optimization problem is formulated over continuous state and action spaces, and artificial neural networks (ANNs) are used as function approximators. Simulation results show that the proposed solution achieves competitive performance compared with other state-of-the-art algorithms.
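To illustrate the TD3 mechanism that the approach builds on, the following minimal sketch computes the learning target using the two ingredients that distinguish TD3 from DDPG: target policy smoothing and the minimum over two target critics. It is not the authors' implementation. The function names, the toy actor and critics, and the hyperparameter values (`gamma`, `policy_noise`, `noise_clip`, `max_action`) are assumptions chosen only for illustration. In the IRS-assisted MIMO setting, the action would presumably encode the precoder entries and IRS phase shifts, and the reward would be the sum-rate.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, policy_noise=0.2,
               noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target with target policy smoothing (TD3).

    All hyperparameter defaults are illustrative assumptions.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, policy_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)

    # Clipped double-Q: take the smaller of the two target critic estimates
    # to reduce overestimation of the action value.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    q_min = np.minimum(q1, q2)

    # Bootstrapped target; no bootstrapping on terminal transitions.
    return reward + gamma * (1.0 - done) * q_min


# Hypothetical stand-ins for the target networks (the paper uses ANNs).
def toy_actor(state):
    # Maps the first two state components to actions in [-1, 1].
    return np.tanh(state[:, :2])


def toy_critic_a(state, action):
    return state.sum(axis=1) + action.sum(axis=1)


def toy_critic_b(state, action):
    return state.sum(axis=1) - action.sum(axis=1)


# Usage on a single toy transition.
rng = np.random.default_rng(0)
state = np.array([[0.5, -0.5, 1.0]])
target = td3_target(np.array([1.0]), state, np.array([0.0]),
                    toy_actor, toy_critic_a, toy_critic_b, rng)
print(target)
```

Taking the minimum of the two critics makes the target a pessimistic estimate, and the clipped noise smooths the value estimate over nearby actions. The original TD3 formulation motivates both choices as ways to stabilize learning in continuous action spaces.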