Abstract The automatic recognition of human facial expressions has attracted considerable attention in the fields of computer vision and machine learning. Previous work on this topic is subject to many constraints, such as restricted scenarios and low image quality. To address these problems, we propose a new infrared facial expression recognition method with multi-label distribution learning for understanding non-verbal behavior in the classroom. Specifically, we first compute the feature similarities of the seven basic facial expressions to describe the relationships among adjacent expression images. Then, the similarity values are fitted with a Cauchy distribution function. Furthermore, we construct a new deep network with Cauchy distribution-based label learning (CDLLNet), trained with these distribution-based labels instead of conventional single expression labels. With these revised labels, each infrared facial expression image contributes to the learning of its neighboring expression labels as well as its true expression label. The performance of the proposed network is evaluated on two facial expression datasets: Oulu-CASIA and CK+. Qualitative and quantitative experimental results verify that CDLLNet achieves robust results and significantly outperforms existing state-of-the-art facial expression recognition algorithms.
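To make the label-construction step concrete, the sketch below shows one way to turn per-class distances (e.g., derived from feature similarities to the target expression) into a soft label distribution over the seven basic expressions using a Cauchy (Lorentzian) kernel. The class ordering, the example distances, the `gamma` scale, and the function name are illustrative assumptions; the paper's exact similarity computation and fitting procedure may differ.

```python
# Minimal sketch, not the paper's implementation: distances and gamma are assumed for illustration.
import numpy as np

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

def cauchy_label_distribution(class_distances, gamma=1.0):
    """Map per-class distances to a normalized soft label distribution
    using a Cauchy (Lorentzian) kernel; smaller distance -> larger probability."""
    d = np.asarray(class_distances, dtype=np.float64)
    weights = 1.0 / (np.pi * gamma * (1.0 + (d / gamma) ** 2))  # Cauchy kernel
    return weights / weights.sum()  # normalize so the distribution sums to 1

# Hypothetical distances of each class to the target expression "happiness";
# the target itself has distance 0 and therefore receives the highest probability.
distances = [3.0, 2.5, 2.0, 0.0, 1.0, 2.0, 1.5]
soft_labels = cauchy_label_distribution(distances, gamma=0.8)
print(dict(zip(EXPRESSIONS, soft_labels.round(3))))
```

Such soft distributions can replace one-hot targets during training, for example with a KL-divergence loss, which is a common choice in label distribution learning; the specific loss used by CDLLNet is described in the main text rather than assumed here.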