Depression tendency detection is not equivalent to sentiment analysis, which makes it difficult to establish a unified standard for the task and to accurately mine depressive emotion from images and text. A depression tendency detection model based on the fusion of image and text is proposed, composed mainly of a textual depression tendency detection model and an image depression tendency detection model. The textual model extracts text emotion features with a pre-trained BERT model, and the image model learns image emotion features with a pre-trained VGG19. A BiGRU-based classifier is then used to obtain the corresponding depression tendency polarity probability. Finally, following the idea of late fusion, a model fusion formula is designed to exploit the complementarity of text and image: it combines the textual and image depression tendency detection models to perform comprehensive depression tendency detection. Experimental results on the simplified WU3D dataset show that the proposed model outperforms the single-modal depression tendency detection models on every evaluation metric.
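The late-fusion step can be illustrated with a minimal sketch. The fusion formula itself is not given here, so the code below assumes the simplest common choice, a convex weighted average of the two per-class probability vectors. The names `late_fusion` and `predict` and the weight `alpha` are hypothetical and are not taken from the proposed model.

```python
from typing import List, Sequence


def late_fusion(p_text: Sequence[float],
                p_image: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Combine per-class probabilities from the text and image models.

    ASSUMPTION: a weighted average with weight `alpha` on the text model.
    This is an illustrative stand-in, not the paper's fusion formula.
    """
    if len(p_text) != len(p_image):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # A convex combination of two probability vectors is itself a
    # probability vector, so no renormalisation is needed.
    return [alpha * t + (1.0 - alpha) * i for t, i in zip(p_text, p_image)]


def predict(p_fused: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(p_fused)), key=lambda k: p_fused[k])


if __name__ == "__main__":
    # Hypothetical outputs of the two single-modal classifiers,
    # ordered as [not depressed, depressed].
    p_text = [0.3, 0.7]
    p_image = [0.6, 0.4]
    fused = late_fusion(p_text, p_image, alpha=0.5)
    print(fused, predict(fused))
```

Because the fusion operates only on output probabilities, each single-modal model can be trained and tuned independently. Under this assumption, `alpha` would be chosen on a validation set.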