Cangnai Fang, Gracia Dianatobing, Talia Atara, Ivan Sebastian Edbert, Derwin Suhartono
Identifier
DOI:10.1109/icicos56336.2022.9930596
Abstract
The large number of people living with depression, and its potentially fatal consequences, make early detection urgent. Social media, as a platform for self-expression, can help with this task: by properly extracting features from user-created content, we can detect users who are depressed. This paper compares four feature extraction methods to find the best one. TF-IDF, LIWC, Word2Vec, and weighted Word2Vec are each paired with a Naïve Bayes (NB) or linear Support Vector Machine (SVM) classifier and evaluated on the Reddit Mental Health dataset. Word2Vec paired with the SVM classifier proved to be the best combination, with 95.68% accuracy, 92.58% precision, 93.10% recall, and a 92.84% F1-score. The weighted Word2Vec, however, failed to improve on simple averaging of the basic Word2Vec vectors, obtaining only 95.15% accuracy. Another finding is that SVM performed better than NB for classification, though it takes significantly longer to train. This experiment shows that choosing a suitable feature extraction method benefits the model's performance.
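To make the difference between averaged and weighted Word2Vec features concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the weighting scheme, so smoothed IDF weights are assumed here. The function names `idf_weights` and `doc_vector`, the two-dimensional `toy_embeddings`, and the `toy_docs` posts are all hypothetical and stand in for a trained Word2Vec model and the Reddit data.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency per token (assumed weighting scheme)."""
    n = len(docs)
    # Count the number of documents in which each token appears at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def doc_vector(tokens, embeddings, weights=None):
    """Average the word vectors of `tokens`.

    With weights=None every word counts equally (plain averaging).
    Otherwise each word vector is scaled by weights[token] before averaging.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = 1.0 if weights is None else weights.get(tok, 0.0)
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical 2-d embeddings and tokenized posts, for illustration only.
toy_embeddings = {
    "i": [0.0, 1.0],
    "feel": [0.2, 0.8],
    "hopeless": [1.0, 0.0],
    "fine": [-1.0, 0.0],
}
toy_docs = [
    ["i", "feel", "hopeless"],
    ["i", "feel", "fine"],
    ["i", "feel", "fine"],
]

weights = idf_weights(toy_docs)
plain = doc_vector(toy_docs[0], toy_embeddings)
weighted = doc_vector(toy_docs[0], toy_embeddings, weights)
```

In this toy setup the rare word "hopeless" receives a larger weight than the common words "i" and "feel", so the weighted document vector is pulled toward its embedding. Either vector could then be passed to an NB or linear SVM classifier. Whether the weighting helps is an empirical question; in the reported experiment it did not.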