Multilingual Fake News Detection in Low-Resource Languages: A Comparative Study Using BERT and GPT-3.5
Subjects
Computer Science · World Wide Web · Computer Networks
Authors
K. Anirudh, M. K. Madialagan, S. R. Srikanth, A. Shahina
Source
Journal: Communications in Computer and Information Science · Date: 2024-01-01 · Pages: 387-397
Identifier
DOI: 10.1007/978-3-031-58495-4_28
Abstract
This paper presents a novel attempt at assessing the authenticity of Tamil news headlines using large language models (LLMs), and evaluates them alongside transformer models and existing machine-learning results. To tackle this classification task, two kinds of models, the transformer-based BERT and the LLM gpt-3.5-turbo, are deployed and fine-tuned to distinguish genuine from fabricated news headlines. Through careful fine-tuning and training of BERT, m-BERT, and GPT-3.5-Turbo, we assess their effectiveness, contrasting a bidirectional transformer with a generative transformer for fake news classification. We train on three input configurations: (1) Tamil news with English translations and author information; (2) Tamil news with author information only; and (3) English news with author information only. Our evaluation shows that models trained on inputs that include English versions consistently outperform those relying solely on Tamil text. Performance metrics, including accuracy, precision, recall, and F1-score, indicate the superiority of the LLM-based gpt-3.5-turbo, which achieves an accuracy of 0.92, a precision of 0.902, a recall of 0.949, and an F1-score of 0.925, highlighting the effectiveness of LLMs for Tamil fake news classification. These findings underscore the importance of multilingual data processing for improving news headline classification and offer valuable guidance for building reliable fake news detection systems in multilingual environments.
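To make the BERT/m-BERT side of the comparison concrete, the following is a minimal sketch of fine-tuning multilingual BERT for binary genuine/fabricated headline classification with Hugging Face Transformers. It is not the authors' code: the checkpoint name, the "headline | translation | author" input template, the hyperparameters, and the 0/1 label convention are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' pipeline): fine-tune m-BERT
# for binary genuine/fabricated headline classification.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-multilingual-cased"  # m-BERT (assumed checkpoint)

class HeadlineDataset(Dataset):
    """Tokenized headlines (with author info appended) plus 0/1 labels."""
    def __init__(self, texts, labels, tokenizer, max_len=64):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

# Toy stand-ins for input configuration (1): Tamil headline + English
# translation + author. 0 = genuine, 1 = fabricated (convention assumed).
train_texts = [
    "<Tamil headline> | <English translation> | <author>",
    "<Tamil headline> | <English translation> | <author>",
]
train_labels = [0, 1]

args = TrainingArguments(output_dir="mbert-fake-news",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)
Trainer(model=model, args=args,
        train_dataset=HeadlineDataset(train_texts, train_labels,
                                      tokenizer)).train()
```

The reported metrics (accuracy, precision, recall, F1-score) would then be computed on a held-out split, for example with sklearn.metrics.precision_recall_fscore_support.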
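For the gpt-3.5-turbo side, OpenAI's fine-tuning endpoint expects chat-format JSONL training data. The sketch below shows one plausible way to frame the labeled headlines as such records; the system prompt, the field layout, and the REAL/FAKE output labels are assumptions, not the paper's actual template.

```python
# Hedged sketch: package labeled headlines as chat-format JSONL records,
# the format required for fine-tuning gpt-3.5-turbo via OpenAI's API.
import json

SYSTEM = "Classify the following Tamil news headline as REAL or FAKE."  # assumed prompt

examples = [  # placeholder rows standing in for the labeled corpus
    {"headline": "<Tamil headline> | <English translation> | <author>", "label": "REAL"},
    {"headline": "<Tamil headline> | <English translation> | <author>", "label": "FAKE"},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": ex["headline"]},
            {"role": "assistant", "content": ex["label"]},
        ]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```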