EnML: Multi-label Ensemble Learning for Urdu Text Classification

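Computer science, Artificial intelligence, Urdu, Natural language processing, Machine learning, Benchmark (surveying), Deep learning, Ensemble learning, Philosophy, Linguistics, Geodesy, Geography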

Authors

Faiza Mehmood, Rehab Shahzadi, Hina Ghafoor, Muhammad Nabeel Asim, Muhammad Usman Ghani, Waqar Mahmood, Andreas Dengel

Source

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing; Date: 2023-09-22; Volume (issue): 22 (9): 1-31; Citations: 2

Abstract

The exponential growth of electronic data requires advanced multi-label classification approaches for developing natural language processing (NLP) applications such as recommendation systems, drug reaction detection, hate speech detection, and opinion recognition/mining. To date, several machine and deep learning–based multi-label classification methodologies have been proposed for English, French, German, Chinese, Arabic, and other well-resourced languages. Urdu is the 11th largest language in the world, yet it has no computer-aided multi-label textual news classification approach. Unlike those languages, Urdu lacks multi-label text classification datasets that can be used to benchmark the performance of existing machine and deep learning methodologies. To accelerate research on Urdu multi-label text classification–based applications, this article makes four contributions. First, it provides a manually annotated multi-label textual news classification dataset for the Urdu language. Second, it benchmarks the performance of traditional machine learning approaches, adapting three data transformation approaches along with three top-performing machine learning classifiers and four algorithm adaptation-based approaches. Third, it benchmarks the performance of 16 existing deep learning approaches and the four most widely used language models. Finally, it provides an ensemble approach that combines the strengths of three different deep learning architectures to precisely predict the classes associated with a particular Urdu textual document. Experimental results reveal that the proposed ensemble approach (87% accuracy, 92% F1-score, and 8% Hamming loss) significantly outperforms the adapted machine and deep learning–based approaches.
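To make the reported metrics concrete, the following is a minimal Python sketch of how a three-model multi-label ensemble could be combined and scored. The abstract does not say how the three architectures are combined or which definitions of accuracy and F1-score are used, so everything here is an assumption: soft voting (averaging per-label probabilities), a 0.5 decision threshold, subset (exact-match) accuracy, and micro-averaged F1. All function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]
Labels = List[List[int]]


def soft_vote(prob_sets: List[Matrix], threshold: float = 0.5) -> Labels:
    """Average per-label probabilities across models, then threshold (assumed rule)."""
    n_models = len(prob_sets)
    n_docs = len(prob_sets[0])
    n_labels = len(prob_sets[0][0])
    preds = []
    for d in range(n_docs):
        row = []
        for k in range(n_labels):
            mean = sum(model[d][k] for model in prob_sets) / n_models
            row.append(1 if mean >= threshold else 0)
        preds.append(row)
    return preds


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots predicted incorrectly (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of documents whose full label set is predicted exactly."""
    return sum(rt == rp for rt, rp in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from true/false positives and negatives pooled over all labels."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy per-label probabilities from three hypothetical models
# (2 documents, 3 labels each). Invented numbers, for illustration only.
model_a = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
model_b = [[0.8, 0.4, 0.3], [0.2, 0.7, 0.7]]
model_c = [[0.7, 0.1, 0.7], [0.3, 0.9, 0.3]]

y_true = [[1, 0, 1], [0, 1, 1]]
preds = soft_vote([model_a, model_b, model_c])

print("predictions:", preds)
print("Hamming loss:", hamming_loss(y_true, preds))
print("subset accuracy:", subset_accuracy(y_true, preds))
print("micro F1:", micro_f1(y_true, preds))
```

Hamming loss counts errors per label slot while subset accuracy demands an exact match of the whole label set, so a model can score well on one and noticeably worse on the other. This is also why a low Hamming loss, unlike accuracy and F1-score, indicates better performance.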
