EnML: Multi-label Ensemble Learning for Urdu Text Classification

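Computer science; Artificial intelligence; Urdu; Natural language processing; Machine learning; Benchmark (surveying); Deep learning; Ensemble learning; Philosophy; Linguistics; Geodesy; Geography
Authors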
Faiza Mehmood,Rehab Shahzadi,Hina Ghafoor,Muhammad Nabeel Asim,Muhammad Usman Ghani,Waqar Mahmood,Andreas Dengel
Source
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing, Volume (Issue): 22 (9): 1-31, Citations: 2
Identifier
DOI:10.1145/3616111
Abstract

Exponential growth of electronic data requires advanced multi-label classification approaches for the development of natural language processing (NLP) applications such as recommendation systems, drug reaction detection, hate speech detection, and opinion recognition/mining. To date, several machine and deep learning–based multi-label classification methodologies have been proposed for English, French, German, Chinese, Arabic, and other developed languages. Urdu is the 11th largest language in the world, yet it has no computer-aided multi-label textual news classification approach. Unlike other languages, Urdu lacks multi-label text classification datasets that can be used to benchmark the performance of existing machine and deep learning methodologies. To accelerate research on Urdu multi-label text classification–based applications, this article makes the following contributions. First, it provides a manually annotated multi-label textual news classification dataset for the Urdu language. Second, it benchmarks the performance of traditional machine learning approaches by adapting three data transformation approaches along with three top-performing machine learning classifiers and four algorithm adaptation-based approaches. Third, it benchmarks the performance of 16 existing deep learning approaches and the four most widely used language models. Finally, it provides an ensemble approach that reaps the benefits of three different deep learning architectures to precisely predict the different classes associated with a particular Urdu textual document. Experimental results reveal that the proposed ensemble approach (87% accuracy, 92% F1-score, and 8% Hamming loss) significantly outperforms the adapted machine and deep learning–based approaches.
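To make the reported metrics concrete, here is a minimal Python sketch of how an ensemble of three multi-label models might be combined and scored. The abstract does not say how the three architectures are combined or which averaging is used for the F1-score, so soft voting (averaging per-label probabilities) and micro-averaged F1 are assumptions made here for illustration only. All names (`soft_vote`, `hamming_loss`, `micro_f1`, `model_a`, and so on) are hypothetical, and the probabilities are toy values, not results from the article.

```python
from typing import List

Labels = List[int]          # one 0/1 indicator per class
Probs = List[List[float]]   # per-document, per-label probabilities


def soft_vote(prob_sets: List[Probs], threshold: float = 0.5) -> List[Labels]:
    """Average per-label probabilities across models, then threshold.

    Assumption: soft voting is one common way to ensemble models; the
    abstract does not state that the authors use it.
    """
    n_models = len(prob_sets)
    predictions = []
    for doc_probs in zip(*prob_sets):  # one probability vector per model
        averaged = [sum(p) / n_models for p in zip(*doc_probs)]
        predictions.append([1 if a >= threshold else 0 for a in averaged])
    return predictions


def hamming_loss(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Fraction of individual label decisions that are wrong (lower is better)."""
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    total = sum(len(yt) for yt in y_true)
    return wrong / total


def micro_f1(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """F1 computed from label decisions pooled over all documents and classes."""
    tp = fp = fn = 0
    for yt, yp in zip(y_true, y_pred):
        for t, p in zip(yt, yp):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: three hypothetical models, two documents, three news categories.
model_a = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
model_b = [[0.8, 0.4, 0.3], [0.3, 0.7, 0.7]]
model_c = [[0.7, 0.3, 0.5], [0.2, 0.9, 0.1]]

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = soft_vote([model_a, model_b, model_c])

print("predictions:", y_pred)
print("Hamming loss:", hamming_loss(y_true, y_pred))
print("micro F1:", micro_f1(y_true, y_pred))
```

Hamming loss counts wrong label decisions, so a lower value is better, whereas accuracy and F1-score are better when higher. This is why an 8% Hamming loss is reported alongside 87% accuracy and 92% F1-score as evidence of a stronger model.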