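Latent Dirichlet allocation
Computer science
Artificial intelligence
Social media
Deep learning
Feature learning
Machine learning
Feature (linguistics)
Topic model
Learning to rank
Labeled data
Autoencoder
Feature engineering
Representation (politics)
Context (archaeology)
Rank (graph theory)
Encoder
Generalization
Ranking (information retrieval)
World Wide Web
Paleontology
Philosophy
Political science
Law
Mathematical analysis
Combinatorics
Operating system
Politics
Biology
Linguistics
Mathematics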
Authors
Junaid Abdul Wahid,Lei Shi,Yufei Gao,Bei Yang,Lin Wei,Yongcai Tao,Shabir Hussain,Muhammad Ayoub,Imam Yagoub
Identifier
DOI:10.1016/j.eswa.2022.116562
Abstract
The widespread use of social media affects every aspect of life, including crisis management. Disaster management needs real-time data that machine learning and deep learning models can use to support decision making. Most newly generated social media data is unstructured and unlabeled. Current text classification models based on supervised deep learning rely heavily on human-labeled data, which in the disaster context is small and imbalanced, ultimately limiting how well the models generalize. In this study, we propose the Topic2labels (T2L) framework, which labels the data automatically through LDA (latent Dirichlet allocation) topic modeling and uses BERT (Bidirectional Encoder Representations from Transformers) embeddings to construct the feature vectors used to classify the data contextually. Our framework consists of three layers. In the first layer, we adopt LDA to generate topics from the data, develop a new algorithm to rank the topics, and map the highest-ranked dominant topic to a label to annotate the data. In the second layer, we transform the labeled text into a feature representation through BERT embeddings. In the third layer, we leverage deep learning models as classifiers to assign the textual data to multiple categories. Experimental results on crisis-related datasets show that our framework achieves better classification performance than the baseline approaches.
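The first layer's labeling step can be illustrated with a minimal sketch. The abstract does not describe the topic-ranking algorithm, so this example substitutes a plain highest-probability rule as a stand-in. The per-document topic distributions, the function names, and the topic-to-label mapping are all hypothetical; in practice the distributions would come from a fitted LDA model.

```python
from typing import Dict, List, Sequence


def dominant_topic(doc_topic_probs: Sequence[float]) -> int:
    """Return the index of the highest-probability topic for one document.

    Assumption: plain argmax stands in for the paper's ranking algorithm,
    which the abstract does not specify.
    """
    return max(range(len(doc_topic_probs)), key=lambda k: doc_topic_probs[k])


def label_documents(
    doc_topic_matrix: Sequence[Sequence[float]],
    topic_to_label: Dict[int, str],
) -> List[str]:
    """Annotate each document with the label mapped to its dominant topic."""
    return [topic_to_label[dominant_topic(probs)] for probs in doc_topic_matrix]


# Hypothetical LDA output: one row per document, one column per topic.
doc_topic_matrix = [
    [0.80, 0.15, 0.05],
    [0.10, 0.20, 0.70],
    [0.25, 0.60, 0.15],
]

# Hypothetical mapping from topic index to a crisis-related label.
topic_to_label = {0: "rescue", 1: "donation", 2: "damage"}

labels = label_documents(doc_topic_matrix, topic_to_label)
print(labels)
```

The resulting labels would then accompany the text into the second and third layers, where BERT embeddings and a deep learning classifier take over.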