Keywords: Computer science, Rumor, Leverage (statistics), Artificial intelligence, Machine learning, Graph, Labeled data, Semi-supervised learning, Supervised learning, Training set, Pattern recognition (psychology), Artificial neural network, Public relations, Theoretical computer science, Political science
Authors
Yuhan Qiao,Chaoqun Cui,Yiying Wang,Caiyan Jia
Identifier
DOI:10.1016/j.neucom.2024.128314
Abstract
Existing rumor detection models have achieved remarkable performance in fully-supervised settings. However, obtaining extensive labeled rumor data is time-consuming and labor-intensive. To mitigate the reliance on labeled data, semi-supervised learning (SSL), which jointly learns from labeled and unlabeled samples, achieves significant performance improvements at low cost. Commonly used self-training methods in SSL, despite their simplicity and efficiency, suffer from the notorious confirmation bias, which can be seen as the accumulation of noise arising from the utilization of incorrect pseudo-labels. To address this problem, in this study, we propose a debiased self-training framework with graph self-supervised pre-training for semi-supervised rumor detection. First, to strengthen the initial model for self-training and reduce the generation of incorrect pseudo-labels in the early stages, we leverage the rumor propagation structures of massive unlabeled data for graph self-supervised pre-training. Second, we improve the quality of pseudo-labels by proposing a pseudo-labeling strategy with self-adaptive thresholds, which consists of self-paced global thresholds controlling the overall utilization of pseudo-labels and local class-specific thresholds attending to the learning status of each class. Extensive experiments on four public benchmarks demonstrate that our method significantly outperforms previous rumor detection baselines in semi-supervised settings, especially when labeled samples are extremely scarce. Notably, we achieve 96.3% accuracy on Weibo with 500 labels per class and 86.0% accuracy with just 5 labels per class.
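The abstract's pseudo-labeling strategy combines a self-paced global threshold with per-class local thresholds. The snippet below is a minimal sketch of that general idea (in the style of FlexMatch-like adaptive thresholding), not the authors' exact formulation: the function name, the scaling rule `local_tau = global_tau * class_status[pred]`, and the input shapes are all assumptions for illustration.

```python
import numpy as np

def adaptive_pseudo_labels(probs, global_tau, class_status):
    """Select pseudo-labels with a global threshold modulated by
    per-class learning status (a hypothetical sketch, not the paper's
    exact method).

    probs:        (N, C) softmax outputs for unlabeled samples
    global_tau:   scalar self-paced global threshold in (0, 1]
    class_status: (C,) estimated learning status per class in (0, 1]
    """
    conf = probs.max(axis=1)      # max confidence per sample
    pred = probs.argmax(axis=1)   # predicted class per sample
    # Local threshold: classes learned less well (lower status)
    # receive a lower bar, so their samples are not starved.
    local_tau = global_tau * class_status[pred]
    mask = conf >= local_tau      # which samples yield pseudo-labels
    return pred[mask], mask

# Toy usage: class 1 has lower learning status, so its threshold drops.
probs = np.array([[0.97, 0.03],   # confident class 0 -> kept
                  [0.80, 0.20],   # below 0.9 bar     -> dropped
                  [0.30, 0.70]])  # passes 0.9*0.7    -> kept
labels, mask = adaptive_pseudo_labels(probs, 0.9, np.array([1.0, 0.7]))
# labels -> [0, 1]; mask -> [True, False, True]
```

In a full self-training loop, `global_tau` would be raised over training (self-paced) and `class_status` re-estimated from how often each class already passes the threshold.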