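Crowdsourcing
Noise (video)
Computer science
Noise reduction
Artificial intelligence
Machine learning
Reduction (mathematics)
Noise measurement
Measure (data warehouse)
Pattern recognition (psychology)
Natural language processing
Data mining
Mathematics
Geometry
World Wide Web
Image (mathematics)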
Authors
H. Chen, Yueheng Sun, Meishan Zhang, Min Zhang
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
[Institute of Electrical and Electronics Engineers]
Date: 2023-10-16
Volume: 32, pages 139-150
Identifier
DOI: 10.1109/taslp.2023.3325135
Abstract
Label noise is an important issue in machine learning and can negatively affect various tasks. Because real benchmarks for evaluating noise reduction methods are limited, many studies construct pseudo-noisy data to verify their proposed methods. However, very few works have examined whether these noise generation strategies are reasonable. If the generated pseudo datasets are biased, the conclusions drawn from them might also be problematic. In this work, we focus on text classification in natural language processing (NLP) and investigate various pseudo noise generation methods; this is the first work of its kind for NLP. In particular, we treat crowdsourcing noise, a kind of real noise, as the gold standard and compare the generated noise against it to evaluate these noise generation methods. We then measure and compare the performance of representative noise reduction methods on crowdsourced data and on data produced by our top-ranked pseudo noise generation strategies. We conduct experiments on five text classification datasets and offer detailed comparison results and discussion.
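
To make the idea of comparing pseudo noise with real noise concrete, the following is a minimal Python sketch. It is not the authors' method: the abstract does not say which generation strategies or comparison measures the paper uses. The sketch assumes one common pseudo-noise strategy (uniform label flipping) and one illustrative comparison (mean absolute difference between label transition matrices). The function names `uniform_noise`, `transition_matrix`, and `matrix_distance` are hypothetical.

```python
import random
from collections import Counter


def uniform_noise(labels, classes, rate, seed=0):
    """Assumed strategy: with probability `rate`, replace a label with a
    different class chosen uniformly at random."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def transition_matrix(clean, noisy, classes):
    """Row-normalised counts estimating P(noisy label | clean label)."""
    counts = {c: Counter() for c in classes}
    for y, z in zip(clean, noisy):
        counts[y][z] += 1
    matrix = {}
    for c in classes:
        total = sum(counts[c].values())
        matrix[c] = {d: (counts[c][d] / total if total else 0.0) for d in classes}
    return matrix


def matrix_distance(a, b, classes):
    """Illustrative measure: mean absolute difference between two matrices."""
    cells = len(classes) ** 2
    return sum(abs(a[c][d] - b[c][d]) for c in classes for d in classes) / cells
```

Under these assumptions, one would build a transition matrix from crowdsourced labels and another from each pseudo-noise strategy, then rank the strategies by their distance to the crowdsourced matrix; a smaller distance suggests the pseudo noise flips labels in a pattern closer to that of real annotators.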