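Robustness (evolution)
Computer science
Boosting (machine learning)
Noisy data
Labeled data
Consistency (knowledge bases)
Artificial intelligence
Machine learning
Data modeling
Data mining
Data consistency
Synthetic data
Training set
Pattern recognition (psychology)
Database
Biochemistry
Gene
Chemistry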
Authors
Jing-Ming Guo, Chi-Chia Sun, Kuan-Yu Chan, Chun-Yu Liu
Source
Journal: IEEE Transactions on Consumer Electronics
[Institute of Electrical and Electronics Engineers]
Date: 2023-11-20
Volume/Issue: 1-1
Identifiers
DOI: 10.1109/tce.2023.3331700
Abstract
In this paper, we address the problem of noisy datasets by proposing a dual screening scheme that improves the performance of models trained on two public noisy datasets, Clothing1M and Animal-10N. Because both datasets were collected by web crawlers, their label error levels cannot be estimated. We use a warm-up model to separate the data into labeled and unlabeled subsets, which are then classified by multi-model consistency. Data on which the models agree are selected and given pseudo-labels for training, while the remaining data are treated as noisy and excluded from training. This approach reduces the impact of noisy data and mislabeling. To improve the model's robustness, we apply strong data augmentation to the combined clean and unlabeled data and train on them using the Mixup algorithm. Experimental results show that the proposed method improves classification performance: accuracy on Clothing1M is 0.1% higher than that of the state-of-the-art method, and accuracy on Animal-10N is 2% higher. The main contributions of this paper are: 1) adding strong data augmentation to strengthen the model, 2) using multi-model consistency to reduce the impact of noisy data, and 3) boosting performance through semi-supervised learning.
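The two mechanisms named in the abstract, multi-model consistency screening and Mixup, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the agreement rule, so unanimous agreement among models is assumed here, and the function names (screen_by_consistency, one_hot, mixup, sample_lambda) and the alpha value are hypothetical. The Mixup step follows the standard formulation, a convex combination of two inputs and their labels with a coefficient drawn from Beta(alpha, alpha).

```python
import random


def screen_by_consistency(model_predictions):
    """Split sample indices by whether all models predict the same class.

    model_predictions[m][i] is the class predicted by model m for sample i.
    Unanimous agreement is an assumed rule, not one stated in the abstract.
    """
    num_samples = len(model_predictions[0])
    pseudo_labels = {}  # sample index -> label agreed on by every model
    rejected = []       # indices treated as noisy and left out of training
    for i in range(num_samples):
        votes = {preds[i] for preds in model_predictions}
        if len(votes) == 1:
            pseudo_labels[i] = votes.pop()
        else:
            rejected.append(i)
    return pseudo_labels, rejected


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard Mixup: convex combination of two inputs and their labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is assumed."""
    return rng.betavariate(alpha, alpha)


# Three hypothetical models predicting classes for four samples.
preds = [[0, 1, 2, 1],
         [0, 1, 0, 1],
         [0, 2, 2, 1]]
pseudo_labels, rejected = screen_by_consistency(preds)

# Mix the two retained samples using toy two-dimensional feature vectors.
features = {0: [1.0, 0.0], 3: [0.0, 1.0]}
mixed_x, mixed_y = mixup(features[0], one_hot(pseudo_labels[0], 3),
                         features[3], one_hot(pseudo_labels[3], 3),
                         lam=0.25)
```

In this toy run, samples 0 and 3 receive pseudo-labels because all three models agree on them, while samples 1 and 2 are set aside as noisy. In practice the inputs would be augmented images and the mixing coefficient would be resampled for every batch with sample_lambda.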