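Computer science
Classifier (UML)
Noisy data
Robustness (evolution)
Traffic classification
Artificial intelligence
Data mining
Training set
Machine learning
Pattern recognition (psychology)
Computer network
Quality of service
Biochemistry
Gene
Chemistry
Authors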
Siping Shi,Yingya Guo,Dan Wang,Yifei Zhu,Zhu Han
Identifier
DOI:10.1109/tmc.2023.3319657
Abstract
Network traffic classifiers for mobile devices are widely trained with federated learning (FL) to preserve privacy. Noisy labels commonly occur on each device and degrade the accuracy of the learned classifier. Existing noise elimination approaches address this by detecting and removing noisily labeled data before training. However, they may lead to poor classifier performance, because little traffic data remains on each device after noise removal. Motivated by the observation that the features of the noisily labeled traffic data are clean and that the underlying true distribution of the noisily labeled data is statistically close to that of the clean traffic data, we propose to utilize the noisily labeled data by normalizing it to be close to the clean traffic data distribution. Specifically, we first formulate a distributionally robust federated network traffic classifier learning problem (DR-NTC) that jointly uses the normalized traffic data and the clean data for training. We then specify the normalization function under the Wasserstein distance to transform the noisily labeled traffic data into a certified robust region around the clean data distribution, and we reformulate the DR-NTC problem as an equivalent DR-NTC-W problem. Finally, we design a robust federated network traffic classifier learning algorithm, RFNTC, to solve the DR-NTC-W problem. Theoretical analysis establishes a robustness guarantee for RFNTC. We evaluate the algorithm by training classifiers on a real-world dataset. The experimental results show that RFNTC improves the accuracy of the learned classifier by up to 1.05 times.
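The abstract does not give the normalization function or the DR-NTC-W formulation, so the following is only a minimal sketch of the underlying notion: a Wasserstein distance between two empirical samples, and a check of whether one sample lies within a given radius of a clean sample. It assumes a single scalar feature and equal sample sizes, which is a simplification. The names `wasserstein_1d`, `in_robust_region`, and `radius` are hypothetical and do not come from the paper.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    Assumption: one scalar feature per sample and equal sample counts.
    In that case the optimal transport plan pairs the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def in_robust_region(candidate, clean, radius):
    """Hypothetical helper: is `candidate` within `radius` of `clean`?

    This only illustrates the idea of a Wasserstein ball around the clean
    data distribution; it is not the paper's normalization function.
    """
    return wasserstein_1d(candidate, clean) <= radius


# Toy data (made up for illustration, not from the paper's dataset).
clean = [1.0, 2.0, 3.0, 4.0]
shifted = [1.5, 2.5, 3.5, 4.5]

distance = wasserstein_1d(clean, shifted)
inside_wide = in_robust_region(shifted, clean, radius=0.6)
inside_narrow = in_robust_region(shifted, clean, radius=0.4)
```

Here every value in `shifted` is offset by 0.5 from its counterpart in `clean`, so the distance is 0.5 and the sample falls inside a ball of radius 0.6 but outside one of radius 0.4. For multi-dimensional traffic features, a general optimal transport solver would be needed instead of the sorting shortcut.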