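Computer science
Categorical variable
Machine learning
Sample (material)
Selection (genetic algorithm)
Task (project management)
Artificial intelligence
Process (computing)
Visibility
Convergence (economics)
Data mining
Quality (philosophy)
Enhanced Data Rates for GSM Evolution
Data modeling
Database
Philosophy
Chemistry
Physics
Management
Epistemology
Chromatography
Optics
Economics
Economic growth
Operating system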
Authors
Anran Li, Lan Zhang, Juntao Tan, Yaxuan Qin, Junhao Wang, Xiangyang Li
Identifier
DOI: 10.1109/infocom42981.2021.9488723
Abstract
Federated learning (FL) enables participants to collaboratively construct a global machine learning model without sharing their local training data with a remote server. In FL systems, the selection of training samples has a significant impact on model performance; for example, selecting participants whose datasets have erroneous samples, skewed categorical distributions, or low content diversity results in low accuracy and unstable models. In this work, we aim to solve the exigent optimization problem of selecting a collection of high-quality training samples for a given FL task under a monetary budget in a privacy-preserving way, which is extremely challenging without visibility into participants' local data and training process. We provide a systematic analysis of the important data-related factors affecting model performance and propose a holistic design to privately and efficiently select high-quality data samples considering all these factors. We verify the merits of our proposed solution with extensive experiments on a real AIoT system with 50 clients, including 20 edge computers, 20 laptops, and 10 desktops. The experimental results validate that our solution achieves accurate and efficient selection of high-quality data samples, and consequently yields an FL model with faster convergence and higher accuracy than existing solutions achieve.