Computer science
Homomorphic encryption
Differential privacy
Feature selection
Data mining
Feature (linguistics)
Federated learning
Machine learning
Information privacy
Encryption
Initialization
Artificial intelligence
Stochastic gradient descent
Artificial neural network
Computer network
Computer security
Linguistics
Philosophy
Programming language
Authors
Anran Li, Jiahui Huang, Ju Jia, Hongyi Peng, Lan Zhang, Luu Anh Tuan, Han Yu, Xiang-Yang Li
Identifier
DOI:10.1109/tmc.2023.3333879
Abstract
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners' local features affects the performance of the VFL model, which makes feature selection vitally important. However, existing feature selection methods for VFL assume prior knowledge either of the number of noisy features or of the post-training threshold of useful features to be selected, making them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It employs a Gaussian stochastic dual-gate to efficiently approximate the probability of a feature being selected. FedSDG-FS further designs a local embedding perturbation approach to achieve differential privacy for local training data. To reduce overhead, we propose a feature importance initialization method based on Gini impurity, which requires only two parameter transmissions between the server and the clients. The enhanced version, FedSDG-FS++, protects the privacy of both the clients' training data and the server's labels through Partially Homomorphic Encryption (PHE) without relying on a trusted third party. Theoretically, we analyze the convergence rate and the privacy and security guarantees of our methods. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS and FedSDG-FS++ significantly outperform existing approaches, selecting high-quality features more accurately and improving VFL performance in a privacy-preserving manner.
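To make the two building blocks named in the abstract concrete, the following is a minimal single-machine sketch, not the paper's actual implementation: a Gini-impurity score that could serve as a feature-importance initializer, and a clipped Gaussian stochastic gate whose selection probability has a closed form. All function names, the median split, and the gate parameterization (mean `mu`, fixed `sigma`, selection meaning gate value above zero) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from math import erf, sqrt

def gini(labels):
    """Gini impurity of a label vector: 1 - sum_c p_c^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def gini_importance(feature, labels):
    """Impurity decrease from splitting one feature at its median —
    an illustrative proxy for initializing that feature's importance."""
    thr = np.median(feature)
    left, right = labels[feature <= thr], labels[feature > thr]
    n = len(labels)
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return gini(labels) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def gate_sample(mu, sigma=0.5, rng=None):
    """Clipped Gaussian stochastic gate: z = clip(mu + sigma * eps, 0, 1).
    A feature counts as selected when its gate value exceeds zero."""
    rng = rng or np.random.default_rng(0)
    return np.clip(mu + sigma * rng.standard_normal(np.shape(mu)), 0.0, 1.0)

def selection_prob(mu, sigma=0.5):
    """P(z > 0) = Phi(mu / sigma), Phi being the standard normal CDF —
    the closed-form selection probability of this gate parameterization."""
    return 0.5 * (1.0 + erf(mu / (sigma * sqrt(2.0))))
```

A feature that perfectly separates the labels at its median gets the maximal impurity decrease, while training the gate means `mu` would drift up for useful features (selection probability toward 1) and down for noisy ones; the clipping is what lets gates shut off exactly rather than only asymptotically.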