Computer science
Homomorphic encryption
Differential privacy
Feature selection
Data mining
Feature (linguistics)
Federated learning
Machine learning
Information privacy
Encryption
Initialization
Artificial intelligence
Stochastic gradient descent
Artificial neural network
Computer network
Computer security
Linguistics
Philosophy
Programming language
Authors
Anran Li, Jiahui Huang, Ju Jia, Hongyi Peng, Lan Zhang, Luu Anh Tuan, Han Yu, Xiang-Yang Li
Identifier
DOI:10.1109/tmc.2023.3333879
Abstract
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners' local features affects the performance of the VFL model, which makes feature selection vitally important. However, existing feature selection methods for VFL assume prior knowledge either of the number of noisy features or of the post-training threshold of useful features to be selected, making them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It employs a Gaussian stochastic dual-gate to efficiently approximate the probability of a feature being selected. FedSDG-FS further designs a local embedding perturbation approach to achieve differential privacy for local training data. To reduce overhead, we propose a feature importance initialization method based on Gini impurity, which requires only two parameter transmissions between the server and the clients. The enhanced version, FedSDG-FS++, protects the privacy of both the clients' training data and the server's labels through Partially Homomorphic Encryption (PHE) without relying on a trusted third party. Theoretically, we analyze the convergence rate and the privacy and security guarantees of our methods. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS and FedSDG-FS++ significantly outperform existing approaches, selecting high-quality features more accurately and improving VFL performance in a privacy-preserving manner.
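To make the two building blocks named in the abstract concrete, the following is a minimal single-machine sketch, not the paper's actual implementation: a Gini-impurity score that could serve as a feature-importance initializer, and a clipped Gaussian stochastic gate whose selection probability has a closed form. All function names, the median split, and the gate parameterization (mean `mu`, fixed `sigma`, selection meaning gate value above zero) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from math import erf, sqrt

def gini(labels):
    """Gini impurity of a label vector: 1 - sum_c p_c^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def gini_importance(feature, labels):
    """Impurity decrease from splitting one feature at its median —
    an illustrative proxy for initializing that feature's importance."""
    thr = np.median(feature)
    left, right = labels[feature <= thr], labels[feature > thr]
    n = len(labels)
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return gini(labels) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def gate_sample(mu, sigma=0.5, rng=None):
    """Clipped Gaussian stochastic gate: z = clip(mu + sigma * eps, 0, 1).
    A feature counts as selected when its gate value exceeds zero."""
    rng = rng or np.random.default_rng(0)
    return np.clip(mu + sigma * rng.standard_normal(np.shape(mu)), 0.0, 1.0)

def selection_prob(mu, sigma=0.5):
    """P(z > 0) = Phi(mu / sigma), Phi being the standard normal CDF —
    the closed-form selection probability of this gate parameterization."""
    return 0.5 * (1.0 + erf(mu / (sigma * sqrt(2.0))))
```

A feature that perfectly separates the labels at its median gets the maximal impurity decrease, while training the gate means `mu` would drift up for useful features (selection probability toward 1) and down for noisy ones; the clipping is what lets gates shut off exactly rather than only asymptotically.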