Differential privacy
Computer science
Private information retrieval
Context (archaeology)
Generative model
Obfuscation
Information privacy
Computer security
Machine learning
Artificial intelligence
Data science
Data mining
Generative grammar
Paleontology
Biology
Authors
Imdad Ullah,Najmul Hassan,Sukhpal Singh Gill,Basem Suleiman,Tariq Ahamed Ahanger,Zawar Shah,Junaid Qadir,Salil S. Kanhere
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 4
Identifier
DOI: 10.48550/arxiv.2310.12523
Abstract
Generative Artificial Intelligence (AI) tools based on Large Language Models (LLMs) use billions of parameters to extensively analyse large datasets and can extract critical private information such as context, specific details, and identifying information. This has raised serious threats to user privacy and reluctance to use such tools. This article proposes a conceptual model called PrivChatGPT, a privacy-preserving model for LLMs that consists of two main components: preserving user privacy during data curation/pre-processing, together with preserving private context, and a private training process for large-scale data. To demonstrate its applicability, we show how a private mechanism could be integrated into an existing model for training LLMs to protect user privacy; specifically, we employ differential privacy and private training using Reinforcement Learning (RL). We measure the privacy loss and evaluate the measure of uncertainty, or randomness, once differential privacy is applied. The model further recursively evaluates the level of privacy guarantees and the measure of uncertainty of public databases and resources during each update, when new information is added for training purposes. To critically evaluate the use of differential privacy for private LLMs, we hypothetically compare it with other mechanisms, e.g., Blockchain, private information retrieval (PIR), and randomisation, on various performance measures such as model performance and accuracy, computational complexity, and privacy vs. utility. We conclude that differential privacy, randomisation, and obfuscation can impact the utility and performance of trained models; conversely, the use of Tor, Blockchain, and PIR may introduce additional computational complexity and high training latency. We believe that the proposed model could be used as a benchmark for proposing privacy-preserving LLMs for generative AI tools.
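The differential privacy applied in the abstract can be illustrated with a minimal sketch (not the paper's actual mechanism): the Laplace mechanism adds noise calibrated to a query's sensitivity divided by the privacy budget ε, so smaller ε means stronger privacy but noisier answers. The function names and parameters here are illustrative assumptions, not from the article.

```python
import random


def laplace_noise(scale):
    # The difference of two i.i.d. exponential variables with mean
    # `scale` is distributed as Laplace(0, scale); this avoids the
    # log(0) edge case of inverse-CDF sampling.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))


def dp_count(values, predicate, epsilon, sensitivity=1.0):
    """Answer a counting query with epsilon-differential privacy.

    A count query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    sensitivity/epsilon suffices for the epsilon-DP guarantee.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(sensitivity / epsilon)
```

For example, `dp_count(records, lambda r: r < 50, epsilon=0.1)` returns a noisy count whose error grows as ε shrinks; the privacy loss of repeated queries composes additively, which is one way the recursive per-update evaluation described in the abstract could track a cumulative privacy budget.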