计算机科学
图像(数学)
利用
光学(聚焦)
构造(python库)
人工智能
图像编辑
计算机安全
光学
物理
程序设计语言
作者
Yiting Qu,Xinyue Shen,Xinlei He,Michael Backes,Savvas Zannettou,Yang Zhang
标识
DOI:10.1145/3576915.3616679
摘要
State-of-the-art Text-to-Image models like Stable Diffusion and DALLE\cdot2 are revolutionizing how people generate visual content. At the same time, society has serious concerns about how adversaries can exploit such models to generate problematic or unsafe images. In this work, we focus on demystifying the generation of unsafe images and hateful memes from Text-to-Image models. We first construct a typology of unsafe images consisting of five categories (sexually explicit, violent, disturbing, hateful, and political). Then, we assess the proportion of unsafe images generated by four advanced Text-to-Image models using four prompt datasets. We find that Text-to-Image models can generate a substantial percentage of unsafe images; across four models and four prompt datasets, 14.56% of all generated images are unsafe. When comparing the four Text-to-Image models, we find different risk levels, with Stable Diffusion being the most prone to generating unsafe content (18.92% of all generated images are unsafe). Given Stable Diffusion's tendency to generate more unsafe content, we evaluate its potential to generate hateful meme variants if exploited by an adversary to attack a specific individual or community. We employ three image editing methods, DreamBooth, Textual Inversion, and SDEdit, which are supported by Stable Diffusion to generate variants. Our evaluation result shows that 24% of the generated images using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world. Overall, our results demonstrate that the danger of large-scale generation of unsafe images is imminent. We discuss several mitigating measures, such as curating training data, regulating prompts, and implementing safety filters, and encourage better safeguard tools to be developed to prevent unsafe generation.1 Our code is available at https://github.com/YitingQu/unsafe-diffusion.
科研通智能强力驱动
Strongly Powered by AbleSci AI