Medicine
Subspecialty
Specialty
Popularity
Association (psychology)
Representation (politics)
Artificial intelligence
Family medicine
Demography
Psychology
Social psychology
Politics
Computer science
Sociology
Political science
Law
Psychotherapist
Authors
Rohaid Ali, Oliver Y. Tang, Ian D. Connolly, Hael Abdulrazeq, Fatima N. Mirza, Rachel Lim, Benjamin R. Johnston, Michael W. Groff, Theresa Williamson, Konstantina Svokos, Tiffany J. Libby, John H. Shin, Ziya L. Gokaslan, Curtis E. Doberstein, James Zou, Wael F. Asaad
Source
Journal: JAMA Surgery
[American Medical Association]
Date: 2023-11-15
Citations: 5
Identifiers
DOI:10.1001/jamasurg.2023.5695
Abstract
Importance: The progression of artificial intelligence (AI) text-to-image generators raises concerns about perpetuating societal biases, including profession-based stereotypes.

Objective: To gauge the demographic accuracy of surgeon representation by 3 prominent AI text-to-image models compared with real-world attending surgeons and trainees.

Design, Setting, and Participants: The study used a cross-sectional design, assessing the latest release of 3 leading publicly available AI text-to-image generators, chosen for their popularity at the time of the study. Seven independent reviewers categorized the AI-produced images. A total of 2400 images were analyzed, generated across 8 surgical specialties within each model; an additional 1200 images were evaluated based on geographic prompts for 3 countries. The study was conducted in May 2023. Benchmark demographic characteristics were drawn from the Association of American Medical Colleges subspecialty report, which references the American Medical Association master file for physician demographics across 50 states. Because trainee demographics differ from those of attending surgeons, the two groups were analyzed separately. Race (non-White, defined as any race other than non-Hispanic White, vs White) and gender (female vs male) were assessed to evaluate known societal biases.

Exposures: Images were generated using the prompt template "a photo of the face of a [blank]", with the blank replaced by a surgical specialty. Geographic-based prompting was evaluated by specifying the most populous country on each of 3 continents (the US, Nigeria, and China).

Main Outcomes and Measures: The study compared the representation of female and non-White surgeons in each model with real demographic data using χ², Fisher exact, and proportion tests.

Results: Mean representation of female (35.8% vs 14.7%; P < .001) and non-White (37.4% vs 22.8%; P < .001) surgeons was significantly higher among trainees than among attending surgeons. DALL-E 2 reflected attending surgeons' true demographics for female surgeons (15.9% vs 14.7%; P = .39) and non-White surgeons (22.6% vs 22.8%; P = .92) but underestimated trainees' representation for both female (15.9% vs 35.8%; P < .001) and non-White (22.6% vs 37.4%; P < .001) surgeons. In contrast, Midjourney and Stable Diffusion had significantly lower representation of female (0% and 1.8%, respectively; P < .001) and non-White (0.5% and 0.6%, respectively; P < .001) surgeons than DALL-E 2 or the true demographic data. Geographic-based prompting increased non-White surgeon representation but did not alter female representation in any model for prompts specifying Nigeria and China.

Conclusions and Relevance: In this study, 2 leading publicly available text-to-image generators amplified societal biases, depicting over 98% of surgeons as White and male. While 1 of the models depicted demographics comparable to real attending surgeons, all 3 models underestimated trainee representation. The findings suggest the need for guardrails and robust feedback systems to keep AI text-to-image generators from magnifying stereotypes in professions such as surgery.
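As a rough illustration of the kind of comparison the abstract describes (model-generated demographics tested against a real-world benchmark), the following Python sketch checks a hypothetical sample of rated images against the 14.7% female attending-surgeon rate cited above, using a χ² goodness-of-fit test and a one-sample proportion z-test. This is not the authors' code, and the image counts are invented placeholders; only the benchmark rate comes from the abstract.

```python
# Minimal sketch: does the share of female surgeons in a batch of
# AI-generated images differ from a real-world benchmark rate?
# All image counts below are hypothetical placeholders.
from scipy.stats import chisquare
from statsmodels.stats.proportion import proportions_ztest

n_images = 300    # hypothetical number of images rated for one model
n_female = 48     # hypothetical number rated as depicting a female surgeon
benchmark = 0.147  # 14.7% female among attending surgeons (from the abstract)

# Chi-squared goodness of fit: observed counts vs. counts expected
# if the model matched the benchmark rate exactly.
observed = [n_female, n_images - n_female]
expected = [benchmark * n_images, (1 - benchmark) * n_images]
chi2_stat, chi2_p = chisquare(f_obs=observed, f_exp=expected)

# One-sample proportion z-test against the same benchmark rate.
z_stat, z_p = proportions_ztest(count=n_female, nobs=n_images, value=benchmark)

print(f"chi2 = {chi2_stat:.2f}, p = {chi2_p:.4f}")
print(f"z = {z_stat:.2f}, p = {z_p:.4f}")
```

For very small counts (e.g., a model producing 0% female images), an exact test such as the Fisher exact test on a 2x2 table would be more appropriate, which is presumably why the abstract lists it alongside the χ² and proportion tests.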