The Dark Side of Machine Learning Algorithms

Computer science · Great Rift Valley · Algorithms · Artificial intelligence · Machine learning · Physics · Astronomy
Author
Mariya I. Vasileva
Identifier
DOI:10.1145/3394486.3411068
Abstract

Machine learning and access to big data are revolutionizing the way many industries operate, providing analytics and automation for many real-world tasks that were previously thought to be necessarily manual. With the pervasiveness of artificial intelligence and machine learning over the past decade, and their epidemic spread across a variety of applications, algorithmic fairness has become a prominent open research problem. For instance, machine learning is used in courts to assess the probability that a defendant will re-offend; in the medical domain to assist with diagnosis or predict predisposition to certain diseases; in social welfare systems; and in autonomous vehicles. The decision-making processes in these real-world applications have a direct effect on people's lives, and can cause harm to society if the deployed machine learning algorithms are not designed with fairness in mind.

The ability to collect and analyze large datasets for problems in many domains brings with it the danger of implicit data bias, which can be harmful. Data, especially big data, is often heterogeneous, generated by different subgroups with their own characteristics and behaviors. Furthermore, data collection strategies vary vastly across domains, and labelling of examples is performed by human annotators, which can cause the labelling process to amplify inherent biases the annotators might harbor. A model learned on biased data may not only produce unfair and inaccurate predictions, but also significantly disadvantage certain subgroups and lead to unfairness in downstream learning tasks.

There are multiple ways in which discriminatory bias can seep into data. In medical domains, for example, there are many instances in which the data used are skewed toward certain populations, which can have dangerous consequences for the underrepresented communities [1]. Another example is the large-scale datasets widely used in machine learning tasks, such as ImageNet and Open Images: [2] shows that these datasets suffer from representation bias, and advocates for the need to incorporate geo-diversity and inclusion. Yet another example is the popular face recognition and generation datasets, such as CelebA and Flickr-Faces-HQ, where the ethnic and racial breakdown of example faces shows significant representation bias, evident in downstream tasks like face reconstruction from an obfuscated image [8].

To fight discriminatory use of machine learning algorithms that leverage such biases, one must first define the notion of algorithmic fairness. Broadly, fairness is the absence of any prejudice or favoritism towards an individual or a group based on their intrinsic or acquired traits in the context of decision making [3]. Fairness definitions fall under three broad types: individual fairness (similar individuals receive similar predictions [4, 5]), group fairness (different groups are treated equally [4, 5]), and subgroup fairness (a group fairness constraint is selected, and the task is to determine whether it holds over a large collection of subgroups [6, 7]).
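For concreteness, these three notions are commonly written as follows. This is a sketch of standard formalizations from the fairness literature (a Lipschitz-style condition for individual fairness, demographic parity for group fairness, and an approximate subgroup-parity constraint), not necessarily the exact constraints presented in the talk.

% Individual fairness (Lipschitz condition): for a randomized classifier M,
% a metric d on individuals, and a distance D on output distributions,
\[ D\big(M(x), M(y)\big) \le d(x, y) \quad \text{for all individuals } x, y. \]
% Group fairness (demographic parity) for a binary prediction \hat{Y}
% and a protected attribute A:
\[ P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \text{for all groups } a, b. \]
% Subgroup fairness: an approximate parity constraint must hold for every
% subgroup g in a (possibly very large) collection \mathcal{G}:
\[ \big| P(\hat{Y} = 1 \mid g(X) = 1) - P(\hat{Y} = 1) \big| \le \alpha \quad \text{for all } g \in \mathcal{G}. \]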
In this talk, I will discuss a formal definition of these fairness constraints, examine the ways in which machine learning algorithms can amplify representation bias, and discuss how bias in both the example set and the label set of popular datasets has been misused in a discriminatory manner. I will also touch upon the issues of ethics and accountability, and present open research directions for tackling algorithmic fairness at the representation level.
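As a complementary illustration of how a group fairness constraint can be checked in practice, the sketch below audits a deliberately biased synthetic classifier for two common group-fairness gaps: the demographic parity difference and the equal opportunity difference. Everything in it (the synthetic data, the group attribute, the gap functions) is an illustrative assumption made for this sketch, not material from the talk or from any particular fairness library.

# A minimal sketch of auditing group fairness on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary labels and a protected attribute with two groups (0 and 1).
n = 10_000
group = rng.integers(0, 2, size=n)
y_true = rng.integers(0, 2, size=n)

# A deliberately biased predictor: it predicts the positive class more
# often for group 1 (rate 0.6) than for group 0 (rate 0.4), ignoring y_true.
p_pos = np.where(group == 1, 0.6, 0.4)
y_pred = (rng.random(n) < p_pos).astype(int)

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between the two groups."""
    rate0 = y_pred[group == 0].mean()
    rate1 = y_pred[group == 1].mean()
    return abs(rate1 - rate0)

def equal_opportunity_difference(y_true, y_pred, group):
    """Absolute gap in true-positive rates (recall) between the two groups."""
    tprs = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)
        tprs.append(y_pred[mask].mean())
    return abs(tprs[1] - tprs[0])

print("Demographic parity difference:",
      demographic_parity_difference(y_pred, group))
print("Equal opportunity difference:",
      equal_opportunity_difference(y_true, y_pred, group))

By construction, both gaps come out near 0.2 on this data: the biased predictor favors group 1 regardless of the true labels, so its positive-prediction rates and its true-positive rates differ between the groups by roughly the same amount.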