The Dark Side of Machine Learning Algorithms

Author: Mariya I. Vasileva
DOI: 10.1145/3394486.3411068
Abstract

Machine learning and access to big data are revolutionizing the way many industries operate, providing analytics and automation for many real-world tasks that were previously thought to require manual work. With the pervasiveness of artificial intelligence and machine learning over the past decade, and their rapid spread across a variety of applications, algorithmic fairness has become a prominent open research problem. For instance, machine learning is used in courts to assess the probability that a defendant will reoffend; in the medical domain to assist with diagnosis or to predict predisposition to certain diseases; in social welfare systems; and in autonomous vehicles. The decision-making processes in these real-world applications have a direct effect on people's lives, and can cause harm to society if the deployed machine learning algorithms are not designed with fairness in mind.

The ability to collect and analyze large datasets in many domains brings with it the danger of implicit data bias, which can be harmful. Data, especially big data, is often heterogeneous, generated by different subgroups with their own characteristics and behaviors. Furthermore, data collection strategies vary vastly across domains, and the labelling of examples is performed by human annotators, so the labelling process can amplify any inherent biases the annotators harbor. A model learned on biased data may not only produce unfair and inaccurate predictions, but also significantly disadvantage certain subgroups and lead to unfairness in downstream learning tasks.

There are multiple ways in which discriminatory bias can seep into data. In medical domains, for example, there are many instances in which the data used are skewed toward certain populations, which can have dangerous consequences for underrepresented communities [1]. Another example is the large-scale datasets widely used in machine learning tasks, such as ImageNet and Open Images: [2] shows that these datasets suffer from representation bias, and advocates for the need to incorporate geo-diversity and inclusion. Yet another example is the popular face recognition and generation datasets, such as CelebA and Flickr-Faces-HQ, where the ethnic and racial breakdown of example faces shows significant representation bias, evident in downstream tasks like face reconstruction from an obfuscated image [8].

To fight discriminatory uses of machine learning algorithms that leverage such biases, one first needs to define the notion of algorithmic fairness. Broadly, fairness is the absence of any prejudice or favoritism towards an individual or a group, based on their intrinsic or acquired traits, in the context of decision making [3]. Fairness definitions fall under three broad types: individual fairness, whereby similar individuals receive similar predictions [4, 5]; group fairness, whereby different groups are treated equally [4, 5]; and subgroup fairness, whereby a group fairness constraint is selected and the task is to determine whether it holds over a large collection of subgroups [6, 7]. In this talk, I will discuss a formal definition of these fairness constraints, examine the ways in which machine learning algorithms can amplify representation bias, and discuss how bias in both the example sets and label sets of popular datasets has been misused in a discriminatory manner. I will also touch upon issues of ethics and accountability, and present open research directions for tackling algorithmic fairness at the representation level.
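As a rough guide to the three fairness types named above, the sketch below gives one common formalization of each (in the spirit of [3, 4, 6]); the notation, with prediction Ŷ, protected attribute A, model f, similarity metrics d and D, and subgroup collection 𝒢, is illustrative shorthand, not the talk's own definitions.

```latex
% Group fairness (demographic parity): the positive-prediction rate
% is the same for every value of the protected attribute A.
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \forall\, a, b

% Individual fairness (Lipschitz condition): similar individuals
% receive similar predictions, under task-specific metrics d and D.
D\big(f(x), f(x')\big) \le L \, d(x, x') \quad \forall\, x, x'

% Subgroup fairness: a chosen group constraint (here, parity) must
% hold, up to slack \epsilon, over a large collection of subgroups.
\big| P(\hat{Y} = 1 \mid g) - P(\hat{Y} = 1) \big| \le \epsilon \quad \forall\, g \in \mathcal{G}
```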
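The representation-bias audits mentioned above (e.g., of ImageNet or CelebA [2, 8]) amount to comparing subgroup frequencies in a dataset, and a basic group-fairness check on a trained model compares positive-prediction rates across subgroups. Below is a minimal sketch of both checks, assuming a per-example protected attribute and binary model predictions; every name here is illustrative, not from the talk.

```python
from collections import Counter

def representation_share(attributes):
    """Fraction of examples per protected-attribute value.

    A heavily skewed distribution is the representation bias discussed
    above (e.g., one ethnicity dominating a face dataset).
    """
    counts = Counter(attributes)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def demographic_parity_gap(predictions, attributes):
    """Largest gap in positive-prediction rate between any two groups.

    A gap of 0 means the demographic-parity constraint holds exactly;
    larger values indicate greater disparity between groups.
    """
    positives, totals = Counter(), Counter()
    for y_hat, group in zip(predictions, attributes):
        totals[group] += 1
        positives[group] += int(y_hat == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())

# Toy usage: a model that favors group "a" over group "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
attrs = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(representation_share(attrs))           # {'a': 0.5, 'b': 0.5}
print(demographic_parity_gap(preds, attrs))  # 0.75 - 0.25 = 0.5
```

Note that the toy dataset is perfectly balanced yet the prediction-rate gap is large: representation parity in the data does not by itself guarantee group-fair model behavior, which is why both checks are needed.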