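Generative grammar
Computer science
Psychology
Speech recognition
Artificial intelligence
Linguistics
Natural language processing
Philosophy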
Authors
Sagi Pendzel, Tomer Wullach, Amir Adler, Einat Minkov
Source
Journal: Auerbach Publications eBooks
[Auerbach Publications]
Date: 2024-05-23
Volume/issue: not listed; pages: 54-76
Citations: 1
Identifiers
DOI:10.1201/9781032654829-4
Abstract
Hate speech refers to the expression of hateful or violent attitudes based on group affiliation such as race, nationality, religion, or sexual orientation. In light of the increasing prevalence of hate speech on social media, there is a pressing need to develop automatic methods that detect manifestations of hate speech at scale (Fortuna & Nunes, 2018). Automatic methods of natural language processing in general, and hate speech detection in particular, rely heavily on relevant datasets. While researchers have collected several datasets that contain hate speech samples, those resources are scarce. Furthermore, the difficulty in identifying hate speech on social media has led to the use of biased data sampling techniques, focusing on a specific subset of hateful terms or accounts. Consequently, relevant available datasets are limited in size, highly imbalanced, and exhibit topical and lexical biases. Several recent works have pointed out these shortcomings and shown that classification models trained on those datasets merely memorize keywords, which results in poor generalization (Wiegand et al., 2019; Kennedy et al., 2020).