An abusive text detection system based on enhanced abusive and non-abusive word lists

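Keywords: Computer science; word2vec; Artificial intelligence; Natural language processing; Abusive relationship; Word (group theory); Cosine similarity; Slang; Social media; Machine learning; Computer security; World Wide Web; Poison control; Cluster analysis; Domestic violence; Linguistics; Injury prevention; Philosophy; Environmental health; Medicine; Embedding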
Authors
Ho Suk Lee,Hong Rae Lee,Jun U. Park,Yo-Sub Han
Source
Journal: Decision Support Systems [Elsevier BV]
Volume: 113, pages 22-31 Cited by: 42
Identifier
DOI:10.1016/j.dss.2018.06.009
Abstract

Abusive text (indiscriminate slang, abusive language, and profanity) on the Internet is not just a message but rather a tool for very serious and brutal cyber violence. It has become an important problem to devise a method for detecting and preventing abusive text online. However, the intentional obfuscation of words and phrases makes this task very difficult and challenging. We design a decision system that successfully detects (obfuscated) abusive text using unsupervised learning of abusive words based on word2vec's skip-gram and the cosine similarity. The system also deploys several efficient gadgets for filtering abusive text such as blacklists, n-grams, edit-distance metrics, mixed languages, abbreviations, punctuation, and words with special characters to detect the intentional obfuscation of abusive words. We integrate both an unsupervised learning method and efficient gadgets into a single system that enhances abusive and non-abusive word lists. The integrated decision system based on the enhanced word lists shows a precision of 94.08%, a recall of 80.79%, and an f-score of 86.93% in malicious word detection for news article comments, a precision of 89.97%, a recall of 80.55%, and an f-score of 85.00% for online community comments, and a precision of 90.65%, a recall of 93.57%, and an f-score of 92.09% for Twitter tweets. We expect that our approach can help to improve the current abusive word detection system, which is crucial for several web-based services including social networking services and online games.
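
The abstract names two ingredients, cosine similarity over word vectors and an edit-distance metric, but gives no implementation details. The following is a minimal sketch of how those two pieces could fit together, not the authors' system. The function names, the thresholds, and the hand-made 3-dimensional vectors are all assumptions for illustration; a real pipeline would use vectors trained with word2vec's skip-gram model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_word_list(seed_words, embeddings, threshold=0.9):
    """Return the seeds plus every word whose vector is close to a seed's.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    expanded = set(seed_words)
    for word, vec in embeddings.items():
        for seed in seed_words:
            if seed in embeddings and cosine_similarity(vec, embeddings[seed]) >= threshold:
                expanded.add(word)
                break
    return expanded


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_obfuscated_match(token, word_list, max_distance=1):
    """True if the token is within max_distance edits of any listed word."""
    return any(edit_distance(token.lower(), w) <= max_distance for w in word_list)


# Toy, hand-made vectors (NOT real word2vec output), chosen so that the
# two insult words point in nearly the same direction.
toy_embeddings = {
    "idiot": [0.9, 0.1, 0.0],
    "moron": [0.85, 0.15, 0.05],
    "hello": [0.0, 0.2, 0.9],
}

abusive = expand_word_list({"idiot"}, toy_embeddings, threshold=0.9)
print(sorted(abusive))
print(is_obfuscated_match("id1ot", abusive))
print(is_obfuscated_match("hello", abusive))
```

In this sketch the seed list grows through vector similarity, and the edit-distance check then catches a character substitution such as "id1ot". A small `max_distance` keeps short benign words from matching by accident, at the cost of missing heavier obfuscation.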