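Computer science
Word2vec
Artificial intelligence
Natural language processing
Abusive relationship
Word (group theory)
Cosine similarity
Slang
Social media
Machine learning
Computer security
World Wide Web
Poison control
Cluster analysis
Domestic violence
Linguistics
Injury prevention
Philosophy
Environmental health
Medicine
Embedding
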
Authors
Ho Suk Lee, Hong Rae Lee, Jun U. Park, Yo-Sub Han

Identifier
DOI: 10.1016/j.dss.2018.06.009

Abstract
Abusive text on the Internet (indiscriminate slang, abusive language, and profanity) is not just a message but a tool for serious and brutal cyber violence. Devising a method for detecting and preventing abusive text online has become an important problem. However, the intentional obfuscation of words and phrases makes this task difficult. We design a decision system that detects abusive text, including obfuscated text, using unsupervised learning of abusive words based on word2vec's skip-gram model and cosine similarity. To detect the intentional obfuscation of abusive words, the system also deploys several efficient gadgets for filtering abusive text, such as blacklists, n-grams, edit-distance metrics, and handling of mixed languages, abbreviations, punctuation, and words with special characters. We integrate the unsupervised learning method and the efficient gadgets into a single system that enhances the abusive and non-abusive word lists. In malicious word detection, the integrated decision system based on the enhanced word lists shows a precision of 94.08%, a recall of 80.79%, and an F-score of 86.93% for news article comments; a precision of 89.97%, a recall of 80.55%, and an F-score of 85.00% for online community comments; and a precision of 90.65%, a recall of 93.57%, and an F-score of 92.09% for Twitter tweets. We expect that our approach can help improve current abusive word detection systems, which are crucial for several web-based services, including social networking services and online games.
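
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of how cosine similarity over word vectors might grow a seed blacklist, and how an edit-distance check might catch obfuscated spellings. It is not the authors' implementation. The toy three-dimensional vectors stand in for trained skip-gram embeddings, and the function names, the 0.8 similarity threshold, and the distance limit of 1 are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def edit_distance(s, t):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def expand_blacklist(seeds, embeddings, threshold=0.8):
    """Add every word whose vector is close to some seed word's vector.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    expanded = set(seeds)
    for word, vec in embeddings.items():
        for seed in seeds:
            if seed in embeddings and \
                    cosine_similarity(vec, embeddings[seed]) >= threshold:
                expanded.add(word)
                break
    return expanded


def is_abusive(token, blacklist, max_distance=1):
    """Strip punctuation, then allow a small edit distance to a listed word."""
    cleaned = "".join(ch for ch in token.lower() if ch.isalnum())
    return any(edit_distance(cleaned, bad) <= max_distance for bad in blacklist)


# Toy vectors (assumption): real ones would come from a skip-gram model
# trained on a large comment corpus.
TOY_EMBEDDINGS = {
    "idiot": [0.90, 0.10, 0.00],
    "moron": [0.85, 0.20, 0.05],
    "hello": [0.00, 0.10, 0.95],
}

blacklist = expand_blacklist({"idiot"}, TOY_EMBEDDINGS)
for token in ["idi0t", "m.o.r.o.n", "hello"]:
    print(token, is_abusive(token, blacklist))
```

A fixed distance limit is a blunt instrument: for short words, a limit of 1 can match unrelated vocabulary, so a practical system would likely scale the limit with word length or combine it with a non-abusive word list, as the abstract suggests the authors do.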
         
            
 
                 
                
                    