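Profanity
Baseline (sea)
Computer science
Natural language processing
Benchmark (surveying)
Artificial intelligence
Embedding
Feature (linguistics)
Speech recognition
Linguistics
Engineering
Geography
Geology
Philosophy
Oceanography
Operations research
Geodesy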
Authors
Yang Hsu, Chuan‐Jie Lin
Abstract
This paper introduced TOCP, a larger dataset of Chinese profanity. This dataset contains natural sentences collected from social media sites, the profane expressions appearing in the sentences, and their rephrasing suggestions which preserve their meanings in a less offensive way. We proposed several baseline systems using neural network models to test this benchmark. We trained embedding models on a profanity-related dataset and proposed several profanity-related features. Our baseline systems achieved an F1-score of 86.37% in profanity detection and an accuracy of 77.32% in profanity rephrasing.