Identifying artificial intelligence–generated content in online Q&A communities through interpretable machine learning

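Content (measure theory), Artificial intelligence, Computer science, Natural language processing, Machine learning, Information retrieval, Psychology, Data science, Mathematics, Mathematical analysis
Authors
Qingqing Li,Ziming Zeng,Tingting Li,Shouqiang Sun
Source
Journal: Journal of Information Science [SAGE Publishing]
Identifier
DOI:10.1177/01655515241281491
Abstract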

This study aims to construct a comprehensive feature system for identifying artificial intelligence–generated content (AIGC) in online Q&A communities and, in doing so, to uncover the key factors and mechanisms that influence AIGC identification. First, drawing on systemic functional linguistics (SFL) and information quality (IQ) theory, this article extracts vocabulary, content, structure, and emotional features from the text and identifies AIGC with nine mainstream machine learning algorithms. Next, three widely used resampling strategies are employed to address the class imbalance problem, and a grid search optimisation algorithm tunes combinations of hyperparameters to improve the performance of the identification classifier. Finally, SHAP values are introduced to evaluate and explain global feature importance and the mechanisms through which features influence predictions. A Chinese corpus from the Zhihu Q&A community is constructed to validate these methods. The experimental results show that the eXtreme Gradient Boosting (XGBoost) model optimised with hybrid sampling and grid-searched parameters performs excellently in identifying AI-generated text, achieving an F1-score of 0.9935, an improvement of 0.11 percentage points over the original model. In addition, all four feature dimensions constructed in this article contribute to AI-generated text identification, and the feature interpretability analysis shows that features related to content readability have the greatest impact. The study facilitates the identification and labelling of AIGC in online Q&A communities, thereby enhancing the transparency and accountability of information shared online.
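The pipeline summarised above (rebalancing the classes, grid-searching hyperparameters, scoring with F1, then attributing predictions with Shapley values) can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the two features, the toy data, the linear scorer standing in for XGBoost, the random oversample/undersample scheme used as "hybrid sampling", and every function name are hypothetical. Shapley values are computed exactly by enumerating feature subsets, with "absent" features set to their dataset means (one common baseline choice); the SHAP library instead uses model-specific algorithms such as TreeSHAP for tree ensembles like XGBoost.

```python
import itertools
import random
from math import factorial


def f1_score(y_true, y_pred):
    """F1 for the positive class (assumed: 1 = AI-generated, 0 = human-written)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def hybrid_resample(X, y, seed=0):
    """Hypothetical hybrid sampling: oversample the minority class with
    replacement and undersample the majority class until both reach the
    midpoint size. Assumes binary labels 0/1 with both classes present."""
    rng = random.Random(seed)
    groups = {0: [], 1: []}
    for row, label in zip(X, y):
        groups[label].append(row)
    minority, majority = sorted(groups, key=lambda k: len(groups[k]))
    target = (len(groups[0]) + len(groups[1])) // 2
    extra = [rng.choice(groups[minority])
             for _ in range(target - len(groups[minority]))]
    X_new = groups[minority] + extra + rng.sample(groups[majority], target)
    return X_new, [minority] * target + [majority] * target


def score(params, row):
    """Hypothetical linear scorer standing in for a classifier's raw margin."""
    w0, w1, _ = params
    return w0 * row[0] + w1 * row[1]


def predict(params, X):
    """Label a text as AI-generated (1) when its score exceeds the threshold."""
    return [int(score(params, row) > params[2]) for row in X]


def grid_search(X, y, grid):
    """Try every parameter combination; keep the first one with the best F1.
    (A real setup would score each combination with cross-validation.)"""
    best = max(itertools.product(*grid.values()),
               key=lambda p: f1_score(y, predict(p, X)))
    return best, f1_score(y, predict(best, X))


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in itertools.combinations(others, k):
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Made-up, imbalanced toy data: two hypothetical features per answer
# (e.g. a readability score and lexical diversity); not from the paper.
X = [(0.2, 0.8), (0.3, 0.7), (0.25, 0.9), (0.4, 0.6), (0.35, 0.75),
     (0.3, 0.85), (0.8, 0.4), (0.9, 0.3)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = hybrid_resample(X, y)
grid = {"w0": (0.0, 1.0), "w1": (-1.0, 0.0), "threshold": (0.0, 0.5)}
best_params, best_f1 = grid_search(X_bal, y_bal, grid)

# Global importance = mean |Shapley value| of the raw score over all rows.
baseline = [sum(col) / len(X) for col in zip(*X)]
phis = [shapley_values(lambda v: score(best_params, v), row, baseline) for row in X]
importance = [sum(abs(p[i]) for p in phis) / len(phis) for i in range(2)]
print(best_params, best_f1, importance)
```

Enumerating subsets costs on the order of 2^n model evaluations per feature, so this exact approach only suits a handful of features; for a full four-dimension feature system, a model-specific explainer is the practical route.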