Keywords
Systematic review; Protocol (science); Task (project management); Judgement; Data extraction; Risk assessment; Psychological intervention; Selection bias; Raw score; Computer science; Selection (genetic algorithm); Psychology; Applied psychology; Raw data; Artificial intelligence; Statistics; MEDLINE; Medicine; Engineering; Mathematics; Biology; Pathology; Computer security; Political science; Law; Alternative medicine; Systems engineering; Psychiatry; Biochemistry; Programming language
Authors
Bashar Hasan,Samer Saadi,Noora S. Rajjoub,Moustafa Hegazi,Mohammad Al-Kordi,Farah Fleti,Magdoleen H. Farah,Irbaz Bin Riaz,Imon Banerjee,Zhen Wang,M. Hassan Murad
Source
Journal: BMJ Evidence-Based Medicine [BMJ]
Date: 2024-02-21
Volume/Issue: bmjebm-112597
Citations: 5
Identifiers
DOI: 10.1136/bmjebm-2023-112597
Abstract
Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrating LLMs into the review process is unclear. This study evaluates GPT-4's agreement with human reviewers in assessing risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. In the case study, raw per cent agreement was highest for the ROBINS-I domain of ‘Classification of Intervention’. The Kendall agreement coefficient was highest for the domains of ‘Participant Selection’, ‘Missing Data’ and ‘Measurement of Outcomes’, suggesting moderate agreement in these domains. Raw agreement about the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use; protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics); execution (iterative revisions to the protocol); and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Given the level of agreement with human reviewers observed in the case study, pairing artificial intelligence with an independent human reviewer remains required.
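To make the reported agreement statistics concrete, the following is a minimal sketch of how raw per cent agreement and a Kendall coefficient can be computed for one ROBINS-I domain. The ratings below are hypothetical, and Kendall's tau-b is assumed as the "Kendall agreement coefficient"; neither is taken from the paper.

```python
from scipy.stats import kendalltau

# Hypothetical ROBINS-I ratings for one domain, on an ordinal scale:
# 1 = low, 2 = moderate, 3 = serious, 4 = critical risk of bias.
human = [1, 2, 2, 3, 1, 4, 2, 3, 1, 2]
gpt4  = [1, 2, 3, 3, 1, 3, 2, 3, 2, 2]

# Raw per cent agreement: share of studies where the two raters match.
raw_agreement = sum(h == g for h, g in zip(human, gpt4)) / len(human)

# Kendall's tau-b, one plausible reading of the paper's agreement
# coefficient for ordinal risk-of-bias ratings.
tau, p_value = kendalltau(human, gpt4)

print(f"raw agreement = {raw_agreement:.0%}, Kendall tau-b = {tau:.2f}")
```

Raw agreement counts only exact matches, while the Kendall coefficient credits ordinal concordance, which is why the abstract reports the two statistics peaking in different domains.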
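The protocol domain of the proposed framework can also be pictured as a structured record that a review team fills in before running the LLM. The sketch below is purely illustrative: the field names and values are assumptions, not the paper's specification.

```python
# Hypothetical record of the framework's protocol domain for one review;
# every field name and value here is an illustrative assumption.
llm_review_protocol = {
    "rationale": "expedite ROBINS-I risk-of-bias assessment",
    "protocol": {
        "task_definition": "judgement",  # one of the five task types
        "model_selection": "GPT-4",
        "prompt_engineering": "one prompt per ROBINS-I domain",
        "data_entry": "full-text articles supplied as plain text",
        "human_role": "independent second reviewer",
        "success_metrics": ["raw per cent agreement", "Kendall coefficient"],
    },
    "execution": "iterative revisions to the protocol",
    "reporting": "per-domain agreement statistics",
}
```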