The performance of large language model powered chatbots compared to oncology physicians on colorectal cancer queries

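Medicine, Consistency (knowledge base), Colorectal cancer, Internal medicine, Family medicine, Oncology, Medical physics, Cancer, Artificial intelligence, Computer science
Authors
Shan Zhou,Xiao Luo,Chan Chen,Hong Jiang,Chun Yang,Guanghui Ran,Juan Yu,Chengliang Yin
Source
Journal: International Journal of Surgery [Elsevier]
Identifier
DOI:10.1097/js9.0000000000001850
Abstract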

Background: Large language model (LLM)-powered chatbots have become increasingly prevalent in healthcare, but their capacity in oncology remains largely unknown. This study aimed to evaluate the performance of LLM-powered chatbots compared to oncology physicians in addressing colorectal cancer queries.

Methods: This study was conducted between August 13, 2023, and January 5, 2024. A total of 150 questions were designed, and each question was submitted three times to eight chatbots: ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Turbo, Doctor GPT, Llama-2-70B, Mixtral-8x7B, Bard, and Claude 2.1. No feedback was provided to these chatbots. The questions were also answered by nine oncology physicians: three residents, three fellows, and three attendings. Each answer was scored on its consistency with guidelines, with a score of 1 for a consistent answer and 0 for an inconsistent answer. The total score for each question was the number of correct answers, ranging from 0 to 3. The accuracy and scores of the chatbots were compared to those of the physicians.

Results: Claude 2.1 demonstrated the highest accuracy, with an average accuracy of 82.67%, followed by Doctor GPT at 80.45%, ChatGPT-4 Turbo at 78.44%, ChatGPT-4 at 78%, Mixtral-8x7B at 73.33%, Bard at 70%, ChatGPT-3.5 at 64.89%, and Llama-2-70B at 61.78%. In terms of accuracy, Claude 2.1 outperformed residents, fellows, and attendings; Doctor GPT outperformed residents and fellows; and Mixtral-8x7B outperformed residents. In terms of scores, Claude 2.1 outperformed residents and fellows, while Doctor GPT, ChatGPT-4 Turbo, and ChatGPT-4 outperformed residents.

Conclusions: This study shows that LLM-powered chatbots can provide more accurate medical information than oncology physicians.
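
The scoring scheme described in the Methods can be illustrated with a minimal Python sketch. It assumes that accuracy is the share of guideline-consistent answers pooled over all submissions, which the abstract does not state explicitly. The function names and the grades below are hypothetical and are not taken from the study.

```python
from typing import List


def question_scores(grades: List[List[int]]) -> List[int]:
    """Total score per question: the number of consistent answers (0 to 3)."""
    return [sum(question) for question in grades]


def accuracy_percent(grades: List[List[int]]) -> float:
    """Assumed definition: consistent answers divided by all answers, in percent."""
    total_answers = sum(len(question) for question in grades)
    consistent_answers = sum(sum(question) for question in grades)
    return 100.0 * consistent_answers / total_answers


# Hypothetical grades: four questions, each submitted three times,
# graded 1 (consistent with guidelines) or 0 (inconsistent).
grades = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]

scores = question_scores(grades)
accuracy = accuracy_percent(grades)
print(scores, round(accuracy, 2))
```

Under the same assumption, the reported 82.67% over 150 × 3 = 450 answers would correspond to 372 consistent answers.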