Benchmark (surveying)
Computer science
Construct (Python library)
Adaptation (eye)
Field (mathematical analysis)
Language model
Natural language processing
Artificial intelligence
Psychology
Programming language
Geography
Mathematical analysis
Mathematics
Geodesy
Neuroscience
Authors
Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Wei-Xiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Identifier
DOI:10.1038/s41467-024-52417-z
Abstract
The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience across different regions. To advance this domain, we make the following contributions: first, we construct a multilingual medical corpus of approximately 25.5B tokens covering six main languages, termed MMedC, which enables auto-regressive domain adaptation for general LLMs; second, to track the development of multilingual medical LLMs, we propose a multilingual medical multiple-choice question-answering benchmark with rationales, termed MMedBench; third, we assess a number of open-source large language models (LLMs) on our benchmark, along with those further auto-regressively trained on MMedC. Our final model, MMed-Llama 3, with only 8B parameters, outperforms all other open-source models on both MMedBench and English benchmarks, even rivaling GPT-4. In conclusion, we present a large-scale corpus, a benchmark, and a series of models to support the development of multilingual medical LLMs.
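The "auto-regressive domain adaptation" mentioned above refers to continued pretraining on domain text with the standard next-token prediction objective: each position's logits score the following token, and training minimizes the average negative log-likelihood. A minimal, dependency-free sketch of that loss on a toy vocabulary (illustrative only; this is not the authors' training code, and the toy logits are invented for the example):

```python
import math

def next_token_nll(logits, token_ids):
    """Average negative log-likelihood of a token sequence under
    next-token prediction: position t's logits score token t+1."""
    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[token_ids[t + 1]]
    return total / (len(token_ids) - 1)

# Toy example: 3-token vocabulary, sequence [0, 2, 1]
logits = [
    [0.1, 0.2, 2.0],   # position 0: mass on token 2 (the true next token)
    [0.0, 1.5, 0.0],   # position 1: mass on token 1 (the true next token)
    [0.0, 0.0, 0.0],   # last position has no next token; unused
]
loss = next_token_nll(logits, [0, 2, 1])
```

Lowering this loss on a medical corpus such as MMedC shifts a general LLM's distribution toward in-domain text, which is the adaptation step the abstract describes.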