Keywords
Set (abstract data type), Standardization, Task (project management), Computer science, Coding (social sciences), Traditional Chinese medicine, Natural language processing, Data mining, Artificial intelligence, Medicine, Statistics, Mathematics, Alternative medicine, Engineering, Pathology, Systems engineering, Programming language, Operating system
Authors
Zhe Wang, Keqian Li, Quanying Ren, Keyu Yao, Yan Zhu
Identifier
DOI: 10.1109/bibm58861.2023.10385776
Abstract
Objective: In this study, we investigate the use of large language models (LLMs) for traditional Chinese medicine (TCM) formula classification through fine-tuning and prompt templates. Methods: We refined and cleaned data from the Coding Rules for Chinese Medicinal Formulas and Their Codes [1], the Chinese National Medical Insurance Catalog for Proprietary Chinese Medicines [2], and Textbooks of Formulas of Chinese Medicine [3] to standardize TCM formula information, ultimately extracting 2308 TCM formulas as the study dataset. We designed a prompt template for the TCM formula classification task and randomly divided the dataset into three subsets: a training set (2000 formulas), a test set (208 formulas), and a validation set (100 formulas). We fine-tuned two open-source LLMs, ChatGLM-6b and ChatGLM2-6b. Finally, we evaluated all LLMs selected for this study: ChatGLM-6b (original), ChatGLM2-6b (original), ChatGLM-130b, InternLM-20b, ChatGPT, ChatGLM-6b (fine-tuned), and ChatGLM2-6b (fine-tuned). Results: ChatGLM2-6b (fine-tuned) and ChatGLM-6b (fine-tuned) achieved the highest accuracy on the validation set, at 71% and 70%, respectively. The accuracies of the other models were: ChatGLM-130b, 58%; ChatGPT, 53%; InternLM-20b, 52%; ChatGLM2-6b (original), 41%; and ChatGLM-6b (original), 23%. Conclusion: Through fine-tuning and the use of prompt templates, LLMs achieved an accuracy of 71% on the formula classification task in our study, offering a novel option for the application of LLMs in the field of TCM.
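To make the workflow in the abstract concrete, the following is a minimal Python sketch of the prompt-template construction, the random 2000/208/100 split, and the validation-accuracy computation. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the record fields ("name", "ingredients", "category"), the category list, and the model_answer callable (a wrapper around whichever LLM is being evaluated, e.g. ChatGLM2-6b after fine-tuning) are hypothetical placeholders. Only the dataset size and split sizes come from the abstract; scoring predictions by exact label match is an assumed reading of "accuracy."

import random

# Minimal sketch of the evaluation pipeline described in the abstract.
# Assumptions (not from the paper): the prompt wording, the record fields,
# the category list, and the `model_answer` callable are placeholders.

PROMPT_TEMPLATE = (
    "You are an expert in TCM formulas. Classify the following formula "
    "into exactly one of these categories: {categories}\n"
    "Formula name: {name}\n"
    "Ingredients: {ingredients}\n"
    "Answer with the category name only:"
)

def build_prompt(formula: dict, categories: list[str]) -> str:
    """Fill the classification prompt for a single formula record."""
    return PROMPT_TEMPLATE.format(
        categories=", ".join(categories),
        name=formula["name"],
        ingredients=formula["ingredients"],
    )

def split_dataset(formulas: list[dict], seed: int = 0):
    """Randomly split the 2308 formulas into the 2000/208/100
    train/test/validation subsets reported in the abstract."""
    rng = random.Random(seed)
    shuffled = list(formulas)
    rng.shuffle(shuffled)
    return shuffled[:2000], shuffled[2000:2208], shuffled[2208:]

def validation_accuracy(model_answer, validation_set, categories) -> float:
    """Exact-match accuracy of predicted category labels against the
    gold category, as in the per-model validation-set figures."""
    hits = sum(
        model_answer(build_prompt(f, categories)).strip() == f["category"]
        for f in validation_set
    )
    return hits / len(validation_set)

In this sketch, comparing each fine-tuned and original model only requires swapping the model_answer callable, which mirrors how the abstract evaluates seven models against the same 100-formula validation set.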