Leverage (statistics)
Medicine
Receiver operating characteristic
Recall
F1 score
Test (biology)
Task (project management)
Natural language processing
Medical physics
Internal medicine
Artificial intelligence
Computer science
Psychology
Cognitive psychology
Paleontology
Economics
Management
Biology
Authors
Jeffrey Tully, Onkar Litake, Minhthy N. Meineke, Sierra Simpson, Ruth S. Waterman, Rodney A. Gabriel
Identifier
DOI:10.2196/preprints.52975
Abstract
BACKGROUND: Tools that help identify preoperative patients in need of further cardiovascular testing or consultation may reduce costs and ensure rational utilization of resources.
OBJECTIVE: We evaluated the feasibility of using general-purpose versus domain-specific large language models (LLMs) for a classification task aimed at identifying these surgical patients.
METHODS: The objective of this study was to leverage various LLMs to classify patients who would need preoperative cardiac evaluation based on their preoperative clinical notes. General-purpose (BERT, RoBERTa, Longformer) and domain-specific (BioClinicalBERT, PubMedBERT) models were trained on this classification task. Performance was validated on the test set, and the area under the receiver operating characteristic curve (AUC), F1 score, sensitivity, specificity, precision, and recall were measured.
RESULTS: There were 175 patients, of whom 67 (38.2%) were determined to require preoperative cardiac evaluation/testing. The dataset was divided into a training set and a test set consisting of 75% (n=131) and 25% (n=44) of the data, respectively. All models performed similarly; the AUC was highest with Longformer (0.90) and the precision-recall score was highest with PubMedBERT (0.88).
CONCLUSIONS: This study described the use of three general-purpose and two domain-specific LLMs to classify surgical patients in need of preoperative cardiovascular workup. All LLMs had excellent yet similar performance. LLMs may be leveraged on preoperative clinical notes to classify which patients would benefit from preoperative cardiology evaluations. No clinically significant differences were seen between domain-specific and general-purpose LLMs.
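The abstract reports sensitivity, specificity, precision, recall, and F1 score on the held-out test set. As a minimal sketch (not the authors' code), these threshold-based metrics can be computed from binary predictions and labels, where 1 denotes a patient flagged as needing preoperative cardiac evaluation:

```python
def classification_metrics(y_true, y_pred):
    """Compute the threshold-based metrics reported in the abstract
    from binary labels: 1 = needs preoperative cardiac evaluation,
    0 = does not. Sensitivity and recall are the same quantity here."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # = recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

The AUC, by contrast, is computed over all classification thresholds rather than a single set of binary predictions, which is why the abstract reports it separately.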