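Vocabulary
Health informatics
Computer science
Controlled vocabulary
Domain (mathematical analysis)
Word (group theory)
Health professionals
Informatics
Natural language processing
Artificial intelligence
Health care
Linguistics
Medicine
Nursing
Engineering
Public health
Political science
Philosophy
Law
Mathematical analysis
Electrical engineering
Mathematics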
Authors
Gen Gu, Xingting Zhang, Xingeng Zhu, Zhe Jian, Ken Chen, Dong Wen, Li Gao, Shaodian Zhang, Fei Wang, Handong Ma, Jianbo Lei
Abstract
The vocabulary gap between consumers and professionals in the medical domain hinders information seeking and communication. Consumer health vocabularies have been developed to aid such informatics applications. This purpose is best served if the vocabulary evolves with consumers' language.

Our objective is to develop a method for identifying and adding new terms to consumer health vocabularies, so that they can keep up with constantly evolving medical knowledge and language use.

In this paper, we propose a consumer health term-finding framework based on a distributed word vector space model. We first learned word vectors from a large-scale text corpus in an unsupervised manner. We then applied a supervised method that uses existing consumer health vocabularies to fine-tune these vector representations. With the fine-tuned word vector space, we identified pairs of professional terms and their consumer variants by their semantic distance in the vector space. A subsequent manual review of the extracted and labeled pairs of entities was conducted to validate the results generated by the proposed approach. The results were evaluated using mean reciprocal rank (MRR).

Manual evaluation showed that it is feasible to identify alternative medical concepts by using professional or consumer concepts as queries in the word vector space without fine-tuning, but the results are more promising in the final fine-tuned word vector space. The MRR values indicated that, on average, a professional or consumer concept is about the 14th closest to its counterpart in the word vector space without fine-tuning, and about the 8th closest in the final fine-tuned word vector space. Furthermore, the results demonstrate that our method can collect abbreviations and common typos frequently used by consumers.

By integrating a large amount of text information and existing consumer health vocabularies, our method outperformed several baseline ranking methods and is effective for generating a list of candidate terms for human review during consumer health vocabulary development.
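
The ranking and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors, the term names, and the helper functions `cosine_rank` and `mean_reciprocal_rank` are all assumptions made for illustration, and cosine similarity stands in for whatever semantic distance the paper actually used. For each query term, the other terms are ranked by similarity, the 1-based rank of the known counterpart is recorded, and MRR is the mean of the reciprocal ranks.

```python
import numpy as np

# Toy word vectors (hypothetical values, not taken from the paper).
vectors = {
    "myocardial_infarction": np.array([1.0, 0.0]),
    "heart_attack": np.array([0.9, 0.1]),
    "hypertension": np.array([0.0, 1.0]),
    "high_blood_pressure": np.array([0.8, 0.6]),
    "headache": np.array([0.5, 0.5]),
}

# Known professional -> consumer pairs used as the gold standard (also hypothetical).
gold_pairs = [
    ("myocardial_infarction", "heart_attack"),
    ("hypertension", "high_blood_pressure"),
]


def cosine_rank(query, target, vecs):
    """Return the 1-based rank of `target` among all other terms,
    ordered by descending cosine similarity to `query`."""
    q = vecs[query] / np.linalg.norm(vecs[query])
    candidates = [w for w in vecs if w != query]
    sims = {
        w: float(np.dot(q, vecs[w] / np.linalg.norm(vecs[w])))
        for w in candidates
    }
    ranked = sorted(candidates, key=lambda w: sims[w], reverse=True)
    return ranked.index(target) + 1


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


ranks = [cosine_rank(q, t, vectors) for q, t in gold_pairs]
mrr = mean_reciprocal_rank(ranks)

print("ranks:", ranks)  # [1, 2]
print("MRR:", mrr)      # 0.75
```

In this toy space "heart_attack" is the nearest neighbor of "myocardial_infarction", while "high_blood_pressure" is only the second nearest neighbor of "hypertension", so the MRR is (1 + 1/2) / 2 = 0.75. The reciprocal of an MRR is the harmonic mean of the ranks, which is one way to read statements such as "about the 14th closest" as a typical rank.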