作者
Chaoyu Zhu,Xiaoqiong Xia,Nan Li,Fan Zhong,Zhihao Yang,Lei Liu
摘要
Rare diseases (RDs) may affect individuals in small numbers, but they have a significant impact on a global scale. Accurate diagnosis of RDs is challenging, and there is a severe lack of drugs available for treatment. Pharmaceutical companies have shown a preference for drug repurposing from existing drugs developed for other diseases due to the high investment, high risk, and long cycle involved in RD drug development. Compared to traditional approaches, knowledge graph embedding (KGE) based methods are more efficient and convenient, as they treat drug repurposing as a link prediction task. KGE models allow for the enrichment of existing knowledge by incorporating multimodal information from various sources. In this study, we constructed RDKG-115, a rare disease knowledge graph involving 115 RDs, composed of 35,643 entities, 25 relations, and 5,539,839 refined triplets, based on 372,384 high-quality literature and 4 biomedical datasets: DRKG, Pathway Commons, PharmKG, and PMapp. Subsequently, we developed a trimodal KGE model containing structure, category, and description embeddings using reverse-hyperplane projection. We utilized this model to infer 4199 reliable new inferred triplets from RDKG-115. Finally, we calculated potential drugs and small molecules for each of the 115 RDs, taking multiple sclerosis as a case study. This study provides a paradigm for large-scale screening of drug repurposing and discovery for RDs, which will speed up the drug development process and ultimately benefit patients with RDs. The source code and data are available at https://github.com/ZhuChaoY/RDKG-115.