TGC-ARG: Predicting Antibiotic Resistance through Transformer-based Modeling and Contrastive Learning
Computer science
Transformer
Antibiotic resistance
Antibiotics
Microbiology
Engineering
Biology
Electrical engineering
Voltage
Authors
Yihan Dong, Xiaowen Hu, Zhijian Huang, Lei Deng
Identifier
DOI: 10.1109/bibm58861.2023.10385506
Abstract
The growing severity of antibiotic resistance poses substantial challenges across diverse sectors, including daily life, agriculture, and clinical medicine. Conventional methods for investigating antibiotic resistance genes (ARGs), such as culture-based techniques and whole-genome sequencing, are often time-consuming, labor-intensive, and of limited accuracy. Moreover, the fragmented nature of existing datasets hampers comprehensive analysis of ARG sequences. In this study, we introduce TGC-ARG, a computational framework for predicting potential ARGs. TGC-ARG takes protein sequences as input, predicts their structural features with SCRATCH-1D, and uses a feature extraction module to derive representations of both the protein sequences and their structures. We then incorporate a siamese network to set up a contrastive learning scheme that strengthens the model's representations. The resulting sequence and structure embeddings are fused and fed into a multilayer perceptron (MLP) that predicts whether a protein is an ARG. To evaluate performance, we curate ARSS (Antibiotic Resistance Sequence Statistics), a new publicly available dataset. Extensive comparative experiments show that our approach outperforms the current state-of-the-art (SOTA) method. Case studies further demonstrate its effectiveness in predicting potential ARGs. The dataset and source code are available at https://github.com/angel1gel/TGC-ARG.
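The abstract describes the pipeline only at a high level. As a hedged illustration of the two learning components it names, here is a minimal pure-Python sketch (no ML framework) of a siamese encoder with shared weights scored by a margin-based contrastive loss, plus the fusion of a sequence embedding with a structure embedding into an MLP that outputs an ARG probability. Everything concrete is an assumption, not TGC-ARG's actual design: the single-layer encoders, the feature sizes, pairing proteins by class label, the Euclidean margin loss, fusion by concatenation, and the helper names (`init_layer`, `encode`, `contrastive_loss`, `predict_arg`) are all hypothetical. The authors' real architecture is in their repository.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical small uniform initialization, not the paper's scheme."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """Affine map: out[j] = sum_i W[j][i] * x[i] + b[j]."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(x: Vector, layer: Layer) -> Vector:
    """Siamese branch: both inputs of a pair pass through the SAME layer (shared weights)."""
    return relu(dense(x, layer))


def contrastive_loss(z1: Vector, z2: Vector, same_class: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss: pull same-class pairs together and
    push different-class pairs at least `margin` apart (Euclidean distance)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z2)))
    return d ** 2 if same_class else max(0.0, margin - d) ** 2


def predict_arg(seq_emb: Vector, struct_emb: Vector, hidden: Layer, out: Layer) -> float:
    """Fuse the two embeddings by concatenation (an assumption) and score them with a
    one-hidden-layer MLP; returns the probability that the protein is an ARG."""
    fused = seq_emb + struct_emb  # list concatenation
    return sigmoid(dense(relu(dense(fused, hidden)), out)[0])


if __name__ == "__main__":
    rng = random.Random(0)
    seq_layer = init_layer(8, 4, rng)     # hypothetical 8-d sequence features -> 4-d embedding
    struct_layer = init_layer(6, 4, rng)  # hypothetical 6-d structural features -> 4-d embedding
    hidden, out = init_layer(8, 16, rng), init_layer(16, 1, rng)

    # Two proteins' (hypothetical) sequence features, assumed to share a class label.
    seq_a = [rng.random() for _ in range(8)]
    seq_b = [rng.random() for _ in range(8)]
    loss = contrastive_loss(encode(seq_a, seq_layer), encode(seq_b, seq_layer), same_class=True)

    # Fuse protein A's sequence embedding with its (hypothetical) structure embedding.
    struct_a = [rng.random() for _ in range(6)]
    p = predict_arg(encode(seq_a, seq_layer), encode(struct_a, struct_layer), hidden, out)
    print(f"contrastive loss = {loss:.4f}, P(ARG) = {p:.4f}")
```

The sketch computes forward passes only; training would backpropagate the contrastive and classification losses, which in practice is done with an autodiff framework such as PyTorch. Routing both proteins through one `seq_layer` is what makes the pair siamese: their embeddings are comparable because identical weights produce them.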