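Computer science
Classifier (UML)
Artificial intelligence
Interaction information
Coding (social sciences)
Ranging
Machine learning
Partition (number theory)
Source code
Data mining
Mathematics
Combinatorics
Operating system
Telecommunications
Statistics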
Authors
Apurva Kalia, Dilip Krishnan, Soha Hassoun
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2023-07-25
Volume (issue): 39 (8)
Citations: 1
Identifier
DOI:10.1093/bioinformatics/btad456
Abstract
Accurately predicting the likelihood of interaction between two objects (compound-protein sequence, user-item, author-paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multiple views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects.

We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound-protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug-protein interaction prediction), metabolic engineering, and synthetic biology (compound-enzyme interaction prediction) applications. We compare CSI with a baseline model that uses neither data stratification nor contrastive learning, and show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug-target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets.

Code and dataset are available at https://github.com/HassounLab/CSI.
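The stratification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes each data point is a (key, view) pair, for example a compound as the key and a protein sequence as the view, and it assumes an InfoNCE-style loss, a common choice for contrastive multiview coding that the abstract itself does not name. The function names `stratify_by_key`, `congruent_pairs`, and `info_nce`, the toy identifiers, and the temperature value are all hypothetical.

```python
import math
from collections import defaultdict


def stratify_by_key(interactions):
    """Partition (key, view) pairs so that all views under one key sit together.

    Assumption: `interactions` is an iterable of (key, view) tuples, e.g.
    (compound_id, protein_id). Views sharing a key are treated as congruent.
    """
    partitions = defaultdict(list)
    for key, view in interactions:
        partitions[key].append(view)
    return dict(partitions)


def congruent_pairs(partitions):
    """List every pair of views that share a key (positive pairs)."""
    pairs = []
    for key, views in partitions.items():
        for i in range(len(views)):
            for j in range(i + 1, len(views)):
                pairs.append((key, views[i], views[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding (an assumed loss, see lead-in).

    The loss is small when the anchor is closer to its congruent view than to
    the non-congruent views.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy data: compound "c1" interacts with two proteins, "c2" with one.
toy_interactions = [("c1", "pA"), ("c1", "pB"), ("c2", "pC")]
toy_partitions = stratify_by_key(toy_interactions)
toy_pairs = congruent_pairs(toy_partitions)

# Embeddings here are hand-picked 2-D vectors, purely for illustration.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this sketch, views under different keys serve as non-congruent (negative) examples, and a trained encoder would replace the hand-picked vectors.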