CSI: Contrastive data Stratification for Interaction prediction and its application to compound–protein interaction prediction

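Computer science, Classifier (UML), Artificial intelligence, Mutual information, Coding (social sciences), Ranging, Machine learning, Partition (number theory), Source code, Data mining, Mathematics, Combinatorics, Operating system, Telecommunications, Statistics
Authors
Apurva Kalia, Dilip Krishnan, Soha Hassoun
Source
Journal: Bioinformatics [Oxford University Press]
Volume (issue): 39 (8) Citations: 1
Identifier
DOI: 10.1093/bioinformatics/btad456
Abstract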

Accurately predicting the likelihood of interaction between two objects (compound-protein sequence, user-item, author-paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multiple views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects.

We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound-protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug-protein interaction prediction), metabolic engineering, and synthetic biology (compound-enzyme interaction prediction) applications. We compare CSI with a baseline model that does not utilize data stratification and contrastive learning, and show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug-target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets.

Code and dataset are available at https://github.com/HassounLab/CSI.
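
The key-and-views idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation (that is in the linked repository); it is a minimal toy under stated assumptions: each data point is a (compound, protein) pair, one member of the pair serves as the key, the other member is treated as a view, and "non-congruent" is taken to mean views that fall under different keys. The function names `stratify_by_key`, `congruent_pairs`, and `non_congruent_pairs` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def stratify_by_key(interactions, key_index=0):
    """Partition interacting pairs by key (assumed: one member of each pair).

    Returns a dict mapping each key to the list of views (the other
    member of each pair) that interact with it.
    """
    partitions = defaultdict(list)
    for pair in interactions:
        key = pair[key_index]
        view = pair[1 - key_index]
        partitions[key].append(view)
    return dict(partitions)


def congruent_pairs(partitions):
    """Yield (key, view_a, view_b) for every two views sharing a key."""
    for key, views in partitions.items():
        for view_a, view_b in combinations(views, 2):
            yield key, view_a, view_b


def non_congruent_pairs(partitions):
    """Yield (view_a, view_b) for views found only under different keys.

    Assumption: a pair is non-congruent when neither view appears in
    the other view's partition.
    """
    for key_1, key_2 in combinations(list(partitions), 2):
        for view_a in partitions[key_1]:
            for view_b in partitions[key_2]:
                if view_a not in partitions[key_2] and view_b not in partitions[key_1]:
                    yield view_a, view_b


# Toy interaction list with made-up identifiers, using compounds as keys.
interactions = [
    ("compound_A", "protein_1"),
    ("compound_A", "protein_2"),
    ("compound_B", "protein_3"),
    ("compound_B", "protein_4"),
]

partitions = stratify_by_key(interactions, key_index=0)
positives = list(congruent_pairs(partitions))
negatives = list(non_congruent_pairs(partitions))
```

In a contrastive setup, the congruent pairs would serve as positives and the non-congruent pairs as negatives when training the encoders; setting `key_index=1` would instead use protein sequences as keys. How CSI actually samples and weights these pairs is not described in the abstract.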