ProPythia: A Python package for protein classification based on machine and deep learning

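Keywords: Computer science, Python (programming language), Artificial intelligence, Machine learning, Modular design, Deep learning, Dimensionality reduction, Cluster analysis, Feature selection, Data mining, Programming language
Authors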
Ana Marta Sequeira,Diana Lousa,Miguel Rocha
Source
Journal: Neurocomputing [Elsevier BV]
Volume: 484, pages 172-182. Cited by: 21
Identifier
DOI: 10.1016/j.neucom.2021.07.102
Abstract

The field of protein data mining has grown rapidly in recent years. Characterizing proteins and determining their function from their amino acid sequences are challenging, long-standing problems in which Bioinformatics and Machine Learning play an emerging role. A myriad of machine and deep learning algorithms have been applied to these tasks with exciting results. However, tools and platforms that take protein sequences as input, calculate protein features and run both Machine Learning (ML) and Deep Learning (DL) pipelines are still lacking, and the existing ones are limited in performance, user-friendliness and domains of application. Here, to address these limitations, we propose ProPythia, a generic and modular Python package that makes it easy to deploy ML and DL approaches for a wide range of problems in protein sequence analysis and classification. It facilitates the implementation, comparison and validation of the major tasks in ML or DL pipelines, including modules to read and alter sequences, calculate protein features, preprocess datasets, execute feature selection and dimensionality reduction, perform clustering and manifold analysis, as well as to train and optimize ML/DL models and use them to make predictions. ProPythia has an adaptable modular architecture, making it a versatile and easy-to-use tool that will be useful for transforming protein data into valuable knowledge, even for people not familiar with ML code. The platform was tested in several applications and compared with results from the literature. Here, we illustrate its applicability in two case studies: the prediction of antimicrobial peptides and the prediction of enzyme Enzyme Commission (EC) numbers. Furthermore, we assess the performance of the different descriptors on four different protein classification challenges. Its source code and documentation, including a user guide and case studies, are freely available at https://github.com/BioSystemsUM/propythia.
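
To make the "calculate protein features, then train a model and predict" steps of the abstract concrete, here is a minimal sketch in plain Python. It does not use ProPythia's API, which is not described on this page. The function names (`aa_composition`, `fit_centroids`, `predict`), the toy sequences and the two class labels are all hypothetical, and a nearest-centroid rule stands in for the ML/DL models the package would actually train.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequence):
    """Amino acid composition: fraction of each standard residue.

    Non-standard characters (e.g. 'X') are ignored in the total.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def fit_centroids(sequences, labels):
    """Train a nearest-centroid model: one mean feature vector per class."""
    grouped = {}
    for seq, label in zip(sequences, labels):
        grouped.setdefault(label, []).append(aa_composition(seq))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, sequence):
    """Assign the class whose centroid is closest in Euclidean distance."""
    features = aa_composition(sequence)
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Made-up toy data, for illustration only (not real peptides or results).
train_seqs = ["KKKKRRRKKLLK", "RRKKKLKKRRKK", "DDEEDDEEGGDD", "EEDDEEDGEEDD"]
train_labels = ["cationic", "cationic", "anionic", "anionic"]

centroids = fit_centroids(train_seqs, train_labels)
prediction = predict(centroids, "KRKKRLKKRK")
```

Amino acid composition is only one simple descriptor. A real pipeline of the kind the abstract describes would add preprocessing, feature selection and proper model validation on top of this step.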