
Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum

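Keywords: Computer science; Computational biology; Random forest; Cas9; Gene; Artificial intelligence; Pattern recognition (psychology); CRISPR; Biology; Machine learning; Genetics
Authors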
Sita Sirisha Madugula,Pranav Pujar,Bharani Nammi,Shouyi Wang,Vindi M. Jayasinghe‐Arachchige,Tyler Pham,Dominic Mashburn,Maria Artiles,Jin Liu
Source
Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Volume (Issue): 64 (12): 4897-4911 Citations: 2
Identifier
DOI:10.1021/acs.jcim.4c00625
Abstract

The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations such as large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In this study, we aim to elucidate the unique protein features associated with Cas9 and Cas12 families and identify the features distinguishing each family from non-Cas proteins. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,494 features) encoding various physicochemical, topological, constitutional, and coevolutionary information on Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent data sets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 92% and 95% on their respective independent data sets, while the multiclass classifier achieved an F1 score close to 0.98. We observed that Quasi-Sequence-Order (QSO) descriptors like Schneider.lag and Composition descriptors like charge, volume, and polarizability are predominant in the Cas12 family. Conversely, Amino Acid Composition descriptors, especially Tripeptide Composition (TPC), predominate in the Cas9 family. Four of the top 10 descriptors identified in Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all Cas9 proteins and located within different catalytically important domains of the Streptococcus pyogenes Cas9 (SpCas9) structure. Among these, DHI and HHA are well-known to be involved in the DNA cleavage activity of the SpCas9 protein. Mutation studies have highlighted the significance of the PWN tripeptide in PAM recognition and DNA cleavage activity of SpCas9, while Y450 from the PYY tripeptide plays a crucial role in reducing off-target effects and improving the specificity in SpCas9. Leveraging our machine learning (ML) pipeline, we identified numerous Cas9 and Cas12 family-specific features. These features offer valuable insights for future experimental and computational studies aimed at designing Cas systems with enhanced gene-editing properties, and they suggest plausible structural modifications that can guide the development of Cas proteins with improved editing capabilities.
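
To make the Tripeptide Composition (TPC) descriptor mentioned in the abstract more concrete, here is a minimal Python sketch of how such a feature vector is commonly computed: the frequency of each of the 8,000 possible tripeptides over all overlapping three-residue windows of a sequence. The abstract does not say which tool or exact normalization the authors used, so the function name `tripeptide_composition`, the handling of non-standard residues, and the toy sequence are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence: str) -> dict[str, float]:
    """Return the frequency of every possible tripeptide (20^3 = 8000 keys).

    Assumption: windows containing non-standard residues are skipped, and
    frequencies are normalized by the number of valid windows (which equals
    len(sequence) - 2 when every residue is standard).
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]

    # Start from an all-zero vector so every sequence yields the same keys.
    features = {"".join(t): 0.0 for t in product(AMINO_ACIDS, repeat=3)}
    if not valid:
        return features

    for tripeptide, count in Counter(valid).items():
        features[tripeptide] = count / len(valid)
    return features


# Hypothetical toy sequence, NOT a real Cas9 fragment; it simply contains
# the four tripeptides named in the abstract.
toy_sequence = "MDHIPWNAPYYHHA"
tpc = tripeptide_composition(toy_sequence)
for name in ("PWN", "PYY", "HHA", "DHI"):
    print(name, round(tpc[name], 4))
```

A vector like this would form only one slice of the 13,494-feature spectrum described above; the other descriptor families (for example Quasi-Sequence-Order) need their own calculations and are not sketched here.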