RLKdb: A comprehensively curated database of plant receptor-like kinase families

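Biology, Computational Biology, Database, Bioinformatics, Computer Science
Authors
Zhiyuan Yin, Jinding Liu, Daolong Dou
Source
Journal: Molecular Plant [Elsevier]
Volume (issue): 17 (4): 513-515 Citations: 1
Identifier
DOI: 10.1016/j.molp.2024.02.014
Abstract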

Since the first plant receptor-like kinase (RLK) gene, ZmPK1, was cloned from Zea mays in 1990 (Walker and Zhang, 1990), this large gene family has been extensively studied and shown to play crucial roles in growth, development, and immunity (Tang et al., 2017). RLKs are widespread in the plant kingdom. However, the biological functions of most RLKs remain largely elusive (Dievart et al., 2020). Given that RLKs share a conserved monophyletic RLK/Pelle kinase domain, RLKs in several model plants are classified into distinct families by their extracellular domains (Shiu and Bleecker, 2001). However, independent domain shuffling in specific lineages drives the origin of novel families, which raises the question: what is the landscape of RLKs across the entire plant kingdom? Previously, sequence-homology-based methods have been widely used for RLK identification and classification; these might miss distantly related proteins with similar structures and potential novel families not mentioned in the literature. The academic community urgently requires a dedicated database that offers a systematic overview of the RLK gene family and provides data support for in-depth research on RLK genes.
Here, we used a topology-based method to accurately isolate the RLKomes from proteomes. The obtained RLKomes were further classified into (sub)families based on extracellular domains. We constructed a comprehensively curated plant RLK database (https://biotec.njau.edu.cn/rlkdb), which contains valuable resources for investigating the origin and evolution of the RLK family and multiple online tools for personalized analysis.
To obtain the landscape of RLKs in plants, we collected 300 plant genomes with chromosome-level assemblies for identification of RLKs. In addition to major model species, including Arabidopsis, rice, and maize, these plant genomes encompass representatives from 4 phyla, 12 classes, and 45 orders (Figure 1A; Supplemental Table 1). We adopted a previously described pipeline developed by our group to identify plant RLKs (Yin et al., 2023). In Arabidopsis thaliana, our pipeline identified 468 RLKs, representing a 72% increase compared to the Ensembl annotation (Martin et al., 2023). We further examined the reliability of our pipeline with reference to the 610 putative RLKs reported by Shiu and Bleecker (2001). Among these, we observed that our pipeline missed 144 putative RLKs while predicting two novel RLKs.
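The Arabidopsis counts quoted above are mutually consistent, which a few lines of arithmetic make explicit. This is only a bookkeeping check on the published numbers: the variable names are illustrative, and the Ensembl figure is inferred from the stated 72% increase rather than given in the text.

```python
# Counts quoted in the text for Arabidopsis thaliana.
reported_2001 = 610    # putative RLKs reported by Shiu and Bleecker (2001)
missed = 144           # members of the 2001 set not recovered by the pipeline
novel = 2              # pipeline predictions absent from the 2001 set
pipeline_total = 468   # RLKs identified by the pipeline

# RLKs shared between the 2001 set and the pipeline output.
shared = reported_2001 - missed
assert shared + novel == pipeline_total

# Fraction of the 2001 set that the pipeline recovers.
recovered_fraction = shared / reported_2001

# Assumption: "a 72% increase" means pipeline_total = 1.72 * ensembl_count.
# The Ensembl count itself is not stated in the text; this is an inference.
implied_ensembl_count = round(pipeline_total / 1.72)

print(shared, round(recovered_fraction, 3), implied_ensembl_count)
```

Under these assumptions, 466 of the 610 previously reported genes are recovered, and the Ensembl annotation would contain roughly 272 RLKs.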
Of the missed RLKs, 16 putative RLK gene models have been removed from the current genome assembly, and 128 putative RLKs do not have a transmembrane domain. Several other methods have also been used to identify leucine-rich repeat (LRR)-RLKs and some other families (Man et al., 2020, 2023; Ngou et al., 2022, 2024). Comparatively, our pipeline has high accuracy and is suitable for systematic and high-throughput identification of RLKomes covering all the different families. In total, 220,038 RLKs were identified from the 300 plant genomes. The RLKome size ranges from 1 to 2459 RLKs, with RLKs accounting on average for 1.35% of the proteome (Figure 1B; Supplemental Table 1). In the past three decades, more than a dozen RLK families have been described (Dievart et al., 2020), but a systematic and automatic pipeline for the classification of RLKomes is still lacking.
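For orientation, the summary statistics above can be reproduced in form (not in value) with a short sketch. The totals 220,038 and 300 come from the text; the three toy genomes, their names, and their counts are hypothetical, and the text does not say whether the 1.35% figure is a mean of per-genome percentages or a pooled ratio, so both are shown.

```python
from statistics import mean

TOTAL_RLKS = 220_038   # from the text
N_GENOMES = 300        # from the text

# Mean RLKome size implied by the two totals (about 733 RLKs per genome).
mean_rlkome_size = TOTAL_RLKS / N_GENOMES


def rlkome_percentage(rlk_count: int, proteome_size: int) -> float:
    """Percentage of a proteome made up of RLKs."""
    if proteome_size <= 0:
        raise ValueError("proteome_size must be positive")
    return 100.0 * rlk_count / proteome_size


# Hypothetical (RLK count, proteome size) pairs, for illustration only.
toy_genomes = {
    "genome_a": (400, 30_000),
    "genome_b": (1_200, 60_000),
    "genome_c": (10, 10_000),
}

per_genome = {name: rlkome_percentage(*counts) for name, counts in toy_genomes.items()}

# Mean of per-genome percentages: every genome weighs the same.
average_percentage = mean(per_genome.values())

# Pooled percentage: large proteomes weigh more.
pooled_percentage = rlkome_percentage(
    sum(rlks for rlks, _ in toy_genomes.values()),
    sum(size for _, size in toy_genomes.values()),
)

print(round(mean_rlkome_size, 2), round(average_percentage, 2), round(pooled_percentage, 2))
```

The two averages differ whenever RLKome fraction varies with proteome size, which is worth keeping in mind when comparing the reported percentage with other surveys.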
PRGdb (http://prgdb.org/prgdb4/) is a database of pathogen receptor genes, but it provides only a complete list of RLKs and lacks detailed gene information and family classification. According to their distinct extracellular domain structures, RLKs were divided into 18 families. Among them, 15 families have known Pfam annotations. The remaining unannotated RLKs were clustered by protein sequence similarity, which further yielded the proline-rich extensin-like receptor kinase and unknown disordered 1 families. All remaining RLKs were assigned to the unclassified family. LRR (44.0%), G-LecRLK (13.9%), and wall-associated kinase (11.1%) are the largest families, which together make up 69% of RLKdb (Figure 1C). The large and well-known families occur in almost all of the 300 plant genomes, while the thaumatin; glycoside hydrolase family 19; cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins; and proline-rich membrane anchor 1 families are found only in specific lineages.
RLKdb has a concise and user-friendly web interface. Through the home page or the navigation menu, users can open an RLK family page (Supplemental Figure 1) or an RLKome page (Supplemental Figure 2) to explore the database. In the RLK family page, the first section contains the family description, its lineage coverage, and a list box for switching to other families (Supplemental Figure 1A). The following section is an interactive table of genomes that possess the corresponding RLK family (Supplemental Figure 1B). Through the load button in the table, users can load an RLK family of interest into the third section (Supplemental Figure 1C).
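The three-tier family assignment described earlier in this passage (extracellular domain annotation first, then sequence-similarity clusters, then an unclassified bin) can be sketched as follows. This is a minimal illustration, not the database's rule set: the domain labels, the three-entry lookup table, the first-match rule for multi-domain proteins, and the function name are all assumptions.

```python
from typing import Iterable, Optional

# Hypothetical lookup from an extracellular domain label to a family name.
# The real classification covers 15 Pfam-annotated families; only the three
# largest are listed here, and the keys are illustrative labels.
DOMAIN_TO_FAMILY = {
    "LRR": "LRR",
    "G-lectin": "G-LecRLK",
    "WAK": "WAK",
}


def assign_family(
    extracellular_domains: Iterable[str],
    cluster_family: Optional[str] = None,
) -> str:
    """Assign a family by domain annotation, then by cluster, else 'unclassified'."""
    # Tier 1: a known extracellular domain decides the family.
    # Assumption: the first matching domain wins for multi-domain proteins.
    for domain in extracellular_domains:
        family = DOMAIN_TO_FAMILY.get(domain)
        if family is not None:
            return family
    # Tier 2: fall back to a family derived from sequence-similarity clustering.
    if cluster_family is not None:
        return cluster_family
    # Tier 3: everything else goes to the unclassified family.
    return "unclassified"


# Bookkeeping from the text: 15 Pfam-annotated + 2 clustered + 1 unclassified.
family_counts = {"pfam_annotated": 15, "sequence_clustered": 2, "unclassified": 1}
total_families = sum(family_counts.values())

# Shares of the three largest families, as quoted in the text.
largest_shares = {"LRR": 44.0, "G-LecRLK": 13.9, "WAK": 11.1}
combined_share = round(sum(largest_shares.values()), 1)

print(assign_family(["LRR"]), assign_family([], "PERK"), assign_family([]))
print(total_families, combined_share)
```

The bookkeeping lines confirm that the stated tiers add up to 18 families and that the three largest families sum to the quoted 69%.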
The RLK members and landscape of the family can be displayed in five panels: (1) the RLK table panel shows all RLK members, (2) the linkage map panel displays the positions of RLK members in the genome, (3) the length distribution panel exhibits the distribution of RLK protein lengths, (4) the domain topology panel presents the percentages of the various functional domain topologies and a domain word cloud, and (5) the phylogeny panel showcases the evolutionary relationships among RLK members.
The RLKome page has a similar layout. Its initial section provides information about the plant genome, including details on species, lineage, taxonomy, genome assembly, cultivar, and more (Supplemental Figure 2A). The second section is a column chart showing the number of RLKs in each family of the RLKome. By clicking on an RLK family name, the corresponding RLK family can be retrieved and displayed in five panels identical to those of the family page.
By clicking on the hyperlinks associated with RLK IDs in the RLK table panel, users can access a dedicated RLK page displaying its detailed information (Figure 1D). In the RLK page, the first section provides a snapshot of the RLK protein structure, along with essential details such as species, data source, and family information (Supplemental Figure 3A).
The second section contains six panels: (1) the gene model panel shows the gene exon-intron structure and the domain topology of the protein (Supplemental Figure 3B), (2) the transcription factor binding site panel provides a table of transcription factor binding sites upstream of the RLK gene (Supplemental Figure 3C), (3) the primer panel offers five pairs of qPCR primers (Supplemental Figure 3D), (4) the structure panel exhibits the 3D structure of the RLK protein and its ligand binding sites (Supplemental Figure 3E), (5) the interaction panel presents the RLK's potential interacting proteins based on the experimentally validated protein interactions collected in the STRING database (Supplemental Figure 3F), and (6) the phylogeny panel includes a Sankey diagram showing the distribution of the corresponding RLK subfamily across plant species, an interactive table of RLK subfamily members, and a phylogenetic tree containing the members of the RLK subfamily (Supplemental Figure 3G). Through the phylogenetic tree and the Sankey diagram, users can intuitively see the relatedness of a particular RLK of interest across the diversity of plant species in the database.
We also developed online tools that enable users to search for RLKs and classify them into different families (Figure 1E). The web-based tool allows a user to upload a proteome or transcriptome file in FASTA format (Supplemental Figure 4A). The sequences are processed through the pipeline on a multi-core, GPU-equipped Linux server. For a proteome file, the user will obtain an RLK annotation file containing information on signal peptide, transmembrane, kinase, and other domain regions, along with an RLK sequence file. For a transcriptome file, users will receive an additional open reading frame annotation file that highlights coding regions in the transcript sequences.
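The proteome workflow just described (FASTA in, region annotations and an RLK subset out) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RLKdb pipeline: the `Region` record, the function names, and in particular the rule "exactly one transmembrane helix with every kinase domain C-terminal to it" are hypothetical, and the region annotations are assumed to come from external predictors that are not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


@dataclass(frozen=True)
class Region:
    """One annotated region; kind is e.g. 'signal_peptide', 'transmembrane', 'kinase'."""
    kind: str
    start: int
    end: int


def has_rlk_topology(regions: Iterable[Region]) -> bool:
    """Assumed rule: one transmembrane helix, with all kinase domains after it."""
    regions = list(regions)
    helices = [r for r in regions if r.kind == "transmembrane"]
    kinases = [r for r in regions if r.kind == "kinase"]
    if len(helices) != 1 or not kinases:
        return False
    return all(k.start > helices[0].end for k in kinases)


def select_rlks(
    records: Iterable[Tuple[str, str]],
    annotations: Dict[str, List[Region]],
) -> List[str]:
    """Return IDs of sequences whose annotated regions pass the topology rule."""
    return [sid for sid, _ in records if has_rlk_topology(annotations.get(sid, []))]


# Toy input; coordinates are illustrative and not tied to the toy sequences.
fasta_lines = [">p1 toy receptor kinase", "MKT", "AAA", ">p2 toy soluble kinase", "MGG"]
annotations = {
    "p1": [
        Region("signal_peptide", 1, 25),
        Region("LRR", 40, 300),
        Region("transmembrane", 320, 342),
        Region("kinase", 400, 680),
    ],
    "p2": [Region("kinase", 10, 280)],
}
records = list(read_fasta(fasta_lines))
rlk_ids = select_rlks(records, annotations)
print(rlk_ids)
```

In this toy run only `p1` is retained, because `p2` has a kinase domain but no transmembrane helix. A signal peptide is recorded but deliberately not required by the assumed rule.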
To enhance database accessibility, the BLAST and Foldseek programs have been integrated to support sequence similarity and structure similarity retrieval, respectively (Supplemental Figures 4B and 4C).
In summary, we have accurately annotated the RLKomes and classified the RLK families of 300 plant genomes with chromosome-level assemblies. RLKdb provides comprehensive information on each RLKome, RLK family, and individual RLK. An online tool for genome- and transcriptome-wide identification and classification of RLKs was also developed. These resources and tools will aid evolutionary and functional studies of plant RLKs.
This study was supported by grants from the National Natural Science Foundation of China (32270208, 32202251, and 32230089), the Fundamental Research Funds for the Central Universities (KYCXJC2023001 and KYQN2023039), the Natural Science Foundation of Jiangsu Province (BK20221000), and the China Agricultural Research System (CARS-21).