RLKdb: A comprehensively curated database of plant receptor-like kinase families

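Biology, Computational biology, Databases, Bioinformatics, Computer science
Authors
Zhiyuan Yin, Jinding Liu, Daolong Dou
Source
Journal: Molecular Plant [Elsevier BV]
Volume (issue): 17 (4): 513-515  Cited by: 1
Identifiers
DOI: 10.1016/j.molp.2024.02.014
Abstract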

Since the first plant receptor-like kinase (RLK) gene, ZmPK1, was cloned from Zea mays in 1990 (Walker and Zhang, 1990), this large gene family has been extensively studied and shown to play crucial roles in growth, development, and immunity (Tang et al., 2017). RLKs are widespread in the plant kingdom, yet the biological functions of most RLKs remain elusive (Dievart et al., 2020). Because RLKs share a conserved, monophyletic RLK/Pelle kinase domain, RLKs in several model plants have been classified into distinct families by their extracellular domains (Shiu and Bleecker, 2001). However, independent domain shuffling in specific lineages drives the origin of novel families, which raises the question: what is the landscape of RLKs across the entire plant kingdom? Sequence-homology-based methods have been widely used for RLK identification and classification, but they may miss distantly related proteins with similar structures as well as potentially novel families not yet described in the literature. The research community urgently needs a dedicated database that gives a systematic overview of the RLK gene family and provides data support for in-depth research on RLK genes.
Here, we used a topology-based method to accurately isolate the RLKomes from proteomes. The obtained RLKomes were further classified into (sub)families based on extracellular domains. We constructed a comprehensively curated plant RLK database (https://biotec.njau.edu.cn/rlkdb), which contains valuable resources for investigating the origin and evolution of the RLK family and multiple online tools for personalized analysis.
To obtain the landscape of RLKs in plants, we collected 300 plant genomes with chromosome-level assemblies for identification of RLKs. In addition to major model species, including Arabidopsis, rice, and maize, these genomes encompass representatives from 4 phyla, 12 classes, and 45 orders (Figure 1A; Supplemental Table 1). We identified plant RLKs with a pipeline previously developed by our group (Yin et al., 2023). In Arabidopsis thaliana, our pipeline identified 468 RLKs, a 72% increase over the Ensembl annotation (Martin et al., 2023). We further examined the reliability of our pipeline against the 610 putative RLKs reported by Shiu and Bleecker (2001). Relative to that set, our pipeline missed 144 putative RLKs while predicting two novel RLKs.
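The Arabidopsis benchmark counts quoted above can be reconciled with simple arithmetic. The short Python check below is only an illustrative sketch and not part of the RLKdb pipeline; it assumes that the 468 predicted RLKs consist of the Shiu and Bleecker candidates that the pipeline recovered plus the two novel predictions. The variable names are ours.

```python
# Counts quoted in the text for Arabidopsis thaliana.
reference_rlks = 610   # putative RLKs from Shiu and Bleecker (2001)
missed = 144           # reference RLKs not recovered by the pipeline
novel = 2              # pipeline predictions absent from the reference set

# Assumption: pipeline total = recovered reference RLKs + novel predictions.
recovered = reference_rlks - missed
pipeline_total = recovered + novel

print(recovered)       # 466
print(pipeline_total)  # 468, the RLKome size reported for A. thaliana
```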
Of the missed RLKs, 16 putative RLK gene models have been removed from the current genome assembly, and 128 putative RLKs do not have a transmembrane domain. Several methods have also been used to identify leucine-rich repeat (LRR)-RLKs and some other families (Man et al., 2020, 2023; Ngou et al., 2022, 2024). By comparison, our pipeline has high accuracy and is suitable for systematic, high-throughput identification of RLKomes covering all the different families.
In total, 220,038 RLKs were identified from the 300 plant genomes. The RLKome size ranges from 1 to 2459, with an average proteome percentage of 1.35% (Figure 1B; Supplemental Table 1). In the past three decades, more than a dozen RLK families have been described (Dievart et al., 2020), but a systematic and automatic pipeline for the classification of RLKomes is still lacking.
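To make the idea of topology-based identification followed by classification on extracellular domains more concrete, here is a minimal Python sketch. It is not the published pipeline (see Yin et al., 2023 for that). It assumes that domain annotations with residue coordinates are already available, that an RLK has exactly one transmembrane helix with every kinase domain C-terminal to it, and that the family is taken from the first recognised domain N-terminal to the helix. The domain labels, the `FAMILY_BY_DOMAIN` mapping, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Domain:
    name: str   # hypothetical labels, e.g. "SignalPeptide", "TM", "Kinase", "LRR"
    start: int  # 1-based residue coordinates
    end: int


# Hypothetical mapping from an extracellular domain label to a family name.
FAMILY_BY_DOMAIN = {"LRR": "LRR", "Thaumatin": "Thaumatin"}


def extracellular_domains(domains: List[Domain]) -> Optional[List[Domain]]:
    """Return the domains N-terminal to the TM helix, or None when the
    protein lacks the assumed RLK topology (one TM helix, then kinase)."""
    tm_helices = [d for d in domains if d.name == "TM"]
    kinases = [d for d in domains if d.name == "Kinase"]
    if len(tm_helices) != 1 or not kinases:
        return None
    tm = tm_helices[0]
    # Assumption: every kinase domain must lie after the TM helix.
    if any(k.start <= tm.end for k in kinases):
        return None
    return [d for d in domains
            if d.end < tm.start and d.name != "SignalPeptide"]


def classify_rlk(domains: List[Domain]) -> Optional[str]:
    """Return a family name, "unclassified", or None for a non-RLK."""
    ecto = extracellular_domains(domains)
    if ecto is None:
        return None
    for d in sorted(ecto, key=lambda d: d.start):
        if d.name in FAMILY_BY_DOMAIN:
            return FAMILY_BY_DOMAIN[d.name]
    return "unclassified"


# Toy inputs with made-up coordinates.
lrr_rlk = [Domain("SignalPeptide", 1, 24), Domain("LRR", 60, 300),
           Domain("TM", 320, 342), Domain("Kinase", 400, 680)]
soluble_kinase = [Domain("Kinase", 50, 330)]
bare_rlk = [Domain("TM", 30, 52), Domain("Kinase", 100, 380)]

print(classify_rlk(lrr_rlk))         # LRR
print(classify_rlk(soluble_kinase))  # None (no transmembrane helix)
print(classify_rlk(bare_rlk))        # unclassified
```

Under this sketch a kinase without a transmembrane helix is rejected outright, which mirrors why candidates lacking a transmembrane domain fall outside the benchmark overlap.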
PRGdb (http://prgdb.org/prgdb4/) is a database of pathogen receptor genes, but it provides only a full list of RLKs and lacks detailed gene information and family classification. According to their distinct extracellular domain structures, we divided RLKs into 18 families. Among them, 15 families have known Pfam annotations. The remaining unannotated RLKs were clustered by protein sequence similarity, which yielded two further families: proline-rich extensin-like receptor kinase and unknown disordered 1. All RLKs that still could not be classified were assigned to the unclassified family. LRR (44.0%), G-LecRLK (13.9%), and wall-associated kinase (11.1%) are the largest families and together make up 69% of RLKdb (Figure 1C). The large and well-known families occur in almost all of the 300 plant genomes, whereas the thaumatin; glycoside hydrolase family 19; cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins; and proline-rich membrane anchor 1 families are found only in specific lineages.
RLKdb has a concise and user-friendly web interface. Through the home page or the navigation menu, users can open an RLK family page (Supplemental Figure 1) or an RLKome page (Supplemental Figure 2) to explore the database. On the RLK family page, the first section contains the family description, its lineage coverage, and a list box for switching to other families (Supplemental Figure 1A). The following section is an interactive table of genomes that possess the corresponding RLK family (Supplemental Figure 1B). Through the load button in the table, users can load an RLK family of interest into the third section (Supplemental Figure 1C).
The RLK members and landscape of the family are displayed in five panels: (1) the RLK table panel lists all RLK members, (2) the linkage map panel displays the positions of RLK members in the genome, (3) the length distribution panel shows the distribution of RLK protein lengths, (4) the domain topology panel presents the percentage of each functional domain topology and a domain word cloud, and (5) the phylogeny panel shows the evolutionary relationships among RLK members.
The RLKome page has a similar layout. Its first section provides information about the plant genome, including species, lineage, taxonomy, genome assembly, cultivar, and more (Supplemental Figure 2A). The second section is a column chart showing the number of RLKs in each family of the RLKome. Clicking on an RLK family name retrieves the corresponding family and displays it in the same five panels as on the family page.
By clicking on the hyperlinks associated with RLK IDs in the RLK table panel, users can access a dedicated RLK page displaying detailed information on that RLK (Figure 1D). On the RLK page, the first section provides a snapshot of the RLK protein structure, along with essential details such as species, data source, and family information (Supplemental Figure 3A).
The second section contains six panels: (1) the gene model panel shows the exon-intron structure of the gene and the domain topology of the protein (Supplemental Figure 3B), (2) the transcription factor binding site panel provides a table of transcription factor binding sites upstream of the RLK gene (Supplemental Figure 3C), (3) the primer panel offers five pairs of qPCR primers (Supplemental Figure 3D), (4) the structure panel shows the 3D structure of the RLK protein and its ligand binding sites (Supplemental Figure 3E), (5) the interaction panel presents the RLK's potential interacting proteins based on the experimentally validated protein interactions collected in the STRING database (Supplemental Figure 3F), and (6) the phylogeny panel includes a Sankey diagram showing the distribution of the corresponding RLK subfamily across plant species, an interactive table of RLK subfamily members, and a phylogenetic tree containing the members of the RLK subfamily (Supplemental Figure 3G). Through the phylogenetic tree and the Sankey diagram, users can intuitively see the relatedness of a particular RLK of interest across the diversity of plant species in the database.
We also developed online tools that enable users to search for RLKs and classify them into families (Figure 1E). The web-based tool allows a user to upload a proteome or transcriptome file in FASTA format (Supplemental Figure 4A). The sequences are processed through the pipeline on a multi-core, GPU-equipped Linux server. For a proteome file, the user obtains an RLK annotation file containing information on signal peptide, transmembrane, kinase, and other domain regions, along with an RLK sequence file. For a transcriptome file, the user receives an additional open reading frame annotation file that highlights coding regions in the transcript sequences.
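How the RLKdb tool calls open reading frames is not described here, so the following Python sketch only illustrates the general idea of annotating coding regions in uploaded transcripts. It reads FASTA records and reports the longest ATG-to-stop ORF on the three forward reading frames; reverse-strand frames are ignored for brevity. The function names `read_fasta` and `longest_forward_orf` and the `min_codons` parameter are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: uppercase sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def longest_forward_orf(seq: str, min_codons: int = 1) -> Optional[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) of the longest ORF that runs
    from an ATG to the first in-frame stop codon, or None if there is none."""
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                coding_codons = (end - start) // 3 - 1  # exclude the stop codon
                longer = best is None or end - start > best[1] - best[0]
                if coding_codons >= min_codons and longer:
                    best = (start, end)
                start = None
    return best


# Toy transcript: the ORF ATG AAA TTT TAG starts at index 2.
records = read_fasta([">tx1 demo", "CCATGAAA", "TTTTAGGG"])
orf = longest_forward_orf(records["tx1"])
print(orf)                              # (2, 14)
print(records["tx1"][orf[0]:orf[1]])    # ATGAAATTTTAG
```

A production tool would also scan the reverse complement and handle ambiguous bases; those steps are omitted here to keep the sketch short.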
To enhance database accessibility, the BLAST and Foldseek programs have been integrated to support sequence-similarity and structure-similarity searches, respectively (Supplemental Figures 4B and 4C).
In summary, we have accurately annotated the RLKomes and classified the RLK families of 300 plant genomes with chromosome-level assemblies. RLKdb provides comprehensive information on RLKomes, RLK families, and individual RLKs. An online tool for genome- and transcriptome-wide identification and classification of RLKs was also developed. These resources and tools will aid evolutionary and functional studies of plant RLKs.
This study was supported by grants from the National Natural Science Foundation of China (32270208, 32202251, and 32230089), the Fundamental Research Funds for the Central Universities (KYCXJC2023001 and KYQN2023039), the Natural Science Foundation of Jiangsu Province (BK20221000), and the China Agricultural Research System (CARS-21).
