Annotation
Gene
Biology
Computational biology
Genome
Selection (genetic algorithm)
Genetics
Computer science
Machine learning
Authors
Lei Du,Cui Lu,Zhentao Wang,Lee Zou,Yan Xiong,Qun‐Jie Zhang
Source
Journal: Beverage Plant Research
[Maximum Academic Press]
Date: 2023-01-01
Volume/Issue: 1-10
Identifier
DOI:10.48130/bpr-0023-0041
Abstract
Flavonoids are important secondary metabolites synthesized by the tea plant. However, inconsistent gene annotation methods across numerous studies have hindered comparison of their results. In this work, we offer 'GFAnno', an open-source software package that annotates genes and gene families based on sequence features, along with annotation parameters for 18 key genes in the flavonoid biosynthesis pathway. The package takes a protein sequence file as input and annotates genes based on identity to and coverage of pre-prepared, known seed protein sequences and on coverage of the conserved Hidden Markov Model (HMM) domain. We used 11 dicotyledon, 7 monocotyledon, and 2 basal angiosperm genomes to construct three datasets: a seed species collection used to build the seed sequences, a test species collection used to select parameters under strict rules, and a validation species collection used to verify the accuracy of the results. Annotating the validation collection with the filtering parameters selected on the test collection shows that our parameter selection effectively excludes structurally incomplete and abnormal proteins while correctly distinguishing genes with high sequence similarity, such as Flavonoid 3'-Hydroxylase (F3'H) and Flavonoid 3'5'-Hydroxylase (F3'5'H) in the cytochrome P450 (CYP450) family. Our work aids ongoing tea plant pan-genome research by offering convenient software for target gene annotation, and it sets comparative standards for analyzing the flavonoid biosynthesis pathway and for comparing the sequences of its catalytic enzymes.
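The filtering idea described in the abstract (seed identity, seed coverage, and HMM domain coverage, each checked against per-gene parameters) can be illustrated with a minimal Python sketch. This is not GFAnno's code or interface: the `Thresholds` and `Hit` classes, the `annotate` function, the tie-breaking rule (highest identity wins), and all numeric cutoffs are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical per-gene cutoffs; the values used below are illustrative."""
    min_identity: float       # percent identity to the best seed sequence
    min_seed_coverage: float  # fraction of the seed sequence aligned
    min_hmm_coverage: float   # fraction of the conserved HMM domain covered


@dataclass(frozen=True)
class Hit:
    """One candidate protein scored against one target gene."""
    protein_id: str
    gene: str
    identity: float
    seed_coverage: float
    hmm_coverage: float


def passes(hit: Hit, t: Thresholds) -> bool:
    """A hit is kept only if all three criteria meet the gene's cutoffs."""
    return (
        hit.identity >= t.min_identity
        and hit.seed_coverage >= t.min_seed_coverage
        and hit.hmm_coverage >= t.min_hmm_coverage
    )


def annotate(hits: Iterable[Hit], thresholds: Dict[str, Thresholds]) -> Dict[str, str]:
    """Assign each protein to the passing gene with the highest identity.

    The highest-identity rule is an assumption made for this sketch.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        t = thresholds.get(hit.gene)
        if t is None or not passes(hit, t):
            continue  # incomplete or too-divergent proteins are dropped
        current = best.get(hit.protein_id)
        if current is None or hit.identity > current.identity:
            best[hit.protein_id] = hit
    return {pid: h.gene for pid, h in best.items()}


# Illustrative cutoffs and hits (made-up numbers, not published parameters).
thresholds = {
    "F3'H": Thresholds(60.0, 0.8, 0.9),
    "F3'5'H": Thresholds(60.0, 0.8, 0.9),
}
hits = [
    Hit("p1", "F3'H", 82.0, 0.95, 0.97),    # passes all three criteria
    Hit("p1", "F3'5'H", 55.0, 0.93, 0.97),  # similar gene, identity too low
    Hit("p2", "F3'5'H", 78.0, 0.40, 0.50),  # truncated protein, low coverage
]
print(annotate(hits, thresholds))
```

In this toy input, `p1` is assigned to F3'H because its F3'5'H hit falls below the identity cutoff, and `p2` is excluded because it covers too little of the seed sequence and the HMM domain, mirroring the two behaviours the abstract reports.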