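Barcode
Random forest
DNA barcoding
Artificial intelligence
Support vector machine
Bamboo
Genome
Computer science
Machine learning
Biology
DNA sequencing
Pattern recognition (psychology)
Identification (biology)
Feature extraction
Computational biology
DNA
Plants
Evolutionary biology
Gene
Genetics
Operating system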
Authors
Ankush D. Sawarkar, Deepti D. Shrimankar, Lal Singh, Anurag Agrahari, Sagar Lachure, Neeraj Dhanraj Bokde
Identifier
DOI:10.1109/ic2e357697.2023.10262781
Abstract
Bamboo is a grass of the family Poaceae, with 1642 species in 116 genera worldwide. Its exceptional physical, chemical, and mechanical properties allow it to be used in over a thousand different ways and contribute to a trade value of USD 2.76 billion. Bamboo is propagated from rhizomes, tissue culture, or short branch cuttings without any further checks, which results in incorrect species identification and categorisation. Classifying or identifying bamboo species from their DNA barcode sequences, using a K-mer based method together with machine learning (ML), is therefore an excellent strategy for resolving the problems of conventional species categorisation. A DNA barcode is a short genetic signature that helps identify the species to which an organism belongs. K-mer based approaches can extract useful features from genome sequences, which can then be used to improve comparison accuracy. In this research, we evaluate the classification performance of four supervised ML models on the DNA barcode sequences of six Indian commercial bamboo species with different K-mer combinations. For this classification, we choose the matK barcode region and four supervised ML models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosting Machine (GBM). Analysis of these models' results on the matK DNA sequences with different K-mers shows that the classification capability of the GBM approach is quite promising, with an average accuracy of 95.3%.
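As an illustration of the K-mer idea described in the abstract, the following minimal Python sketch turns a DNA string into a fixed-length vector of overlapping K-mer counts. It is not the authors' pipeline: the function name `kmer_counts`, the toy sequence, and the choice to ignore K-mers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Return overlapping k-mer counts as a fixed-length vector.

    Hypothetical helper for illustration; the vector follows the
    lexicographic order of all 4**k k-mers over the alphabet A, C, G, T.
    """
    seq = sequence.upper()
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Slide a window of width k across the sequence, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: k-mers with ambiguous bases (e.g. N) are simply ignored,
    # because they are not part of the vocabulary.
    return [counts[kmer] for kmer in vocabulary]


# Toy string for demonstration only, not a real matK barcode sequence.
toy_sequence = "ATGCATGC"
features = kmer_counts(toy_sequence, 2)
print(len(features), sum(features))
```

Vectors of this kind, computed for each barcode sequence and possibly for several values of K, could then serve as input features for supervised classifiers such as the LR, RF, SVM, and GBM models named in the abstract.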