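Random forest
Chemistry
Mass spectrum
Mass spectrometry
Data set
Gas chromatography
Chromatography
Kovats retention index
Analytical Chemistry (journal)
Biological system
Pattern recognition (psychology)
Artificial intelligence
Algorithm
Statistics
Mathematics
Computer science
Biology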
Authors
Leo Lebanov, Laura Tedone, Alireza Ghiasvand, Brett Paull
Source
Journal: Talanta
[Elsevier]
Date: 2019-10-14
Volume/Issue: 208: 120471-120471
Citations: 41
Identifier
DOI:10.1016/j.talanta.2019.120471
Abstract
Essential oils (EOs) differ in chemical profile because each plant species and chemotype has a distinctive secondary metabolism. These differences can therefore serve as chemical markers for classifying EOs and determining their quality. Herein, the Random Forests (RF) machine learning algorithm was applied to the classification of 20 different EOs. From three-way raw gas chromatography-mass spectrometry data, total chromatogram average mass spectra (TCAMS) and segment average mass spectra (SAMS) were created. TCAMS was generated by averaging the response of each m/z over the whole chromatogram, and SAMS by averaging the response of each fragment across a certain time segment within the chromatogram. The RF model was applied to the two data sets and optimised through the evaluation of pre-processed data, number of trees, and number of variables used in each node split. The performance of the model was evaluated through a cross-validation process, repeated 50 times by dividing the whole sample set into training and validation subsets. The average out-of-bag error (OOBE) calculated over 50 different training data sets was 3.22 ± 1.29% for TCAMS and 2.28 ± 1.33% for SAMS. The minimal number of variables necessary for EO classification was determined by a nested cross-validation process in which 10% of the variables were removed at each step. The TCAMS data set with 6 variables was shown to have predictive power similar to that of the SAMS data set with 30 variables. The OOBE for classification of the 20 EOs was 2.89 ± 1.44% and 3.70 ± 1.73% for TCAMS and SAMS, respectively. Proximity between samples was used to evaluate their quality. Samples with greater intra-class proximity were more similar to one another, while lower proximity indicated greater variation in chemical profile. The SAMS data set showed superior potential for quality assurance compared with TCAMS.
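The two averaging steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tcams` and `sams`, the use of equal-length segments counted in scans, and the concatenation of per-segment averages into one feature vector are all assumptions made for illustration, since the abstract does not specify them. One GC-MS run is represented as a list of scans, each scan holding one intensity per m/z channel.

```python
from statistics import fmean


def tcams(scans):
    """Average each m/z channel over all scans (whole-chromatogram average)."""
    n_channels = len(scans[0])
    return [fmean([scan[j] for scan in scans]) for j in range(n_channels)]


def sams(scans, segment_length):
    """Average each m/z channel within consecutive time segments.

    Assumption: segments are equal-length blocks of `segment_length` scans,
    and the per-segment averages are concatenated into one feature vector.
    """
    features = []
    for start in range(0, len(scans), segment_length):
        features.extend(tcams(scans[start:start + segment_length]))
    return features


# Toy run (made-up numbers): 4 scans x 3 m/z channels.
run = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [5.0, 4.0, 2.0],
    [7.0, 8.0, 2.0],
]

print(tcams(run))    # [4.0, 3.0, 2.0]
print(sams(run, 2))  # [2.0, 0.0, 2.0, 6.0, 6.0, 2.0]
```

In this toy run the second m/z channel appears only in the later scans. The TCAMS vector folds that into a single average, while the SAMS vector keeps the timing, which is one plausible reason a segment-based representation could carry more information about sample quality.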