Identifying optimal variables for machine learning based fish distribution modeling
Fish
Fisheries
Ecology
Statistics
Environmental science
Computer science
Artificial intelligence
Machine learning
Biology
Mathematics
Authors
Shaohua Xu, Jintao Wang, Xinjun Chen
Source
Journal: Canadian Journal of Fisheries and Aquatic Sciences [Canadian Science Publishing] Date: 2024-06-01 Volume (Issue): 81 (6): 687-698
Identifier
DOI:10.1139/cjfas-2023-0197
Abstract
Machine learning plays a central role in modeling fish distribution patterns. As diverse observation methods keep adding explanatory habitat variables, an optimal combination of these variables must be identified for fish distribution modeling. We proposed using a feature selection technique, recursive feature elimination with cross-validation (RFECV), to determine the optimal variable combinations for modeling yellowfin tuna distribution in the Pacific Ocean. Four tree-based models driven by RFECV (random forest, eXtreme Gradient Boosting, Light Gradient Boosting Machine, and categorical boosting) were developed using comprehensive fisheries and biotic/abiotic data. All models identified sea temperature, dissolved oxygen concentration, chlorophyll-a concentration, sea salinity, and sea surface height as significant habitat variables. Each model was then trained on its own selected variables and used to predict the spatiotemporal distribution of yellowfin tuna from 1995 to 2019. The results could inform the sustainable exploitation of yellowfin tuna in the Pacific Ocean and provide a feature selection benchmark for machine-learning-based distribution modeling of other pelagic species.
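To make the RFECV idea in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a simple linear model fitted by gradient descent in place of the paper's tree-based models, uses absolute coefficient size as the feature importance, and runs on synthetic data in which only the first two of five features drive the target. All function names (fit_linear, elimination_path, rfecv) and all parameter values are hypothetical. The abstract does not say which software was used; library implementations such as scikit-learn's RFECV class exist for practical work.

```python
import random

def take(X, cols):
    """Keep only the listed columns of a row-major matrix."""
    return [[row[c] for c in cols] for row in X]

def fit_linear(X, y, epochs=100, lr=0.1):
    """Least-squares weights (no intercept) via batch gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for row, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) - target
            for j in range(p):
                grad[j] += err * row[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def mse(w, X, y):
    """Mean squared prediction error of weights w on (X, y)."""
    return sum((sum(wj * xj for wj, xj in zip(w, row)) - t) ** 2
               for row, t in zip(X, y)) / len(y)

def elimination_path(X, y):
    """Recursive feature elimination: fit, drop the weakest feature, repeat.

    Returns (columns, weights) pairs, from all features down to one.
    """
    cols = list(range(len(X[0])))
    path = []
    while cols:
        w = fit_linear(take(X, cols), y)
        path.append((list(cols), w))
        weakest = min(range(len(cols)), key=lambda j: abs(w[j]))
        cols.pop(weakest)
    return path

def rfecv(X, y, k=4):
    """Choose the subset size with the lowest k-fold cross-validated error."""
    n, p = len(X), len(X[0])
    folds = [list(range(i, n, k)) for i in range(k)]
    cv_error = {size: 0.0 for size in range(1, p + 1)}
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        # Score every subset on the elimination path against the held-out fold.
        for cols, w in elimination_path(X_train, y_train):
            cv_error[len(cols)] += mse(w, take(X_test, cols), y_test) / k
    best_size = min(cv_error, key=cv_error.get)
    # Rerun the elimination on all data and stop at the chosen size.
    selected = next(cols for cols, _ in elimination_path(X, y)
                    if len(cols) == best_size)
    return sorted(selected), cv_error

# Synthetic data (an assumption for illustration): features 0 and 1 matter,
# features 2 to 4 are pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(120)]
y = [3.0 * r[0] - 2.0 * r[1] + rng.gauss(0, 0.5) for r in X]

selected, cv_error = rfecv(X, y)
print("selected feature indices:", selected)
print("cross-validated MSE by subset size:", cv_error)
```

The sketch mirrors the two stages that RFECV combines: cross-validation decides how many variables to keep, and a final elimination pass on the full data decides which ones. Swapping the linear model for a tree-based learner would change only the fitting step and the importance measure.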