Computer science
Ensemble learning
Artificial intelligence
Machine learning
Data envelopment analysis
Pattern recognition (psychology)
Mathematics
Statistics
Authors
Qingxian An,Siwei Huang,Yuxuan Han,You Zhu
Identifier
DOI:10.1016/j.cor.2024.106739
Abstract
In classification tasks with large sample sets, the use of a single classifier carries the risk of overfitting. To overcome this issue, an ensemble of classifier models has often been shown to outperform the use of a single "best" model. Given the rich variety of classifier models available, selecting high-efficiency classifiers for a given task dataset remains a pressing challenge. However, most previous classifier selection methods focus only on measuring classification output performance without considering the computational cost. This paper proposes a new ensemble learning method that improves classification quality for big datasets by using data envelopment analysis. It consists of two stages: classifier selection and classifier combination. In the first stage, the commonly used classifiers are evaluated on both resource consumption and classification output performance using the range directional model (RDM), and the most efficient classifiers are selected. In the second stage, the classifier confusion matrix is evaluated using the data envelopment analysis (DEA) cross-efficiency model. The weights for the classifier combination are then determined from the cross-efficiency values, so that classifiers with higher performance receive greater weights. Experimental results demonstrate the superiority of the cross-efficiency model over the BCC model and the benchmark voting method in model ensembling. Furthermore, our method saves more computational resources and yields better results than existing methods.
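The abstract's second stage weights each classifier by its DEA cross-efficiency score. The paper applies cross-efficiency to the classifier confusion matrix; as a simplified, hypothetical sketch, the snippet below computes cross-efficiency with the standard input-oriented CCR multiplier model (not the paper's exact RDM/confusion-matrix setup), treating each classifier as a decision-making unit with a cost input (e.g. training time) and performance outputs (e.g. accuracy and F1), then normalizes the scores into ensemble weights. The data values are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_weights(X, Y, k):
    """Solve the input-oriented CCR multiplier LP for DMU k.

    X: (n, m) array of inputs, Y: (n, s) array of outputs.
    Maximize u @ Y[k] subject to v @ X[k] = 1 and
    u @ Y[j] - v @ X[j] <= 0 for every DMU j, with u, v >= 0.
    Returns the optimal multipliers (u, v).
    """
    n, m = X.shape
    s = Y.shape[1]
    # Decision vector z = [u_1..u_s, v_1..v_m]; linprog minimizes,
    # so minimize -u @ Y[k].
    c = np.concatenate([-Y[k], np.zeros(m)])
    A_ub = np.hstack([Y, -X])          # u @ Y[j] - v @ X[j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)  # v @ X[k] = 1
    b_eq = [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    z = res.x
    return z[:s], z[s:]

def cross_efficiency(X, Y):
    """Average cross-efficiency of each DMU: score DMU j with the
    optimal multipliers of every DMU k, then average over k."""
    n = X.shape[0]
    E = np.zeros((n, n))
    for k in range(n):
        u, v = ccr_weights(X, Y, k)
        for j in range(n):
            E[k, j] = (u @ Y[j]) / (v @ X[j])
    return E.mean(axis=0)

if __name__ == "__main__":
    # Hypothetical data: 3 classifiers, input = training time (s),
    # outputs = accuracy and F1 on a validation set.
    X = np.array([[10.0], [20.0], [15.0]])
    Y = np.array([[0.90, 0.88], [0.92, 0.91], [0.85, 0.84]])
    scores = cross_efficiency(X, Y)
    weights = scores / scores.sum()   # ensemble combination weights
    print(weights)
```

The resulting `weights` can then be used in a weighted vote over the selected classifiers' predictions, so that classifiers that are efficient on both cost and performance dimensions contribute more to the final decision.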