Heuristic
Computer science
Quadratic equation
Extension (predicate logic)
Mathematical optimization
Binary number
Tree (set theory)
Algorithm
Mathematics
Mathematical analysis
Geometry
Arithmetic
Programming language
Authors
Zacharie Alès, Valentine Huré, Amélie Lambert
Identifier
DOI: 10.1016/j.cor.2023.106515
Abstract
The most efficient state-of-the-art methods for building classification trees are greedy heuristics (e.g., CART), which may fail to find underlying characteristics in datasets. Recently, exact linear formulations that have shown better accuracy were introduced; however, they do not scale up to datasets with more than a few thousand data points. In this paper, we introduce four new formulations for building optimal trees. The first one is a quadratic model based on the well-known formulation of Bertsimas et al. We then propose two different linearizations of this new quadratic model. The last model is an extension to real-valued datasets of a flow formulation limited to binary datasets (Aghaei et al.). Each model is introduced for both axis-aligned and oblique splits. We further prove that our new formulations have stronger continuous relaxations than existing models. Finally, we present computational results on 22 standard datasets with up to thousands of data points. Our exact models have reduced solution times, with learning performances as strong as or significantly better than those of state-of-the-art exact approaches.
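To make the distinction between the two split families mentioned in the abstract concrete, here is a minimal sketch, not taken from the paper: an axis-aligned split tests a single feature against a threshold, whereas an oblique split tests a linear combination of features against a threshold. The feature index j, weight vector a, and threshold b below are arbitrary illustrative values.

```python
import numpy as np

def axis_aligned_split(X, j, b):
    """Route each point left (True) if feature j satisfies x_j <= b."""
    return X[:, j] <= b

def oblique_split(X, a, b):
    """Route each point left (True) if the hyperplane test a . x <= b holds."""
    return X @ a <= b

# Three two-dimensional data points (illustrative values only).
X = np.array([[0.2, 1.5],
              [1.0, 0.3],
              [0.8, 0.9]])

print(axis_aligned_split(X, j=0, b=0.5))                  # [True, False, False]
print(oblique_split(X, a=np.array([1.0, -1.0]), b=0.0))   # [True, False, True]
```

An optimal-tree formulation searches over such split parameters (and the tree's leaf assignments) globally rather than greedily, which is what makes the problem amenable to quadratic or linear mixed-integer models.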