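Overfitting
Computer science
Artificial intelligence
Deep learning
Machine learning
Schedule
Limiting
Ensemble learning
Deep neural networks
Decision tree
Population
Pipeline (software)
Ensemble forecasting
Artificial neural network
Engineering
Mechanical engineering
Operating system
Sociology
Demography
Programming language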
Authors
Hongyu Zhu,Shengbin Liang,Wenguang Hu,Fangqi Li,Yali Yuan,Shilin Wang,Guang Cheng
Source
Journal: Cornell University - arXiv
Date: 2023-09-16
Identifier
DOI:10.48550/arxiv.2309.09030
Abstract
As a modern ensemble technique, Deep Forest (DF) employs a cascading structure to construct deep models, providing stronger representational power than traditional decision forests. However, its greedy multi-layer learning procedure is prone to overfitting, which limits model effectiveness and generalizability. This paper presents an optimized Deep Forest featuring learnable, layerwise data augmentation policy schedules. Specifically, we introduce the Cut Mix for Tabular data (CMT) augmentation technique to mitigate overfitting and develop a population-based search algorithm to tailor the augmentation intensity for each layer. Additionally, we propose incorporating outputs from intermediate layers into a checkpoint ensemble for more stable performance. Experimental results show that our method sets new state-of-the-art (SOTA) benchmarks in various tabular classification tasks, outperforming shallow tree ensembles, deep forests, deep neural networks, and AutoML competitors. The learned policies also transfer effectively to Deep Forest variants, underscoring the method's potential for enhancing non-differentiable deep learning modules in tabular signal processing.
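The abstract names the CMT augmentation but gives no formula, so the following is only a minimal sketch of what a CutMix-style operation on tabular rows could look like, not the authors' implementation. It assumes that each row is paired with a randomly chosen partner row, that each feature is swapped in from the partner with probability `intensity` (standing in for the per-layer augmentation intensity), and that labels are mixed in proportion to the fraction of features actually swapped. The function name `cutmix_tabular` and all of its parameters are hypothetical.

```python
import numpy as np


def cutmix_tabular(X, y_onehot, intensity, rng):
    """CutMix-style augmentation for tabular data (illustrative assumption).

    X         : (n, d) feature matrix
    y_onehot  : (n, k) one-hot or soft labels
    intensity : probability that a feature is copied from the partner row
    rng       : numpy.random.Generator
    """
    n, d = X.shape
    # Pair every row with a partner row drawn from a random permutation.
    partner = rng.permutation(n)
    # True where the feature value is taken from the partner row.
    mask = rng.random((n, d)) < intensity
    X_aug = np.where(mask, X[partner], X)
    # Fraction of features actually swapped in each row.
    lam = mask.mean(axis=1, keepdims=True)
    # Mix labels in the same proportion as the features.
    y_aug = (1.0 - lam) * y_onehot + lam * y_onehot[partner]
    return X_aug, y_aug


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.eye(3)[rng.integers(0, 3, size=6)]
X_aug, y_aug = cutmix_tabular(X, y, intensity=0.3, rng=rng)
```

With `intensity=0.0` the data passes through unchanged, so a layerwise schedule in this sketch would simply be a list of intensities, one per cascade layer.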