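Pooling
Computer science
Artificial intelligence
Robustness (evolution)
Clutter
Pattern recognition (psychology)
Cognitive neuroscience of visual object recognition
Feature (linguistics)
Feature extraction
Computer vision
Machine learning
Telecommunications
Gene
Biochemistry
Philosophy
Linguistics
Chemistry
Radar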
Authors
Y-Lan Boureau, Jean Ponce, Yann LeCun
Abstract
Many modern visual recognition algorithms incorporate a step of spatial 'pooling', where the outputs of several nearby feature detectors are combined into a local or global 'bag of features', in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.
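To make the two operations named in the abstract concrete, here is a minimal sketch of max pooling and average pooling over a single spatial pool. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pool`, `max_pool`, `average_pool`) and the toy `neighbourhood` data are hypothetical, and each feature detector output is assumed to be a fixed-length vector of floats.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def pool(features: Sequence[Vector],
         reduce_fn: Callable[[Sequence[float]], float]) -> List[float]:
    """Combine the feature vectors of one spatial pool, component by component."""
    if not features:
        raise ValueError("a pool needs at least one feature vector")
    # zip(*features) groups the k-th component of every vector in the pool.
    return [reduce_fn(component) for component in zip(*features)]


def max_pool(features: Sequence[Vector]) -> List[float]:
    """Keep the strongest response of each feature component in the pool."""
    return pool(features, max)


def average_pool(features: Sequence[Vector]) -> List[float]:
    """Keep the mean response of each feature component in the pool."""
    return pool(features, lambda component: sum(component) / len(component))


# Hypothetical toy data: outputs of four nearby feature detectors,
# each with three feature components.
neighbourhood = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

max_result = max_pool(neighbourhood)      # [1.0, 1.0, 0.0]
avg_result = average_pool(neighbourhood)  # [0.25, 0.5, 0.0]
```

Both results are unchanged if the four vectors are reordered, which is the sense in which pooling discards where in the neighbourhood a feature fired. The two operators differ in how they respond to pool size: the average of a sparse feature shrinks as more inactive locations join the pool, while the maximum does not, which gives a sense of why the sample cardinality of a pool can act as a confounding factor.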