Computer science
Artificial intelligence
Scale (ratio)
Pattern recognition (psychology)
Cartography
Geography
Authors
Hong Zhang,ZhiXiang Dong,Bo Li,Siyuan He
Identifier
DOI:10.1016/j.knosys.2022.109792
Abstract
MLP-Mixer is a vision architecture that relies solely on multilayer perceptrons (MLPs); despite its simple design, it achieves accuracy only slightly inferior to state-of-the-art models on ImageNet. Because the MLP-Mixer segments each input image into a fixed number of patches, small-scale MLP-Mixers (smaller patches, and hence more patches per image) are preferred for their better accuracy. However, this strategy significantly increases the computational burden. This paper argues that, even within the same dataset, each image has a different recognition difficulty depending on its characteristics; ideally, choosing an independently scaled MLP-Mixer for each image is therefore the most economical computational approach. We experimentally verify that this phenomenon objectively exists, which inspires us to propose the Multi-Scale MLP-Mixer (MSMLP), which applies a suitably scaled MLP-Mixer to each input image. MSMLP comprises several MLP-Mixers of different scales. During testing, these MLP-Mixers are activated in order of scale from large to small (increasing number of patches and decreasing patch size). In addition, to reduce redundant computation, a feature-reuse mechanism is designed between neighboring MLP-Mixers so that the downstream small-scale MLP-Mixer can reuse the features learned by the upstream large-scale MLP-Mixer. Finally, extensive experiments on the public CIFAR-10/100 datasets show that our method significantly outperforms MLP-Mixer in both theoretically estimated computational cost and actual inference speed.
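The cascade described in the abstract (run mixers from the cheapest, largest-patch scale to the finest scale, reuse upstream features downstream, and stop once a prediction is confident) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the `ToyMixer` class, the additive feature-reuse step, and the max-softmax confidence threshold are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class ToyMixer:
    """Stand-in for one MLP-Mixer at a given patch scale (hypothetical)."""
    def __init__(self, patch_size, n_classes=10, feat_dim=16):
        self.patch_size = patch_size
        self.W_feat = rng.standard_normal((feat_dim, feat_dim)) * 0.1
        self.W_cls = rng.standard_normal((feat_dim, n_classes)) * 0.1

    def forward(self, x_feat, reused=None):
        h = x_feat @ self.W_feat
        # Feature reuse (assumed additive here): fold in the features
        # learned by the upstream, larger-scale mixer, if any.
        if reused is not None:
            h = h + reused
        return h, softmax(h @ self.W_cls)

def msmlp_predict(x_feat, mixers, threshold=0.9):
    """Activate mixers from large patch size (cheap) to small (accurate);
    exit early once the max class probability clears the threshold."""
    reused = None
    for m in mixers:
        reused, probs = m.forward(x_feat, reused)
        if probs.max() >= threshold:
            break  # confident enough: skip the remaining, costlier mixers
    return int(probs.argmax()), m.patch_size
```

Under this sketch, easy images exit at the coarse, cheap scale while hard images fall through to the finer scales, which is the source of the average-cost savings the abstract claims.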