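Computer science
Transformer
Convolutional neural network
Artificial intelligence
Pattern recognition (psychology)
Deep learning
Machine learning
Engineering
Voltage
Electrical engineering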
Authors
Tianyang Gu,Ruipeng Min
Identifier
DOI:10.1145/3529836.3529894
Abstract
Shape recognition, which aims to classify shapes into categories, is a fundamental problem in computer vision. The current mainstream architecture is the convolutional neural network (CNN); however, CNNs have limited ability to extract useful information from simple shapes for classification. To address this problem, this paper proposes a deep learning model for shape recognition based on self-attention and the Vision Transformer (ViT) structure. Compared with a traditional CNN, ViT captures long-range relationships and reduces the loss of information between layers. The model uses a shifted-window hierarchical vision transformer (Swin Transformer) structure and an all-scale shape representation to improve performance. Experimental results show that the proposed model outperforms other methods, achieving an accuracy of 93.82% on the animal dataset, whereas the state-of-the-art VGG-based method reaches only 90.02%.
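The shifted-window idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' model: the function names (`window_partition`, `shifted_windows`, `window_self_attention`) are hypothetical, the attention uses identity projections for queries, keys, and values, and the attention mask that Swin Transformer applies to shifted windows is omitted for brevity.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-index axes first
    return x.reshape(-1, window_size * window_size, c)

def shifted_windows(x, window_size):
    """Cyclically shift the map by half a window, then partition it.

    The shift lets tokens that sat in different windows in the previous
    layer share a window in this one.
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(windows):
    """Single-head scaled dot-product self-attention inside each window.

    Assumption: Q = K = V = the input tokens (no learned projections).
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ windows, weights

# Hypothetical 8x8 feature map with 4 channels and 4x4 windows.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 4))
regular = window_partition(feature_map, 4)
shifted = shifted_windows(feature_map, 4)
attended, attn_weights = window_self_attention(regular)
```

Restricting attention to windows keeps the cost linear in the number of windows, while alternating regular and shifted partitions lets information cross window boundaries from one layer to the next.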