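Computer science
Artificial intelligence
Convolutional neural network
Transformer
Computer vision
Cognitive neuroscience of visual object recognition
Viewpoint
Pattern recognition (psychology)
Object (grammar)
Engineering
Art
Voltage
Electrical engineering
Visual arts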
Authors
Jie Li, Lingjun Zhao, Li Li, Jie Lin, Jian Yao, Jingmin Tu
Identifier
DOI:10.1016/j.jvcir.2023.103906
Abstract
With the rapid development of three-dimensional (3D) vision technology and the growing use of 3D objects, there is an urgent need for 3D object recognition in computer vision, virtual reality, and intelligent robotics. View-based methods project a 3D object into two-dimensional (2D) images from different viewpoints and apply convolutional neural networks (CNNs) to model the projected views. Although these methods achieve excellent recognition performance, they allow too little information interaction between the features of different views. Inspired by the recent success of the vision transformer (ViT) in image recognition, we propose a hybrid network that uses a CNN to extract multi-scale local information from each view and a transformer to capture the relevance of that multi-scale information across views. To verify the effectiveness of our multi-view convolutional vision transformer (MVCVT), we conduct experiments on two public benchmarks, ModelNet40 and ModelNet10, and compare its results with those of several state-of-the-art methods. The results show that MVCVT achieves competitive performance in 3D object recognition.
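The cross-view interaction the abstract describes can be illustrated with a minimal sketch. This is not the authors' MVCVT: it assumes each view has already been reduced to one feature vector (a stand-in for CNN output), uses identity query/key/value projections instead of learned weights, and ignores the multi-scale aspect. The names `cross_view_attention`, `mean_pool`, and the sample `views` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(view_feats):
    """Scaled dot-product self-attention across per-view feature vectors.

    Assumption: identity query/key/value projections (no learned weights),
    so each view simply attends to the other views by feature similarity.
    """
    d = len(view_feats[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in view_feats:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in view_feats]
        weights.append(softmax(scores))
    # Each output vector is an attention-weighted mix of all view features.
    mixed = [
        [sum(w * v[j] for w, v in zip(row, view_feats)) for j in range(d)]
        for row in weights
    ]
    return mixed, weights


def mean_pool(feats):
    """Average the per-view vectors into one object-level descriptor."""
    n = len(feats)
    return [sum(f[j] for f in feats) / n for j in range(len(feats[0]))]


# Hypothetical per-view features: views 0 and 1 look alike, view 2 differs.
views = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
mixed, weights = cross_view_attention(views)
descriptor = mean_pool(mixed)
```

In this toy setup, view 0 places more attention weight on the similar view 1 than on the dissimilar view 2, which is the kind of inter-view relevance a transformer stage is meant to capture before the views are pooled into a single descriptor.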