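Modal verb
Computer science
Convolutional neural network
Artificial intelligence
Pattern recognition (psychology)
Feature (linguistics)
Artificial neural network
Fusion
Dimensionality reduction
Feature extraction
Construct (Python library)
Contextual image classification
Feature vector
Image (mathematics)
Linguistics
Chemistry
Philosophy
Polymer chemistry
Programming language
Authors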
Yaning Wang,Weifeng Liu,Jianning Li,Zhangming Peng
Identifier
DOI:10.1109/iccais52680.2021.9624487
Abstract
To classify indoor scene images, a multi-modal fusion model is proposed. First, single-modal classification models are constructed from the scene image and its semantic description. For scene images, a convolutional neural network is used to extract features and train a classification model. For scene semantic descriptions, a recurrent neural network is used to extract text features; a scene feature space is then constructed, and the semantic descriptions are embedded into it to obtain classification results. Second, the two kinds of single-modal features are fused and, after dimensionality reduction, fed to a deep neural network to train a feature-level fusion model. Finally, the two single-modal models and the feature-level fusion model are assigned different weights to construct a hybrid fusion model, and the weights are adjusted repeatedly to obtain the best classification accuracy.
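The final step of the abstract, weighting the three models and tuning the weights for accuracy, can be illustrated with a minimal sketch. The abstract does not say how the weights are adjusted, so the exhaustive grid search below is an assumption, as are all function names and the toy probabilities, which are made up purely for illustration and are not results from the paper.

```python
from itertools import product


def fuse(prob_sets, weights):
    """Weighted sum of per-model class probabilities, sample by sample."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
         for c in range(n_classes)]
        for i in range(n_samples)
    ]


def accuracy(fused, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in fused]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def search_weights(prob_sets, labels, steps=10):
    """Assumed tuning strategy: try every non-negative weight vector
    that sums to 1 on a grid with spacing 1/steps, keep the most accurate."""
    best_w, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(prob_sets)):
        if sum(combo) != steps:
            continue
        w = tuple(k / steps for k in combo)
        acc = accuracy(fuse(prob_sets, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Hypothetical validation outputs: 4 samples, 3 scene classes.
labels = [0, 1, 2, 1]
image_probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
text_probs = [[0.3, 0.5, 0.2], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.5, 0.4, 0.1]]
fusion_probs = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4], [0.3, 0.5, 0.2]]

prob_sets = [image_probs, text_probs, fusion_probs]
best_w, best_acc = search_weights(prob_sets, labels)
print("weights:", best_w, "accuracy:", best_acc)
```

Because the grid includes the corner cases where one model receives all the weight, the selected combination is never less accurate on the tuning data than the best single model. In practice the weights would be tuned on a held-out validation split rather than on the test set.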