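Computer science
Artificial intelligence
Image retrieval
Pattern recognition (psychology)
k-nearest neighbors algorithm
Support vector machine
Feature extraction
Cluster analysis
Image (mathematics)
Feature (linguistics)
Feature vector
Visual Word
Fuse (electrical)
Feature detection (computer vision)
Image processing
Philosophy
Linguistics
Electrical engineering
Engineering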
Authors
Md Imran Sarker, Mariofanna Milanova
Identifier
DOI:10.1109/csci58124.2022.00274
Abstract
Multimodal learning is omnipresent in our lives: humans absorb information in different ways, whether through pictures or text. Combining such features in computational science, especially in image retrieval, poses two significant challenges: how to fuse them and when to fuse them. Most image retrieval systems use either images or the text data associated with them. In this paper, we study an image retrieval task in which the input query is an image plus a text sentence that describes the image. The query is triggered by the input image and text, and a Transformer model attends to both modalities; the embedded features are then combined by a feature fusion technique. We propose a feature fusion layer based on a modified Text Image Residual Gating (TIRG). On top of the features extracted from the fusion layer, we applied two methods. First, we fit a k-nearest neighbor (KNN) model on the training data and then used the test data to retrieve similar images. Second, we combined clustering with a support vector machine (SVM), computing nearest-neighbor points and cluster centers to find similar images. The results show that the SVM-based method is more effective, reaching an overall accuracy of 92%.
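The abstract does not spell out the modified fusion layer or the retrieval code. As a rough illustration only, the following Python/NumPy sketch implements the original TIRG-style gated residual fusion (a sigmoid gate applied to the image vector, plus a residual term computed from the concatenated image and text vectors) followed by a cosine-similarity KNN lookup over the fused vectors. The names `TIRGFusion` and `knn_retrieve`, the 64-dimensional embeddings, the random weights, and the random stand-in embeddings are all hypothetical assumptions; the authors' modifications, their Transformer encoder, and their training are not reproduced.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TIRGFusion:
    """TIRG-style gated residual fusion of image and text vectors.

    Hypothetical sketch: weights are random placeholders (a real model learns
    them), and this follows the original TIRG form, not the paper's modified layer.
    """

    def __init__(self, dim, rng, w_gate=1.0, w_res=1.0):
        s_in, s_hid = 1.0 / np.sqrt(2 * dim), 1.0 / np.sqrt(dim)
        # Gating branch: concat(img, txt) -> hidden -> gate over image dims
        self.Wg1 = rng.normal(0.0, s_in, (2 * dim, dim))
        self.Wg2 = rng.normal(0.0, s_hid, (dim, dim))
        # Residual branch: concat(img, txt) -> hidden -> additive correction
        self.Wr1 = rng.normal(0.0, s_in, (2 * dim, dim))
        self.Wr2 = rng.normal(0.0, s_hid, (dim, dim))
        self.w_gate, self.w_res = w_gate, w_res

    def __call__(self, img, txt):
        joint = np.concatenate([img, txt], axis=1)          # (n, 2*dim)
        gate = sigmoid(relu(joint @ self.Wg1) @ self.Wg2)  # (n, dim), values in (0, 1)
        residual = relu(joint @ self.Wr1) @ self.Wr2       # (n, dim)
        # Keep a gated copy of the image features and add the text-driven residual
        return self.w_gate * gate * img + self.w_res * residual


def knn_retrieve(query, gallery, k):
    """Return indices of the k gallery rows most cosine-similar to each query row."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                                 # (n_query, n_gallery)
    return np.argsort(-sims, axis=1)[:, :k]        # highest similarity first


# Demo with random stand-in embeddings (hypothetical; no real encoder is used)
rng = np.random.default_rng(0)
dim, n_gallery, n_query = 64, 100, 3
fusion = TIRGFusion(dim, rng)
gallery_img = rng.normal(size=(n_gallery, dim))
gallery_txt = rng.normal(size=(n_gallery, dim))
gallery = fusion(gallery_img, gallery_txt)        # fused gallery features

# Re-fuse the first three gallery pairs as queries: each should retrieve itself first
query = fusion(gallery_img[:n_query], gallery_txt[:n_query])
top_k = knn_retrieve(query, gallery, k=5)
print(top_k)
```

Only the last step depends on the retrieval method: the abstract's second approach replaces the KNN lookup with clustering plus an SVM, but its exact configuration is not described here, so that variant is not sketched.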