Artificial intelligence
Classification
Computer science
Pattern recognition (psychology)
Image (mathematics)
Contextual image classification
Support vector machine
Set (abstract data type)
Image fusion
Fusion
Natural language processing
Computer vision
Linguistics
Philosophy
Programming language
Authors
Qiang Zhu, Mei-Chen Yeh, Kwang-Ting Cheng
Identifier
DOI: 10.1145/1180639.1180698
Abstract
Conventional image categorization techniques rely primarily on low-level visual cues. In this paper, we describe a multimodal fusion scheme that improves image classification accuracy by incorporating information derived from embedded text detected in the image under classification. For each image category, a text concept is first learned from a set of labeled text lines in images of the target category using Multiple Instance Learning [1]. For an image under classification that contains multiple detected text lines, we calculate a weighted Euclidean distance between each text line and the learned text concept of the target category. The minimum distance is then used jointly with the low-level visual cues as the feature vector for SVM-based classification. Experiments on a challenging image database demonstrate that the proposed fusion framework achieves higher accuracy than state-of-the-art methods for image classification.
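The abstract outlines a two-stage pipeline: a weighted Euclidean distance from each detected text line to the MIL-learned text concept, reduced to a single value by taking the minimum, then concatenated with the low-level visual cues for SVM classification. Below is a minimal Python sketch of that fusion step, assuming the MIL stage has already produced a concept vector and per-dimension weights; all data, dimensions, and names (e.g. min_weighted_distance) are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def min_weighted_distance(text_lines, concept, weights):
    """Weighted Euclidean distance from each detected text line to the
    learned text concept; the minimum over lines is the text feature."""
    diffs = text_lines - concept                         # (n_lines, d_txt)
    dists = np.sqrt(((diffs ** 2) * weights).sum(axis=1))
    return dists.min()

# Hypothetical toy data: 100 training images, 8-dim visual cues,
# 5 detected text lines per image, 4-dim text-line features.
rng = np.random.default_rng(0)
n_train, d_vis, d_txt = 100, 8, 4
visual = rng.normal(size=(n_train, d_vis))
labels = rng.integers(0, 2, size=n_train)

# Stand-ins for the outputs of Multiple Instance Learning on the
# labeled text lines of the target category (not reproduced here).
concept = rng.normal(size=d_txt)   # learned text concept
weights = rng.random(d_txt)        # per-dimension relevance weights

text_feature = np.array([
    min_weighted_distance(rng.normal(size=(5, d_txt)), concept, weights)
    for _ in range(n_train)
])

# Fuse: append the minimum text-concept distance to the visual cues
# and train the SVM classifier on the joint feature vector.
X = np.hstack([visual, text_feature[:, None]])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```

Taking the minimum over text lines fits the multiple-instance setting described in the abstract: only some text lines in a positive image need to match the learned concept, so the best-matching line determines the text feature.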