Computer science
Discriminative model
Modality
Probabilistic latent semantic analysis
Artificial intelligence
Pattern
Mutual information
Benchmark
Consistency (knowledge bases)
Probabilistic logic
Topic model
Machine learning
Graphical model
Natural language processing
Class (philosophy)
Process (computing)
Information retrieval
Social science
Chemistry
Geodesy
Sociology
Polymer chemistry
Geography
Operating system
Authors
Yanfei Wang,Fei Wu,Jun Song,Xi Li,Yueting Zhuang
Identifier
DOI:10.1145/2647868.2654901
Abstract
As an important and challenging problem in the multimedia area, multi-modal data understanding aims to explore the intrinsic semantic information across different modalities in a collaborative manner. To address this problem, a possible solution is to effectively and adaptively capture the common cross-modal semantic information by modeling the inherent correlations between the latent topics from different modalities. Motivated by this task, we propose a supervised multi-modal mutual topic reinforce modeling (M$^3$R) approach, which seeks to build a joint cross-modal probabilistic graphical model for discovering the mutually consistent semantic topics via appropriate interactions between model factors (e.g., categories, latent topics and observed multi-modal data). In principle, M$^3$R is capable of simultaneously accomplishing the following two learning tasks: 1) modality-specific (e.g., image-specific or text-specific) latent topic learning; and 2) cross-modal mutual topic consistency learning. By investigating the cross-modal topic-related distribution information, M$^3$R encourages the disentanglement of semantically consistent cross-modal topics (containing some common semantic information across different modalities). In other words, the semantically co-occurring cross-modal topics are reinforced by M$^3$R through adaptively passing mutually reinforcing messages to each other in the model-learning process. To further enhance the discriminative power of the learned latent topic representations, M$^3$R incorporates auxiliary information (i.e., categories or labels) into the Bayesian modeling process, which boosts its capability of capturing inter-class discriminative information. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed M$^3$R in cross-modal retrieval.
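The core intuition of cross-modal mutual topic consistency, i.e. that the topic proportions inferred from an image and from its paired text should be nudged toward agreement, can be illustrated with a minimal sketch. This is a hypothetical toy (plain gradient-free averaging updates), not the paper's actual Bayesian inference procedure; the function names, the update rule, and the learning rate are all assumptions for illustration only.

```python
import math

def normalize(v):
    """Rescale a non-negative vector so it sums to 1 (a topic distribution)."""
    s = sum(v)
    return [x / s for x in v]

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def reinforce(theta_img, theta_txt, rate=0.5, iters=20):
    """Toy mutual-reinforcement loop: each modality's topic proportions
    are pulled toward the shared mean, so the two distributions become
    consistent while starting from modality-specific estimates."""
    p, q = list(theta_img), list(theta_txt)
    for _ in range(iters):
        mean = [(a + b) / 2 for a, b in zip(p, q)]
        p = normalize([a + rate * (m - a) for a, m in zip(p, mean)])
        q = normalize([b + rate * (m - b) for b, m in zip(q, mean)])
    return p, q

# Modality-specific topic proportions for one image-text pair (hypothetical).
theta_img = [0.7, 0.2, 0.1]
theta_txt = [0.3, 0.5, 0.2]
p, q = reinforce(theta_img, theta_txt)
```

Each iteration halves the gap between the two distributions, so after enough steps the image and text topic proportions agree; in M$^3$R this role is played by message passing between the modality-specific topic factors during model learning.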