Abstract
Semantic comprehension aims to faithfully infer people's real intentions or thoughts, e.g., sentiment, humor, sarcasm, motivation, and offensiveness, from multiple modalities. It can be instantiated as a multimodal multitask classification problem and applied in scenarios such as online public opinion supervision and political stance analysis. Previous methods generally employ multimodal learning alone to handle varied modalities or exploit multitask learning alone to solve various tasks; few unify both in an integrated framework. Moreover, multimodal-multitask cooperative learning inevitably faces the challenge of modeling high-order relationships, i.e., intramodal, intermodal, and intertask relationships. Research in brain science suggests that the human brain achieves multimodal perception and multitask cognition for semantic comprehension through decomposing, associating, and synthesizing processes. Establishing a brain-inspired semantic comprehension framework to bridge the gap between multimodal and multitask learning is therefore the primary motivation of this work. Motivated by the strength of hypergraphs in modeling high-order relations, in this article, we propose a hypergraph-induced multimodal-multitask (HIMM) network for semantic comprehension. HIMM incorporates monomodal, multimodal, and multitask hypergraph networks to mimic the decomposing, associating, and synthesizing processes, respectively, thereby tackling the intramodal, intermodal, and intertask relationships accordingly. Furthermore, temporal and spatial hypergraph constructions are designed to model relationships in modalities with sequential and spatial structures, respectively. We also devise an alternating hypergraph updating algorithm that ensures vertices aggregate to update hyperedges and hyperedges converge to update their connected vertices.
Experiments on a dataset with two modalities and five tasks verify the effectiveness of HIMM for semantic comprehension.
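The alternating updating scheme mentioned above can be illustrated with a minimal sketch. This is not the authors' HIMM implementation; it is one common interpretation of two-phase hypergraph message passing, where an incidence matrix `H` records vertex-hyperedge membership and mean aggregation alternates between the two sides (all names and the toy data are illustrative assumptions):

```python
import numpy as np

# Toy hypergraph: 4 vertices, 2 hyperedges.
# H[v, e] = 1 if vertex v belongs to hyperedge e (incidence matrix, V x E).
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0]], dtype=float)

# Vertex feature matrix X (V x d), chosen arbitrarily for the sketch.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])

def alternating_update(H, X, steps=2):
    """Alternate between vertex -> hyperedge aggregation and
    hyperedge -> vertex updating (mean aggregation on both sides)."""
    Dv = H.sum(axis=1, keepdims=True)   # vertex degrees (V x 1)
    De = H.sum(axis=0, keepdims=True)   # hyperedge degrees (1 x E)
    for _ in range(steps):
        E = (H.T @ X) / De.T            # vertices aggregate to update hyperedges
        X = (H @ E) / Dv                # hyperedges converge to update vertices
    return X

X_new = alternating_update(H, X)
print(X_new.shape)  # (4, 2): one updated feature vector per vertex
```

In practice each phase would typically include learnable projections and nonlinearities; the sketch keeps only the alternating aggregation pattern itself.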