ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

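Zero (linguistics), Computer science, Artificial intelligence, Natural language processing, Natural language, Natural language generation, Natural (archaeology), Projectile, Linguistics, Materials science, History, Philosophy, Archaeology, Metallurgy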
Authors
Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, Yaowei Wang, David A. Clifton
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [Institute of Electrical and Electronics Engineers]
Volume (issue): 46 (8): 5712-5724 Cited by: 3
Identifiers
DOI: 10.1109/tpami.2024.3371376
Abstract

Natural Language Generation (NLG) accepts input data in the form of images, videos, or text and generates corresponding natural language text as output. Existing NLG methods mainly adopt a supervised approach and rely heavily on coupled data-to-text pairs. However, for many targeted scenarios and for non-English languages, sufficient quantities of labeled data are often not available. As a result, it is necessary to collect and label data-text pairs for training, which is both costly and time-consuming. To relax the dependency on labeled data of downstream tasks, we propose an intuitive and effective zero-shot learning framework, ZeroNLG, which can deal with multiple NLG tasks, including image-to-text (image captioning), video-to-text (video captioning), and text-to-text (neural machine translation), across English, Chinese, German, and French within a unified framework. ZeroNLG does not require any labeled downstream pairs for training. During training, ZeroNLG (i) projects different domains (across modalities and languages) to corresponding coordinates in a shared common latent space; (ii) bridges different domains by aligning their corresponding coordinates in this space; and (iii) builds an unsupervised multilingual auto-encoder to learn to generate text by reconstructing the input text given its coordinate in shared latent space. Consequently, during inference, based on the data-to-text pipeline, ZeroNLG can generate target sentences across different languages given the coordinate of input data in the common space. Within this unified framework, given visual (imaging or video) data as input, ZeroNLG can perform zero-shot visual captioning; given textual sentences as input, ZeroNLG can perform zero-shot machine translation. 
We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and "believable" outputs and significantly outperforms existing zero-shot methods. Our code and data are available at https://github.com/yangbang18/ZeroNLG.
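
The three training steps described in the abstract (project to a shared space, align coordinates, autoencode text) can be illustrated with a toy sketch. This is not the authors' implementation. The 2-D coordinates, the `TEXT_COORDS` table, and the helpers `alignment_loss` and `decode` are hypothetical, and a nearest-neighbour lookup stands in for a real text decoder so the example stays self-contained.

```python
# Hypothetical coordinates of two sentences in a shared 2-D latent space.
# In a real system these would be produced by a learned text encoder.
TEXT_COORDS = {
    "a dog runs on grass": (0.9, 0.1),
    "a cat sleeps on a sofa": (0.1, 0.9),
}


def alignment_loss(u, v):
    """Mean squared error between two coordinates.

    Minimising a distance like this between paired inputs from different
    domains is one simple way to align them in a shared space (step ii).
    """
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def decode(coord):
    """Stand-in decoder: return the sentence whose coordinate is closest.

    A real decoder would generate text token by token; nearest-neighbour
    lookup is an assumption made only to keep this sketch runnable.
    """
    return min(TEXT_COORDS, key=lambda s: alignment_loss(TEXT_COORDS[s], coord))


# Step iii (autoencoding): decoding a sentence's own coordinate
# reconstructs that sentence, with no image-text pairs involved.
for sentence, coord in TEXT_COORDS.items():
    assert decode(coord) == sentence

# Zero-shot inference: a hypothetical image encoder, already aligned with
# the text space, places a dog photo near the matching sentence.
image_coord = (0.85, 0.2)
caption = decode(image_coord)
print(caption)
```

The point of the sketch is that the decoder only ever sees text coordinates during training, yet it can still be applied to an image coordinate at inference because both live in the same aligned space.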
