Closed captioning
Computer science
Autoregressive model
Inference
Decoding methods
Sentence
Artificial intelligence
Latent variable
Word (group theory)
Image (mathematics)
Speech recognition
Pattern recognition (psychology)
Natural language processing
Algorithm
Mathematics
Statistics
Geometry
Identifier
DOI:10.1145/3394171.3413901
Abstract
Current state-of-the-art image captioning systems generally produce a sentence from left to right, with every step conditioned on the given image and the previously generated words. However, this autoregressive nature makes the inference process difficult to parallelize and leads to high captioning latency. In this paper, we propose a non-autoregressive approach for faster image caption generation. Technically, low-dimensional continuous latent variables are shaped to capture semantic information and word dependencies from extracted image features before sentence decoding. Moreover, we develop an iterative back modification inference algorithm, which continuously refines the latent variables with a look-back mechanism and generates the whole sentence in parallel from the updated latent variables in a constant number of steps. Extensive experiments demonstrate that our method achieves competitive performance compared to prevalent autoregressive captioning models while significantly reducing the average decoding time.
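To illustrate the decoding scheme the abstract describes, the sketch below shows the control flow of non-autoregressive caption generation with iterative latent refinement: latent variables are initialized from image features, refined for a fixed number of steps, and then all caption tokens are emitted in parallel. This is a minimal toy sketch, not the paper's actual model; every function (`encode_image`, `refine_latents`, `decode_parallel`) and all numeric choices are hypothetical stand-ins for learned neural components.

```python
def encode_image(image_features, length=6):
    # Hypothetical stand-in for a learned latent predictor: initialize one
    # continuous latent per caption position from pooled image features.
    mean = sum(image_features) / len(image_features)
    return [mean + 0.1 * i for i in range(length)]

def refine_latents(latents):
    # Hypothetical "look back" refinement step: each latent is updated using
    # its left neighbor, standing in for the paper's iterative back
    # modification of the latent variables.
    refined = []
    for i, z in enumerate(latents):
        left = latents[i - 1] if i > 0 else z
        refined.append(0.5 * z + 0.5 * left)  # toy smoothing update
    return refined

def decode_parallel(latents, vocab):
    # All positions are decoded independently, so this loop could run in
    # parallel -- the key contrast with left-to-right autoregressive decoding.
    return [vocab[int(abs(z)) % len(vocab)] for z in latents]

def non_autoregressive_caption(image_features, vocab, num_steps=3):
    # Constant number of refinement steps, independent of caption length.
    latents = encode_image(image_features)
    for _ in range(num_steps):
        latents = refine_latents(latents)
    return decode_parallel(latents, vocab)
```

The point of the sketch is the cost structure: an autoregressive decoder needs one sequential step per token, while this scheme runs a fixed `num_steps` refinement passes and one fully parallel decoding pass, so latency stays roughly constant as captions get longer.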