Abstract

Second language (L2) viewing with captions (i.e., L2 on‐screen text) is a rapidly growing and promising area of L2 acquisition research. The present meta‐analysis examined (a) the relationship between captioned viewing and incidental vocabulary learning and (b) which variables related to learners, treatment, methodology, and vocabulary tests moderate the captioning effect. Synthesizing 89 effect sizes from 49 primary studies (i.e., independent experiments), we fitted a multilevel meta‐analysis model with restricted maximum likelihood estimation to estimate the overall effect size, based on the standardized mean difference in gain scores between captioned and uncaptioned viewing groups. The results showed a medium effect of captioning on L2 vocabulary learning, g = 0.56, p < .001. Moderator analyses indicated moderating effects of instructional level, target audience of the video materials, and administration of a vocabulary pretest. These results are discussed with the aim of guiding future research and language learning through viewing.
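
To make the effect size metric concrete, the following is a minimal sketch of how a standardized mean difference of gain scores between two groups might be computed for a single study. It assumes the common pooled-standard-deviation form with Hedges' small-sample correction; the abstract does not state the exact standardizer used, so the formula, the function name `hedges_g`, and the numbers in the example are illustrative assumptions rather than the authors' procedure or data.

```python
import math


def hedges_g(mean_gain_t, sd_gain_t, n_t, mean_gain_c, sd_gain_c, n_c):
    """Bias-corrected standardized mean difference of gain scores.

    Assumption: gains are standardized by the pooled SD of the gain
    scores; the meta-analysis may have used a different standardizer.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups' gain scores.
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_gain_t ** 2 + (n_c - 1) * sd_gain_c ** 2) / df
    )
    # Cohen's d: difference in mean gains in pooled-SD units.
    d = (mean_gain_t - mean_gain_c) / pooled_sd
    # Hedges' approximate small-sample correction factor.
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical study: captioned group vs. uncaptioned group.
g_example = hedges_g(
    mean_gain_t=5.0, sd_gain_t=2.0, n_t=20,
    mean_gain_c=4.0, sd_gain_c=2.0, n_c=20,
)
```

A positive value indicates larger vocabulary gains in the captioned group. In a multilevel meta-analysis, per-study values of this kind would then be pooled while accounting for multiple effect sizes nested within the same study.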