Concepts
NIST
Computer science
Metric
Natural language processing
Machine translation evaluation
Machine translation
Translation
Artificial intelligence
BLEU
Reliability
Machine learning
Machine translation software usability
Example-based machine translation
Source
Journal: Interpreting
Publisher: John Benjamins Publishing Company
Date: 2022-03-04
Volume/Issue: 25 (1): 109-143
Cited by: 2
Identifier
DOI: 10.1075/intp.00076.lu
Abstract
Automated metrics for machine translation (MT) such as BLEU are customarily used because they are quick to compute and sufficiently valid to be useful in MT assessment. Whereas the instantaneity and reliability of such metrics are made possible by automatic computation based on predetermined algorithms, their validity is primarily dependent on a strong correlation with human assessments. Despite the popularity of such metrics in MT, little research has been conducted to explore their usefulness in the automatic assessment of human translation or interpreting. In the present study, we therefore seek to provide an initial insight into the way MT metrics would function in assessing spoken-language interpreting by human interpreters. Specifically, we selected five representative metrics – BLEU, NIST, METEOR, TER and BERT – to evaluate 56 bidirectional consecutive English–Chinese interpretations produced by 28 student interpreters of varying abilities. We correlated the automated metric scores with the scores assigned by different types of raters using different scoring methods (i.e., multiple assessment scenarios). The major finding is that BLEU, NIST, and METEOR had moderate-to-strong correlations with the human-assigned scores across the assessment scenarios, especially for the English-to-Chinese direction. Finally, we discuss the possibility and caveats of using MT metrics in assessing human interpreting.
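To make the correlation design described above concrete, the following Python sketch (not the authors' code; the texts, human ratings, and the choice of BLEU as the example metric are illustrative assumptions) scores a few hypothetical interpretations with segment-level BLEU via sacrebleu and then correlates those scores with hypothetical rater scores using Pearson's r and Spearman's rho from scipy.

# Minimal sketch of the validation logic: one automated score per
# interpretation, correlated against human-assigned scores.
# All hypotheses, references, and human scores are hypothetical placeholders.
from sacrebleu import sentence_bleu          # pip install sacrebleu
from scipy.stats import pearsonr, spearmanr  # pip install scipy

# One transcribed interpretation (hypothesis) and one reference per item;
# for Chinese-direction output one would normally pass tokenize="zh".
hypotheses = [
    "the delegates discussed the new trade agreement",
    "economic growth slowed in the second quarter",
    "the committee approved funding for rural schools",
    "experts warned of rising sea levels along the coast",
]
references = [
    "the delegates debated the new trade agreement",
    "economic growth slowed down in the second quarter",
    "the committee has approved new funding for rural schools",
    "experts warned about rising sea levels on the coast",
]
human_scores = [78.0, 85.5, 72.0, 90.0]  # hypothetical rater scores (0-100)

# Segment-level BLEU for each interpretation.
bleu_scores = [sentence_bleu(h, [r]).score
               for h, r in zip(hypotheses, references)]

# Correlate automated scores with human judgments.
r, _ = pearsonr(bleu_scores, human_scores)
rho, _ = spearmanr(bleu_scores, human_scores)
print(f"Pearson r = {r:.3f}  Spearman rho = {rho:.3f}")

Substituting NIST, METEOR, TER, or a BERT-based metric would follow the same pattern: compute one automated score per interpretation, then correlate the resulting vector against the rater scores for each assessment scenario.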