Computer science
Ranking (information retrieval)
Machine translation
Machine learning
Artificial intelligence
Machine translation evaluation
Quality (philosophy)
Annotation
Natural language processing
Set (abstract data type)
Data mining
Machine translation software usability
Philosophy
Epistemology
Example-based machine translation
Programming language
Source
Journal: Natural Language Engineering
Publisher: Cambridge University Press
Date: 2019-09-11
Volume & issue: 26 (2): 137-161
Citations: 62
Identifier
DOI: 10.1017/s1351324919000469
Abstract
This article presents the most up-to-date and influential automated, semiautomated and human metrics used to evaluate the quality of machine translation (MT) output, and provides the necessary background for MT evaluation projects. Evaluation is, as is widely acknowledged, highly relevant for the improvement of MT. The article is divided into three parts: the first is dedicated to automated metrics; the second to human metrics; and the last to the challenges that neural machine translation (NMT) poses for evaluation. The first part covers reference translation–based metrics; confidence or quality estimation (QE) metrics, which serve as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to whether human judges directly express a so-called subjective evaluation judgment, such as ‘good’ or ‘better than’, or not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ) and are therefore called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. The DEJ-based evaluation section presents tasks such as fluency and adequacy annotation, ranking and direct assessment (DA), whereas the non-DEJ-based section details tasks such as error classification and postediting, with definitions and guidelines, making this article a useful guide for evaluation projects. Following the detailed presentation of these metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of translation quality, emphasis is placed on human metrics seen from a translator-judge perspective, to provide useful methodological tools for interdisciplinary research groups that evaluate MT systems.
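To make the abstract's first category concrete, the sketch below computes a smoothed sentence-level BLEU score, the canonical reference translation–based metric: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This follows the standard BLEU formulation (Papineni et al. 2002) with add-one smoothing; the whitespace tokenization, single reference and smoothing choice are simplifying assumptions for illustration, not an implementation prescribed by the article.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Add-one smoothing so one missing n-gram order does not
        # zero out the whole score (an assumption for this sketch).
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(sentence_bleu("the cat sat on the mat", "the cat is on the mat"))
```

On the human side, the abstract mentions direct assessment (DA), in which judges rate segments on a 0–100 scale. A common practice in DA campaigns is to standardize each judge's raw scores so that judges with different internal scales become comparable; the snippet below sketches that per-judge z-scoring with invented values, as an illustration of typical practice rather than the article's own procedure.

```python
from statistics import mean, stdev

# Hypothetical raw DA scores (0-100) from two judges with different scales.
raw = {"judge_a": [70, 80, 90], "judge_b": [20, 40, 60]}
# Standardize within each judge: subtract the judge's mean, divide by
# the judge's standard deviation.
z_scores = {judge: [(s - mean(scores)) / stdev(scores) for s in scores]
            for judge, scores in raw.items()}
print(z_scores)  # both judges map to [-1.0, 0.0, 1.0]
```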