Computer science
Machine translation
NIST
Natural language processing
Artificial intelligence
Operationalization
Popularity
Machine translation evaluation
Language acquisition
Grading (engineering)
Metric (unit)
Competence (human resources)
Mathematics education
Psychology
Machine translation software usability
Social psychology
Philosophy
Operations management
Civil engineering
Epistemology
Example-based machine translation
Engineering
Economics
Identifier
DOI:10.1080/09588221.2021.1968915
Abstract
The use of translation and interpreting (T&I) in the language learning classroom is commonplace, serving various pedagogical and assessment purposes. Previous utilization of T&I exercises has been driven largely by their potential to enhance language learning, whereas the latest trend has begun to underscore T&I as a crucial skill to be acquired as part of transcultural competence for language learners and future language users. Despite their growing popularity and utility in the language learning classroom, assessing T&I is time-consuming, labor-intensive, and cognitively taxing for human raters (e.g., language teachers), primarily because T&I assessment entails meticulous evaluation of informational equivalence between the source-language message and target-language renditions. One possible solution is to rely on automated quality metrics originally developed to evaluate machine translation (MT). In the current study, we investigated the viability of using four automated MT evaluation metrics, BLEU, NIST, METEOR and TER, to assess human interpretation. Essentially, we correlated the automated metric scores with the human-assigned scores (i.e., the criterion measure) from multiple assessment scenarios to examine the degree of machine-human parity. Overall, we observed fairly strong metric-human correlations for BLEU (Pearson's r = 0.670), NIST (r = 0.673) and METEOR (r = 0.882), especially when the metric computation was conducted on the sentence level rather than the text level. We discussed these emerging findings and others in relation to the feasibility of operationalizing MT metrics to evaluate students' interpretation in the language learning classroom. Supplemental data for this article is available online at https://doi.org/10.1080/09588221.2021.1968915.
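To make the procedure concrete, the following is a minimal Python sketch, not the authors' code, of the machine-human parity analysis the abstract describes: each student rendition is scored against a reference rendition at the sentence level (here with NLTK's BLEU and NIST implementations), and the metric scores are then correlated with human-assigned scores via Pearson's r. All sentences and rater scores below are invented for illustration; METEOR and TER would follow the same pattern with their respective implementations.

```python
# Minimal sketch (assumed setup, not the authors' code) of correlating
# sentence-level MT metric scores with human-assigned interpretation scores.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.nist_score import sentence_nist
from scipy.stats import pearsonr

references = [                        # reference renditions (tokenized), invented
    "the delegates approved the budget after a long debate".split(),
    "economic growth slowed sharply in the second quarter".split(),
    "the minister promised further investment in public education".split(),
]
renditions = [                        # student renditions (tokenized), invented
    "the delegates passed the budget after long debate".split(),
    "economic growth was much slower in the second quarter".split(),
    "the minister promised more money for public schools".split(),
]
human_scores = [4.5, 4.0, 3.0]        # hypothetical rater scores (criterion measure)

smooth = SmoothingFunction().method1  # smoothing avoids zero BLEU on short sentences
bleu_scores = [sentence_bleu([ref], hyp, smoothing_function=smooth)
               for ref, hyp in zip(references, renditions)]
nist_scores = [sentence_nist([ref], hyp, n=4)
               for ref, hyp in zip(references, renditions)]

# Machine-human parity: Pearson correlation between metric and human scores.
r_bleu, _ = pearsonr(bleu_scores, human_scores)
r_nist, _ = pearsonr(nist_scores, human_scores)
print(f"BLEU-human r = {r_bleu:.3f}")
print(f"NIST-human r = {r_nist:.3f}")
```

Text-level computation would instead score whole renditions at once (e.g., with corpus_bleu or corpus_nist); the abstract reports that sentence-level computation yielded the stronger metric-human correlations.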