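Computer science
Natural language processing
Artificial intelligence
Statistical
Language proficiency
Profiling (computer programming)
Linguistics
Competence (human resources)
Psychology
Mathematics
Social psychology
Philosophy
Statistics
Operating system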
Author
Maria Ángeles Zarco-Tejada
Source
Journal: Digital Scholarship in the Humanities
[Oxford University Press]
Date: 2018-10-24
Volume (issue): 34 (3): 661-675
Citations: 5
Abstract
We describe the first broad results of the linguistic profiling of the Common European Framework of Reference (CEFR)-levelled English Corpus (CLEC), a corpus built for Natural Language Processing purposes. The CLEC is a proficiency-levelled English corpus that covers the A1, A2, B1, B2, and C1 CEFR levels and was built to train statistical models for automatic proficiency assessment. We describe the main aspects of the corpus development and also present the linguistic features and the statistical results, obtained automatically, for written examples at levels A2, B1, and B2. We show how raw text, lexical, morphosyntactic, and syntactic statistical outcomes can help to identify levels of proficiency, to test whether teaching materials are accurately classified by proficiency, to provide computable support for validating the proficiency level of new texts, and to specify level boundaries. In fact, upper levels reflect stronger proficiency by showing higher values of lexical and syntactic complexity. This analysis validates the use of automatic tools for proficiency level identification based on lexical and syntactic data, whereas morphosyntactic features strengthen competence-level distinctions. Finally, we suggest that these results are a first step towards the CEFR-levelled automatic assessment of new texts.
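As a rough illustration of the "raw text" and lexical statistics the abstract refers to, the sketch below computes a few surface measures for one text. It is not the paper's pipeline: the function name `profile_text`, the regex-based tokenization, and the choice of measures (mean sentence length, type-token ratio, mean word length) are assumptions made here for illustration only.

```python
import re


def profile_text(text: str) -> dict:
    """Compute simple surface statistics for one text (illustrative only)."""
    # Naive sentence split on terminal punctuation; a real pipeline would
    # likely use a proper sentence segmenter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokens: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    types = set(tokens)
    n_tokens = len(tokens)
    return {
        "sentences": len(sentences),
        "tokens": n_tokens,
        # Raw-text measure: average number of tokens per sentence.
        "mean_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        # Lexical measure: distinct word forms divided by total tokens.
        "type_token_ratio": len(types) / n_tokens if n_tokens else 0.0,
        # Raw-text measure: average characters per token.
        "mean_word_length": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast!"
    for name, value in profile_text(sample).items():
        print(f"{name}: {value}")
```

Comparing such values across texts labelled A2, B1, and B2 would be one simple way to look for the level differences the abstract describes. Note that type-token ratio is sensitive to text length, so texts of similar size should be compared.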