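Computer science
Natural language processing
Artificial intelligence
Corpus linguistics
Extraction (chemistry)
Lexical analysis
Linguistics
Philosophy
Chemistry
Chromatography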
Identifier
DOI:10.1007/978-981-97-0586-3_21
Abstract
This study applies AI technology to build academic Chinese corpora. Python was used to extract candidate lexical chunks of various lengths, including 3-grams, 4-grams, 5-grams, and 6-grams. The chunks were identified with the New-MI algorithm and filtered for semantic relevance and completeness. Manual review then removed duplicate entries, leaving 1431 continuous word chunks. These chunks were classified into three functional categories: research-oriented, text-oriented, and participant-oriented. Chunk use differed between Korean learners of Chinese and native Chinese writers, although research-oriented chunks were the most frequently used category in both groups. The Korean learners used all three categories less frequently than the native writers did. The findings may serve as a reference for academic Chinese writing instruction and for the development of academic Chinese textbooks for learners of Chinese.
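The extraction step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes the text has already been word-segmented, it counts contiguous 3- to 6-grams, and it uses a simple frequency threshold as a stand-in for the New-MI scoring, whose formula is not given here. The function names `extract_ngrams` and `frequent_chunks`, the `min_count` threshold, and the sample sentences are all hypothetical.

```python
from collections import Counter


def extract_ngrams(tokens, min_n=3, max_n=6):
    """Count every contiguous n-gram of length min_n..max_n in one token list."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequent_chunks(sentences, min_count=2, min_n=3, max_n=6):
    """Keep n-grams seen at least min_count times across all sentences.

    Assumption: a raw frequency cut-off stands in for the New-MI score
    mentioned in the abstract, which is not reproduced here.
    """
    totals = Counter()
    for tokens in sentences:
        # Counting per sentence keeps n-grams from spanning sentence boundaries.
        totals.update(extract_ngrams(tokens, min_n, max_n))
    return {gram: c for gram, c in totals.items() if c >= min_count}


# Hypothetical pre-segmented sentences, for illustration only.
sentences = [
    ["本", "研究", "的", "目的", "是", "探讨"],
    ["本", "研究", "的", "结果", "表明"],
]
chunks = frequent_chunks(sentences)
```

In a fuller pipeline, the candidates kept by such a filter would then go through association scoring and the manual review that the abstract describes.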