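Sociolinguistics
Linguistics
Metadata
Corpus linguistics
Computer science
Field (mathematics)
Sociology
Natural language processing
World Wide Web
Mathematics
Philosophy
Pure mathematics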
Author
Wilkinson Daniel Wong Gonzales
Identifier
DOI:10.1075/ijcl.22096.gon
Abstract
This paper introduces the Lannang Corpus (LanCorp), a public 375,000-word collection of raw and transcribed recordings of Lannang languages spoken in metropolitan Manila, which have been annotated with part-of-speech tags and linked to 40 types of sociolinguistic metadata. It begins with an overview of LanCorp (e.g. design, formats, accessibility). It then shows, through several examples, how the corpus can be used for variationist sociolinguistic research, using Lánnang-uè data as a case study. The findings from the exploratory studies indicate that Lannang languages are influenced by sociolinguistic factors, demonstrating the intricate nature of the Sino-Philippine sociolinguistic ecology. Due to its large size, sociolinguistic metadata, and various formats, LanCorp can be used to study Lannang languages in general and how they are used by specific social groups. It enables scholars to investigate multilingual interactions across a wide range of sociolinguistic factors, furthering the field of Sino-Philippine (socio)linguistics.