分布语义学
语义学(计算机科学)
计算机科学
文字嵌入
词(群论)
嵌入
自然语言处理
人工智能
语言学
程序设计语言
哲学
作者
Rui Feng,Congcong Yang,Yunhua Qu
标识
DOI:10.1080/09296174.2020.1767481
摘要
Recent advances in natural language processing have catalysed active research in designing algorithms to generate contextual vector representations of words, or word embedding, in the machine learning and computational linguistics community. Existing works pay little attention to patterns of words, which encode rich semantic information and impose semantic constraints on a word's context. This paper explores the feasibility of incorporating word embedding with pattern grammar, a grammar model to describe the syntactic environment of lexical items. Specifically, this research develops a method to extract patterns with semantic information of word embedding and investigates the statistical regularities and distributional semantics of the extracted patterns. The major results of this paper are as follows. Experiments on the LCMC Chinese corpus reveal that the frequency of patterns follows Zipf's hypothesis, and the frequency and pattern length are inversely related. Therefore, the proposed method enables the study of distributional properties of patterns in large-scale corpora. Furthermore, experiments illustrate that our extracted patterns impose semantic constraints on context, proving that patterns encode rich semantic and contextual information. This sheds light on the potential applications of pattern-based word embedding in a wide range of natural language processing tasks.
科研通智能强力驱动
Strongly Powered by AbleSci AI