Latent Dirichlet Allocation
Word2vec
Computer science
Topic model
Artificial intelligence
Word embedding
Word (group theory)
Probabilistic logic
Document classification
Feature (linguistics)
Dirichlet distribution
Natural language processing
Feature extraction
Embedding
Mathematics
Philosophy
Mathematical analysis
Boundary value problem
Linguistics
Geometry
Authors
Zhibo Wang,Long Ma,Yanqing Zhang
Abstract
Latent Dirichlet Allocation (LDA) is a probabilistic topic model that discovers latent topics in a corpus and describes each document with a probability distribution over the discovered topics. It defines a global hierarchical relationship from words to topics and from topics to documents. Word2Vec is a word-embedding model that predicts a target word from its surrounding context words. In this paper, we propose a hybrid approach that extracts document features as a bag of distances in a semantic space. By using both Word2Vec and LDA, our hybrid method not only generates the relationships between documents and topics, but also integrates the contextual relationships among words. Experimental results indicate that the document features generated by our hybrid method improve classification performance by consolidating both global and local relationships.
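The abstract does not spell out how the bag-of-distances features are computed, but one natural reading is: embed a document's words with Word2Vec, then measure the document's distance to each LDA topic in that embedding space. The sketch below illustrates this idea with toy, hand-made vectors; `word_vecs`, `topic_vecs`, and `bag_of_distances` are all illustrative names, not the authors' implementation, and in practice the vectors would come from trained Word2Vec and LDA models.

```python
import numpy as np

# Toy word vectors (in a real pipeline these come from a trained Word2Vec model).
word_vecs = {
    "soccer": np.array([0.9, 0.1]),
    "goal":   np.array([0.8, 0.2]),
    "stock":  np.array([0.1, 0.9]),
    "market": np.array([0.2, 0.8]),
}

# Hypothetical topic vectors, e.g. embedding-space centroids of the top words
# that LDA assigns to each topic (a "sports" and a "finance" topic here).
topic_vecs = np.array([
    [0.85, 0.15],   # topic 0: sports
    [0.15, 0.85],   # topic 1: finance
])

def bag_of_distances(doc_tokens):
    """Represent a document by its distance to each topic in embedding space
    (one illustrative reading of the paper's 'bag-of-distances' features)."""
    vecs = [word_vecs[w] for w in doc_tokens if w in word_vecs]
    doc_vec = np.mean(vecs, axis=0)                        # average word embedding
    return np.linalg.norm(topic_vecs - doc_vec, axis=1)    # one distance per topic

features = bag_of_distances(["soccer", "goal"])
# The sports document lands closer to the sports topic than to the finance topic.
```

The resulting fixed-length vector (one distance per topic) can feed any standard classifier, which matches the abstract's claim that the features combine global topic structure with local word context.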