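Computer science
Cluster analysis
Artificial intelligence
Topic model
Document clustering
Artificial neural network
Metric (unit)
Similarity (geometry)
Machine learning
Probabilistic logic
Unsupervised learning
Cosine similarity
Data mining
Scalability
Database
Image (mathematics)
Economics
Operations management
Authors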
Sandhya Subramani,Vaishnavi Sridhar,Kaushal Shetty
Identifier
DOI:10.1109/ssci.2018.8628912
Abstract
Topic modelling is a text mining technique to discover common topics in a collection of documents. The proposed methodology of topic modelling used artificial neural networks to improve the clustering mechanism of similar documents by modelling probabilistic relations between the topics, documents and vocabulary. Topic modelling and clustering are generally treated as unsupervised learning, whereas neural networks are mostly used for supervised learning problems; Neural Topic Modelling reformulated topic modelling as a supervised learning task by defining an objective function whose loss had to be minimized. Custom input embedding layers were designed to extract the semantic relationships between the words in the corpus, and the output of the model presented a topic probability distribution for each document. Documents with similar distributions were then bucketed together when they met the threshold value of a simple distance-based similarity metric, such as cosine similarity. The model was implemented using Keras with a TensorFlow backend, and the effectiveness of the clustering was validated on the IMDB Movie dataset and the News Aggregator dataset from UCI. Compared with other commonly used clustering mechanisms in combination with traditional topic models, the proposed model delivered an improved Silhouette Coefficient Score and Davies-Bouldin Index, along with an increased data-handling capacity, thereby making the solution scalable.
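The bucketing step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how documents are grouped once the threshold is applied, so the greedy scheme below (compare each document with the first member of every existing bucket) is an assumption, as are the function names, the threshold of 0.9, and the toy topic distributions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def bucket_documents(topic_dists, threshold=0.9):
    """Greedy bucketing (an assumed scheme, not taken from the paper).

    A document joins the first bucket whose first member has a cosine
    similarity of at least `threshold` with it; otherwise it starts a
    new bucket. Returns a list of buckets of document indices.
    """
    buckets = []
    for i, dist in enumerate(topic_dists):
        for bucket in buckets:
            representative = topic_dists[bucket[0]]
            if cosine_similarity(dist, representative) >= threshold:
                bucket.append(i)
                break
        else:
            # No existing bucket was similar enough.
            buckets.append([i])
    return buckets

# Hypothetical per-document topic distributions (each row sums to 1).
topic_dists = np.array([
    [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.15, 0.80],
])

buckets = bucket_documents(topic_dists, threshold=0.9)
print(buckets)
```

With these toy inputs, the first two documents share a dominant first topic and the last two share a dominant third topic, so they fall into two separate buckets. A greedy pass like this depends on document order; a real pipeline might instead compare against a bucket centroid.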