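Computer science
Big data
Data science
Visualization
Data visualization
Topic model
Latent Dirichlet allocation
Visual analytics
Information visualization
Social media
Information retrieval
Interactive visualization
Data modeling
Social network (sociolinguistics)
Social network analysis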
Authors
Nitin Sukhija, Mahidhar Tatineni, Nicole M. Brown, Mark Van Moer, Paul Rodriguez, Spencer Callicott
Source
Journal: Ubiquitous Intelligence and Computing
Date: 2016-07-01
Citations: 13
Identifier
DOI: 10.1109/uic-atc-scalcom-cbdcom-iop-smartworld.2016.0183
Abstract
Topic modeling is a widely used approach for analyzing large text collections. In particular, Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling approaches; it aggregates the vocabulary of a document corpus into latent topics. However, learning meaningful topic models from massive collections containing millions of documents and billions of tokens is challenging, given the complexity of the data involved and the difficulty of distributing the computation across multiple computing nodes. In recent years, data processing frameworks such as Spark, Mallet, and others have been developed to analyze large volumes of unlabeled text from various domains in a scalable and efficient manner. In this paper, we present a preliminary case study demonstrating the scholarship achieved in the study of political consumerism using XSEDE resources. The experimental study showcases the use of digitized social science data and text analytics toolkits to generate topic models and visualize topics, supporting intersectional sociological research on the relationship between consumption, race, class, and gender. Overall, this comparative big data textual analysis, which combines JSTOR data, LDA modeling toolkits, visualization techniques, and computational components, is of paramount importance, especially for academic researchers working on social science applications that involve big data.
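The abstract names LDA but does not say how a model is fitted. As a hedged illustration only, the Python sketch below fits LDA by collapsed Gibbs sampling on a tiny toy corpus. It is not the study's pipeline (the paper used toolkits such as Spark and Mallet on XSEDE); the function names `lda_gibbs` and `top_words`, the hyperparameter defaults (`alpha=0.1`, `beta=0.01`), and the toy corpus are hypothetical choices for illustration. Each token's topic is resampled from a weight proportional to (document-topic count + alpha) times (topic-word count + beta) / (topic total + V * beta), with the token's own assignment removed from the counts first.

```python
import random


# Illustrative sketch only: not the study's Spark/Mallet pipeline.
# Function names, defaults, and the toy corpus are hypothetical.
def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists.
    Returns (vocab, doc_topic_counts, topic_word_counts) after the last sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K

    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token's assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, ndk, nkw


def top_words(vocab, nkw, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


# Hypothetical toy corpus loosely echoing the study's themes.
corpus = [
    "boycott brand consumer boycott ethical".split(),
    "race class gender race identity".split(),
    "consumer brand ethical buycott".split(),
    "gender class race intersectional".split(),
]
vocab, ndk, nkw = lda_gibbs(corpus, num_topics=2, iterations=100)
print(top_words(vocab, nkw, n=3))
```

A pure-Python sampler like this only suits toy data; the scale the abstract describes (millions of documents, billions of tokens) is exactly why distributed or optimized toolkits are used instead.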