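Cluster analysis
Computer science
Bayesian probability
Nested set model
Tree (set theory)
Data mining
Statistics
Mathematics
Artificial intelligence
Combinatorics
Relational database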
Authors
Yinqiao Yan, Xiangyu Luo
Identifier
DOI:10.6084/m9.figshare.21263192.v1
Abstract
Data integration plays a crucial role in the era of big data. The nested data are a combined set of observations from multiple sources and exhibit heterogeneity both at the source level and at the observational level. The complex nature makes it challenging to reasonably visualize and jointly analyze the nested data. In this paper, we present a nonparametric Bayesian model to implement the tree-structured two-level clustering for nested data analysis. The two-level clustering is used to tease out the heterogeneity existing in the sources and observations, while a tree-structured prior is employed to model the latent hierarchy for clusters at the observational level. The proposed Bayesian model is flexible as it does not require an exact specification of cluster numbers or tree width/depth, and it can automatically learn the underlying tree structures among clusters of observations, thus offering an insightful visualization of the nested data. We further provide a rigorous posterior sampling scheme via the partially collapsed Gibbs sampler and show the performance of the proposed method using simulation studies. Finally, the applications to two different types of nested data (multi-source image data and multi-subject single-cell expression data) demonstrate the advantages of the proposed Bayesian method.