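Markov chain Monte Carlo
Mathematics
Piecewise
Mixing (physics)
Approximate Bayesian computation
Bayesian probability
Algorithm
Markov chain
Gibbs sampling
Computation
Monte Carlo method
Sampling (signal processing)
Posterior probability
Computer science
Data mining
Artificial intelligence
Statistics
Mathematical analysis
Physics
Filter (signal processing)
Quantum mechanics
Inference
Computer vision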
Authors
Deborshee Sen, Matthias Sachs, Jianfeng Lu, David B. Dunson
Source
Journal: Biometrika
[Oxford University Press]
Date: 2020-04-17
Volume (issue): 107 (4): 1005-1012
Citations: 9
Identifier
DOI: 10.1093/biomet/asaa035
Abstract
Classification with high-dimensional data is of widespread interest and often involves dealing with imbalanced data. Bayesian classification approaches are hampered by the fact that current Markov chain Monte Carlo algorithms for posterior computation become inefficient as the number of predictors or the number of subjects to classify gets large, because of the increasing computational time per step and worsening mixing rates. One strategy is to employ a gradient-based sampler to improve mixing while using data subsamples to reduce the per-step computational complexity. However, the usual subsampling breaks down when applied to imbalanced data. Instead, we generalize piecewise-deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch subsampling. These maintain the correct stationary distribution with arbitrarily small subsamples and substantially outperform current competitors. We provide theoretical support for the proposed approach and demonstrate its performance gains in simulated data examples and an application to cancer data.
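The abstract's idea of importance-weighted mini-batch subsampling can be illustrated with a minimal Python sketch. This is not the authors' piecewise-deterministic sampler; it only shows, for a logistic regression likelihood, how drawing observations with non-uniform probabilities and reweighting by the inverse probability gives an unbiased estimate of the full-data gradient. The function names, the weighting rule in `importance_weights`, and the `minority_boost` parameter are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def per_point_gradient(theta, x, y):
    """Gradient of the negative log-likelihood of one observation, y in {0, 1}."""
    residual = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
    return [residual * xi for xi in x]


def full_gradient(theta, X, Y):
    """Exact gradient summed over every observation (cost grows with len(X))."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        for k, g in enumerate(per_point_gradient(theta, x, y)):
            grad[k] += g
    return grad


def importance_weights(X, Y, minority_boost=10.0):
    """Hypothetical sampling probabilities: favour the rare class (y == 1)
    and rows with large absolute covariate values. An assumption for
    illustration, not the weighting used in the paper."""
    raw = [
        (minority_boost if y == 1 else 1.0) * (1e-12 + sum(abs(xi) for xi in x))
        for x, y in zip(X, Y)
    ]
    total = sum(raw)
    return [r / total for r in raw]


def subsampled_gradient(theta, X, Y, probs, batch_size, rng):
    """Unbiased estimate of full_gradient from batch_size weighted draws.

    Each sampled term is divided by (its sampling probability * batch_size),
    so the expectation over the draws equals the full-data sum."""
    indices = rng.choices(range(len(X)), weights=probs, k=batch_size)
    grad = [0.0] * len(theta)
    for j in indices:
        g = per_point_gradient(theta, X[j], Y[j])
        for k in range(len(theta)):
            grad[k] += g[k] / (probs[j] * batch_size)
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    # Small imbalanced toy data set: one positive case among many negatives.
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(200)]
    Y = [1] + [0] * 199
    theta = [0.0, 0.0]
    probs = importance_weights(X, Y)
    print("full gradient:     ", full_gradient(theta, X, Y))
    print("subsampled (m = 5):", subsampled_gradient(theta, X, Y, probs, 5, rng))
```

With uniform probabilities a small batch would rarely contain the single positive case; the inverse-probability reweighting is what keeps the estimate unbiased when that case is deliberately oversampled.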