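Overfitting
Block (permutation group theory)
Computer science
Stochastic block model
Cross-validation
Probabilistic logic
Estimation
Data mining
Machine learning
Algorithm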
Source
Journal: Stat
[Wiley]
Date: 2021-09-30
Abstract
The stochastic block model (SBM) and its variants constitute an important family of probabilistic tools for studying network data. There is a rich literature on methods for estimating block labels and model parameters of stochastic block models. Most of these studies require the number of communities K as an input, making the estimation of K an important problem. Cross-validation is a natural option for this problem since it is a widely used generic method for evaluating model fitting. However, cross-validation is known to be inconsistent and prone to overfitting unless impractical split ratios are used. Cross-validation with confidence (CVC) is proposed with better theoretical guarantees in conventional settings. We study the properties of CVC for stochastic block models. Our theoretical studies show that CVC, unlike the standard cross-validation, can consistently pick the optimal K under suitable conditions. We implement this method and check its performance against other established methods on both synthetic and real datasets.
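For orientation, the following is a minimal sketch of the model the abstract refers to, assuming the standard undirected stochastic block model in which each pair of nodes is connected independently with a probability that depends only on the two nodes' block labels. The function name `sample_sbm`, the block probability matrix `B`, and the sizes used are illustrative assumptions, not taken from the paper. The sketch only generates synthetic network data; it does not implement CVC or any method for estimating K.

```python
import numpy as np

def sample_sbm(labels, B, rng):
    """Sample a symmetric, loop-free 0/1 adjacency matrix from an SBM.

    labels : integer block label for each node (hypothetical input)
    B      : K x K symmetric matrix of block connection probabilities
    rng    : numpy random Generator
    """
    labels = np.asarray(labels)
    n = labels.size
    # Edge probability for every node pair, looked up from the block matrix.
    P = B[labels[:, None], labels[None, :]]
    # Draw the strict upper triangle, then mirror it for an undirected graph.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

# Illustrative values only: K = 3 communities, 300 nodes,
# within-block probability 0.30 and between-block probability 0.05.
rng = np.random.default_rng(0)
K, n = 3, 300
labels = rng.integers(K, size=n)
B = np.full((K, K), 0.05) + 0.25 * np.eye(K)
A = sample_sbm(labels, B, rng)
```

In a setting like the one the abstract describes, `A` would be observed while `labels` and `K` would be unknown, which is why choosing K becomes a model selection problem.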