估计员
极小极大
后悔
维数(图论)
数学优化
数学
Lasso(编程语言)
水准点(测量)
计算机科学
甲骨文公司
线性模型
算法
统计
组合数学
地理
万维网
软件工程
大地测量学
作者
Xue Wang,Mike Mingcheng Wei,Tao Yao
出处
期刊:Cornell University - arXiv
日期:2018-01-01
被引量:2
标识
DOI:10.48550/arxiv.1812.02962
摘要
We propose a minimax concave penalized multi-armed bandit algorithm under generalized linear model (G-MCP-Bandit) for a decision-maker facing high-dimensional data in an online learning and decision-making process. We demonstrate that the G-MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size dimension T , O(log T), and further attains a tight bound in the covariate dimension d, O(log d). In addition, we develop a linear approximation method, the 2-step weighted Lasso procedure, to identify the MCP estimator for the G-MCP-Bandit algorithm under non-iid samples. Under this procedure, the MCP estimator matches the oracle estimator with high probability and converges to the true parameters with the optimal convergence rate. Finally, through experiments based on synthetic data and two real datasets (warfarin dosing dataset and Tencent search advertising dataset), we show that the G-MCP-Bandit algorithm outperforms other benchmark algorithms, especially when there is a high level of data sparsity or the decision set is large.
科研通智能强力驱动
Strongly Powered by AbleSci AI