Maximal Objectives in the Multiarmed Bandit with Applications

Computer science, Economics, Risk analysis (engineering), Business
Authors
Eren Özbay, Vijay Kamble
Source
Journal: Management Science [Institute for Operations Research and the Management Sciences]
Volume (issue): pages: 70 (12): 8853-8874
Identifier
DOI: 10.1287/mnsc.2022.00801
Abstract

In several applications of the stochastic multiarmed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, we study a new objective in the classic setup. Given K arms, instead of maximizing the expected total reward from T pulls (the traditional “sum” objective), we consider the vector of total rewards earned from each of the K arms at the end of T pulls and aim to maximize the expected highest total reward across arms (the “max” objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of [Formula: see text] (with a higher instance-dependent constant compared with the traditional objective) and a worst-case regret of [Formula: see text]. We then design an adaptive explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and simultaneously achieves these bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected value of the average total reward of the top m arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared with several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of conditioning an adequate supply of value-providing market entities (workers/sellers/service providers) in online platforms and marketplaces.

This paper was accepted by Vivek Farias, data science.

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.00801
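
The difference between the "sum" and "max" objectives described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' policy: it uses a plain explore-then-commit rule with a fixed exploration length, whereas the paper's policy uses tuned confidence bounds and an adaptive stopping criterion. The Bernoulli rewards, the function names, and the parameter explore_per_arm are all assumptions made for illustration.

```python
import random


def explore_then_commit(means, horizon, explore_per_arm, rng):
    """Simplified explore-then-commit on Bernoulli arms (illustrative only).

    Pulls the arms round-robin for up to explore_per_arm pulls each, then
    commits to the arm with the highest empirical mean for the remaining
    pulls. Returns the total reward earned from each arm.
    """
    k = len(means)
    totals = [0.0] * k  # total reward earned from each arm
    pulls = [0] * k     # number of times each arm was pulled

    def pull(arm):
        # Bernoulli reward: an assumption, not taken from the paper.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        totals[arm] += reward
        pulls[arm] += 1

    # Exploration phase: fixed length (the paper's stopping rule is adaptive).
    n_explore = min(horizon, k * explore_per_arm)
    for t in range(n_explore):
        pull(t % k)

    # Commit phase: spend the remaining pulls on the empirically best arm.
    best = max(range(k), key=lambda a: totals[a] / pulls[a] if pulls[a] else 0.0)
    for _ in range(horizon - n_explore):
        pull(best)

    return totals


def sum_objective(totals):
    """Traditional objective: total reward summed over all arms."""
    return sum(totals)


def max_objective(totals):
    """Max objective: the highest total reward earned from any single arm."""
    return max(totals)


if __name__ == "__main__":
    rng = random.Random(0)
    totals = explore_then_commit([0.2, 0.5, 0.8], horizon=300,
                                 explore_per_arm=20, rng=rng)
    print("per-arm totals:", totals)
    print("sum objective:", sum_objective(totals))
    print("max objective:", max_objective(totals))
```

Under the max objective, only the total of the single best-performing arm counts, so rewards collected on the other arms during exploration contribute to the sum objective but not to the max objective.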
