Thompson Sampling for the Multinomial Logit Bandit

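Mathematics, Multinomial logistic regression, Multinomial distribution, Sampling (signal processing), Statistics, Econometrics, Multinomial probability, Mathematical optimization, Computer science, Filter (signal processing), Computer vision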
Authors
Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
Source
Journal: Mathematics of Operations Research [Institute for Operations Research and the Management Sciences]
Identifier
DOI: 10.1287/moor.2020.0096
Abstract

We consider a dynamic combinatorial optimization problem where, at each time step, the decision maker selects a subset of cardinality K from N possible items and observes feedback in the form of the index of one of the items in that subset, or none. Each of the N items is ascribed a certain value (reward), which is collected if the item is chosen. This problem is motivated by assortment selection in online retail, where items are products. Akin to that literature, it is assumed that the choice of an item given the subset is governed by a multinomial logit (MNL) choice model whose parameters are a priori unknown. The objective of the decision maker is to maximize the expected cumulative rewards over a finite horizon T or, alternatively, to minimize the regret relative to an oracle that knows the MNL choice model parameters. We formulate this problem as a multiarmed bandit problem that we refer to as the MNL-bandit problem. We present a Thompson sampling-based algorithm for this problem and show that it achieves near-optimal regret as well as attractive empirical performance.

Funding: S. Agrawal is supported in part by the Division of Civil, Mechanical and Manufacturing Innovation [NSF Grant 1846792]. V. Goyal is supported in part by the Division of Civil, Mechanical and Manufacturing Innovation [NSF Grants 1351838 and 1636046]. A. Zeevi is supported in part by the Division of Computer and Network Systems [NSF Grant 0964170] and the United States-Israel Binational Science Foundation [Grant 2010466].
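The setting described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard MNL parameterization: item i has an unknown preference weight v_i and the no-purchase option has weight 1, so P(i | S) = v_i / (1 + sum_{j in S} v_j). The expected reward of an assortment S is then R(S, v) = sum_{i in S} r_i v_i / (1 + sum_{j in S} v_j).

The learning loop is one simple Thompson-sampling variant chosen for illustration, and it is not claimed to be the exact or regret-optimal algorithm analyzed in the paper. In each epoch the same assortment is offered until a no-purchase occurs. Under MNL, the number of purchases of item i in such an epoch is geometric with success probability 1/(1 + v_i), which makes a Beta prior on that probability conjugate. All function names, the brute-force oracle, and the toy parameters below are hypothetical.

```python
import itertools
import random


def mnl_choice_probs(assortment, v):
    """MNL choice probabilities for the items in `assortment`.

    The no-purchase option (key None) has preference weight 1, so
    P(i | S) = v_i / (1 + sum_{j in S} v_j).
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(assortment, r, v):
    """R(S, v) = sum_{i in S} r_i v_i / (1 + sum_{j in S} v_j)."""
    denom = 1.0 + sum(v[i] for i in assortment)
    return sum(r[i] * v[i] for i in assortment) / denom


def best_assortment(r, v, k):
    """Brute-force oracle over all subsets of exactly k items (small N only)."""
    return max(itertools.combinations(range(len(r)), k),
               key=lambda s: expected_revenue(s, r, v))


def simulate_choice(assortment, v, rng):
    """Draw one customer choice: an item index, or None for no purchase."""
    probs = mnl_choice_probs(assortment, v)
    options = list(probs)
    return rng.choices(options, weights=[probs[o] for o in options])[0]


def thompson_mnl(r, v_true, k, horizon, seed=0):
    """Epoch-based Thompson sampling sketch with Beta posteriors (hypothetical).

    An epoch offers the same assortment until a no-purchase occurs, so the
    number of purchases of item i in one epoch is geometric with success
    probability p_i = 1 / (1 + v_i). With a Beta(1, 1) prior on p_i, after
    n_i completed epochs containing i and M_i purchases of i, the posterior
    is Beta(1 + n_i, 1 + M_i).
    """
    rng = random.Random(seed)
    n = len(r)
    epochs = [0] * n      # n_i: completed epochs in which item i was offered
    purchases = [0] * n   # M_i: purchases of item i during those epochs
    total_reward, t = 0.0, 0
    while t < horizon:
        # Sample p_i from each posterior and map back to v_i = 1/p_i - 1.
        v_sample = []
        for i in range(n):
            p = max(rng.betavariate(1 + epochs[i], 1 + purchases[i]), 1e-12)
            v_sample.append(1.0 / p - 1.0)
        s = best_assortment(r, v_sample, k)
        counts = dict.fromkeys(s, 0)
        epoch_done = False
        # Offer s repeatedly until a no-purchase or the horizon ends.
        while t < horizon and not epoch_done:
            t += 1
            choice = simulate_choice(s, v_true, rng)
            if choice is None:
                epoch_done = True
            else:
                counts[choice] += 1
                total_reward += r[choice]
        if epoch_done:  # only completed epochs update the posterior
            for i in s:
                epochs[i] += 1
                purchases[i] += counts[i]
    return total_reward


if __name__ == "__main__":
    r = [1.0, 0.8, 0.6, 0.5]          # hypothetical item rewards
    v_true = [0.3, 0.6, 0.9, 1.2]     # hypothetical MNL weights, unknown to the learner
    T = 10_000
    s_star = best_assortment(r, v_true, 2)
    print(s_star)  # (0, 1)
    reward = thompson_mnl(r, v_true, k=2, horizon=T)
    # Single-run estimate of regret against the oracle that knows v_true.
    print(T * expected_revenue(s_star, r, v_true) - reward)
```

The brute-force oracle evaluates all C(N, K) subsets per epoch, so it is only practical for small N. Larger instances would need a more efficient assortment-optimization routine, which this sketch does not implement.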
