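Regret
Upper and lower bounds
Computer science
Mathematical optimization
Operations research
Product (mathematics)
Mathematics
Machine learning
Geometry
Mathematical analysis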
Authors
Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang
Source
Journal: Management Science
[Institute for Operations Research and the Management Sciences]
Date: 2021-01-12
Volume (issue): pages: 67 (8): 4756-4771
Cited by: 5
Identifiers
DOI:10.1287/mnsc.2020.3773
Abstract
The recent rising popularity of ultrafast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for such online retailers: the number of stock keeping units (SKUs) they carry is no longer "the more, the better," yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting, on the fly, a large number of products (i.e., SKUs) with top customer purchase probabilities from an ocean of potential products to offer on retailers' ultrafast delivery platforms. We distill the product selection problem into a semibandit model with linear generalization. There are in total N arms corresponding to N products, each with a feature vector of dimension d. The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the total number of time periods T or the dimension d of the product features. We first analyze a standard Upper Confidence Bound (UCB) algorithm and show that its regret bound can be expressed as the sum of a T-independent part and a T-dependent part, which we refer to as "fixed cost" and "variable cost," respectively. To reduce the fixed cost for large K values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show that its fixed cost is reduced by a factor of d. Moreover, we test the algorithms on an industrial data set from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%. This paper was accepted by J. George Shanthikumar, big data analytics.
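To make the semibandit setting concrete, the kind of standard UCB algorithm the abstract starts from can be sketched as below. This is a minimal LinUCB-style illustration under stated assumptions, not the authors' code: it assumes the expected reward of arm i is the linear score x_i^T theta (clipped to [0, 1] as a purchase probability), that feedback is one Bernoulli purchase outcome per pulled arm, that theta is estimated by ridge regression, and that the confidence multiplier `alpha` is a constant. The function name `linucb_top_k`, its parameters, and the simulated data are hypothetical. It does not implement the paper's proposed algorithm that iteratively shrinks the confidence bounds within each period.

```python
import numpy as np


def linucb_top_k(features, theta_true, K, T, alpha=1.0, lam=1.0, seed=0):
    """Hypothetical sketch of a standard UCB algorithm for a linear semi-bandit.

    features:   (N, d) array, one feature vector per arm (product).
    theta_true: (d,) assumed ground-truth parameter, used only to simulate feedback.
    K:          number of arms pulled per period (K <= N).
    T:          number of periods.
    alpha:      assumed constant width multiplier of the confidence bound.
    lam:        ridge-regularization weight.
    Returns the cumulative pseudo-regret after each period.
    """
    rng = np.random.default_rng(seed)
    N, d = features.shape
    # Assumed linear purchase probabilities, clipped into [0, 1].
    means = np.clip(features @ theta_true, 0.0, 1.0)
    best_total = np.sort(means)[-K:].sum()  # expected reward of the best K-subset

    A = lam * np.eye(d)  # regularized Gram matrix of observed features
    b = np.zeros(d)      # sum of reward-weighted observed features
    cumulative = 0.0
    regrets = []
    for _ in range(T):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b  # ridge-regression estimate of theta
        # Confidence width sqrt(x_i^T A^{-1} x_i) for every arm at once.
        quad = np.einsum("ij,jk,ik->i", features, A_inv, features)
        widths = np.sqrt(np.maximum(quad, 0.0))
        ucb = features @ theta_hat + alpha * widths
        chosen = np.argsort(ucb)[-K:]  # pull the K arms with the highest UCB
        # Semi-bandit feedback: one Bernoulli outcome for each pulled arm.
        rewards = rng.binomial(1, means[chosen])
        X = features[chosen]
        A += X.T @ X
        b += X.T @ rewards
        cumulative += best_total - means[chosen].sum()
        regrets.append(cumulative)
    return np.array(regrets)


# Usage on simulated data with K much larger than d (all values hypothetical).
rng = np.random.default_rng(1)
N, d, K, T = 500, 5, 50, 20
features = rng.uniform(0.0, 1.0, size=(N, d)) / d  # keeps x^T theta in [0, 1]
theta_true = rng.uniform(0.0, 1.0, size=d)
regret = linucb_top_k(features, theta_true, K, T)
print(regret[-1])
```

Because all K pulled arms update the same d x d matrix in one period, the estimate sharpens quickly when K is large relative to d; the single `alpha` shared by every arm is a simplification chosen here for brevity.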