Concurrent Order Dispatch for Instant Delivery with Time-Constrained Actor-Critic Reinforcement Learning

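Reinforcement learning; Computer science; Economic dispatch; Instant; Markov decision process; Order (exchange); Embedding; Real-time computing; Operations research; Mathematical optimization; Distributed computing; Markov process; Artificial intelligence; Engineering; Power (physics); Finance; Physics; Statistics; Economics; Electric power system; Quantum mechanics; Mathematics
Authors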
Baoshen Guo,Shuai Wang,Yi Ding,Guang Wang,Suining He,Desheng Zhang,Tian He
Identifier
DOI:10.1109/rtss52674.2021.00026
Abstract

Instant delivery has developed rapidly in recent years and has significantly changed people's lifestyles thanks to its timeliness and convenience. In instant delivery, the order dispatch process is concurrent: couriers take new orders continuously and deliver multiple orders in a single delivery trip (i.e., a batch). The delivery times of orders in a batch often overlap and are interlinked. The pickup and delivery sequence of the existing orders in a batch changes dynamically due to time constraints and the real-time overdue possibility (i.e., the rate of deliveries that are not finished within the promised time). Most existing order dispatch mechanisms are designed for independent order dispatch or for concurrent delivery without strict time constraints, and hence cannot handle real-time concurrent dispatch with strict time constraints in on-demand instant delivery. To address this challenge, we propose a Time-Constrained Actor-Critic reinforcement learning based concurrent dispatch system called TCAC-Dispatch to increase the long-term overall revenue and reduce the overdue rate. Specifically, we design a deep matching network (DMN) with a variable action space, which integrates state embedding (including route behavior encoding) and action embedding features into a long-term matching value. The Actor-Critic model then tackles the concurrent order dispatch problem under strict time constraints and stochastic demand and supply in instant delivery. An estimated-time-based action pruning module is designed to guarantee the time constraints and to accelerate both training and dispatching. We evaluate TCAC-Dispatch with one month of data involving 36.48 million orders and 42,000 couriers collected from Eleme, one of the largest instant delivery companies in China. Experiments conducted on a data-driven emulator deployed in Eleme's development environment show that our method achieves a 22% increase in total revenue and reduces the overdue rate by 21.6%.
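The abstract does not give implementation details, so the following is only a minimal sketch of what estimated-time-based action pruning over a variable action space could look like. Every name here (`Order`, `Courier`, `estimated_delivery_time`, `prune_actions`, `dispatch`) is hypothetical, the straight-line travel estimate is an assumption, and `value_fn` merely stands in for a learned long-term matching value such as the one the DMN is described as producing. Re-sequencing of orders already in a courier's batch is not modeled.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in km; planar coordinates are an assumption


@dataclass(frozen=True)
class Order:
    order_id: str
    pickup: Point
    dropoff: Point
    deadline_min: float  # promised delivery time, minutes from now


@dataclass(frozen=True)
class Courier:
    courier_id: str
    location: Point
    speed_km_per_min: float
    busy_until_min: float = 0.0  # assumed minutes needed to finish the current batch


def _distance(a: Point, b: Point) -> float:
    """Straight-line distance; a real system would use a road-network estimate."""
    return hypot(a[0] - b[0], a[1] - b[1])


def estimated_delivery_time(courier: Courier, order: Order) -> float:
    """Estimated minutes until this courier could deliver the order (simplified)."""
    travel_km = _distance(courier.location, order.pickup) + _distance(
        order.pickup, order.dropoff
    )
    return courier.busy_until_min + travel_km / courier.speed_km_per_min


def prune_actions(couriers: List[Courier], order: Order) -> List[Courier]:
    """Keep only couriers whose estimated delivery time meets the deadline.

    The result is the variable-size set of feasible actions for this order.
    """
    return [
        c for c in couriers if estimated_delivery_time(c, order) <= order.deadline_min
    ]


def dispatch(
    couriers: List[Courier],
    order: Order,
    value_fn: Callable[[Courier, Order], float],
) -> Optional[Courier]:
    """Pick the feasible courier with the highest matching value, if any."""
    feasible = prune_actions(couriers, order)
    if not feasible:
        return None  # no courier can meet the promised time
    return max(feasible, key=lambda c: value_fn(c, order))
```

Pruning before scoring means the value function is evaluated only on couriers that can plausibly meet the promised time, which is one way such a module could both enforce the time constraint and shrink the action set during training and dispatching.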