Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning

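Computer science; Reinforcement learning; Subway train timetable; Distributed computing; Node (physics); Edge device; Emulation; Edge computing; Scheduling (production processes); Bottleneck; Enhanced Data Rates for GSM Evolution; Computer network; Artificial intelligence; Cloud computing; Mathematical optimization; Engineering; Embedded system; Operating system; Economics; Structural engineering; Economic growth; Mathematics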
Authors
Tanmoy Sen, Haiying Shen
Identifier
DOI:10.1109/icdcs54860.2022.00062
Abstract

With the emergence of edge devices along with their local computation advantage over the cloud, distributed deep learning (DL) training on edge nodes becomes promising. In such a method, the cluster head of a cluster of edge nodes schedules all the DL training jobs from the cluster nodes. Using such a centralized scheduling method, the cluster head knows all the loads of the cluster nodes, which can avoid overloading the cluster nodes, but the head itself may become overloaded. To handle this problem, we first propose a multi-agent RL (MARL) system that enables each edge node to schedule its own jobs using RL. However, without the coordination between the nodes, action collision may occur, in which multiple nodes may schedule tasks to the same node and make it overloaded. To avoid these problems, we propose a system called Shielded ReinfOrcement learning (RL) based DL training on Edges (SROLE). In SROLE, each edge node schedules its own jobs using multi-agent RL. The shield deployed in a node checks action collisions and provides alternative actions to avoid the collisions. As the central shield node for the entire cluster may become a bottleneck, we further propose a decentralized shielding method, in which different shields are responsible for different regions in the cluster and they coordinate to avoid action collisions on the region boundaries. Our container-based emulation experiments show that SROLE reduces training time by up to 59% with 29% lower median resource utilization and reduces the number of action collisions by up to 48% compared to multi-agent RL and the centralized RL. Our real device experiments show that SROLE still reduces the training time by up to 53% with 28% lower median resource utilization than multi-agent RL and the centralized RL.
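The abstract does not say how the shield detects an action collision or how it chooses an alternative action, so the following is only a minimal sketch of the idea under stated assumptions. It assumes that each agent proposes a target node for one job, that load and capacity are integers in percent, and that a collision means the proposed placement would push the target past its capacity. The names `Action` and `shield` and the "most spare capacity" fallback rule are hypothetical and are not taken from SROLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One agent's proposal: place a job with the given load on a target node."""
    agent: str
    target: str
    load: int  # assumed unit: percent of a node's capacity


def shield(actions, current_load, capacity):
    """Return a list of actions with overloading placements redirected.

    Hypothetical rule: actions are admitted in order. An action that would
    push its target past capacity is redirected to the node with the most
    spare capacity that can still hold the job. If no node can hold it,
    the action keeps its original target.
    """
    load = dict(current_load)  # copy, so the caller's state is not modified
    safe = []
    for act in actions:
        target = act.target
        if load[target] + act.load > capacity[target]:
            # Collision: collect every node that can still absorb this job.
            candidates = [n for n in capacity
                          if load[n] + act.load <= capacity[n]]
            if candidates:
                target = max(candidates, key=lambda n: capacity[n] - load[n])
        load[target] += act.load
        safe.append(Action(act.agent, target, act.load))
    return safe


# Two agents pick node n1 at the same time; together they would overload it.
capacity = {"n1": 100, "n2": 100, "n3": 100}
current = {"n1": 50, "n2": 20, "n3": 60}
proposed = [Action("a", "n1", 40), Action("b", "n1", 40), Action("c", "n3", 30)]
result = shield(proposed, current, capacity)
for act in result:
    print(act.agent, "->", act.target)
```

In this example agent `a` keeps `n1`, agent `b` is redirected to `n2` (the node with the most spare capacity), and agent `c` keeps `n3`. A decentralized variant, as described in the abstract, would run one such check per region and additionally exchange information about nodes on region boundaries; that coordination is not shown here.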
