OfpCNN: On-Demand Fine-Grained Partitioning for CNN Inference Acceleration in Heterogeneous Devices

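Computer science, Granularity, Parallel computing, Central processing unit, Inference, Computation, Partition (number theory), Convolutional neural network, Latency (audio), Distributed computing, Computer engineering, Algorithm, Artificial intelligence, Computer hardware, Telecommunications, Mathematics, Combinatorics, Operating system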
Authors
Lei Yang,Can Zheng,Xiaoyuan Shen,Guoqi Xie
Source
Journal: IEEE Transactions on Parallel and Distributed Systems [Institute of Electrical and Electronics Engineers]
Volume (issue): 34 (12): 3090-3103 Citations: 3
Identifier
DOI:10.1109/tpds.2023.3321755
Abstract

Collaborative inference is a promising way to reconcile the limited computational power of Internet of Things (IoT) devices with the heavy computational demands of convolutional neural networks (CNNs). In this approach, a CNN is divided into multiple partitions that are placed on multiple devices and run simultaneously. Two major challenges arise. (1) Computation latency varies with the central processing unit (CPU) load of each device, yet no suitable method exists for accurately determining computation latency from CPU utilization. (2) Existing methods partition a CNN model either vertically or horizontally; their granularity is extremely coarse and their accuracy is low. To address these issues, this study proposes OfpCNN, a distributed collaborative inference framework that supports a fine-grained partitioning scheme for CNNs on heterogeneous devices. First, the framework uses a layer latency prediction model based on floating-point operations and CPU load (FCPM) to accurately predict the computation latency of each CNN layer on different devices. Subsequently, OfpCNN uses horizontal and vertical partitioning methods (HVPM) to partition the input feature maps and the structure of the CNN, respectively, in accordance with network conditions and computing capacity, and then assigns the partitions to multiple devices for execution. HVPM jointly considers the execution position of each layer, the degree of parallelism, and the location of the devices responsible for data aggregation and distribution, and can therefore obtain more fine-grained partition schemes. Experimental results show that FCPM achieves a minimum accuracy of 88% and that HVPM improves inference speed by 1–2.54 times compared with other state-of-the-art methods.
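
The two ideas in the abstract, predicting per-layer latency from floating-point operations and CPU load, and splitting an input feature map across devices according to their capacity, can be illustrated with a small sketch. This is not the paper's FCPM or HVPM: the linear load model, the `Device` fields, and the function names `conv_flops`, `predict_latency`, and `split_rows` are all assumptions made here for illustration, and the sketch ignores network transfer cost and the overlapping border rows that a real convolution split would need.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    peak_flops: float  # hypothetical peak throughput in FLOP/s
    cpu_load: float    # fraction of CPU already in use, 0.0 <= load < 1.0


def conv_flops(h_out, w_out, c_in, c_out, k):
    """FLOPs of a standard k x k convolution (2 FLOPs per multiply-accumulate)."""
    return 2 * h_out * w_out * c_in * c_out * k * k


def available_flops(dev):
    """Assumed toy model: usable throughput shrinks linearly with CPU load."""
    return dev.peak_flops * (1.0 - dev.cpu_load)


def predict_latency(flops, dev):
    """Predicted seconds to execute `flops` on `dev` under the toy model."""
    return flops / available_flops(dev)


def split_rows(h_out, devices):
    """Split output rows across devices in proportion to usable throughput.

    Halo (overlapping border) rows are deliberately ignored in this sketch.
    """
    speeds = [available_flops(d) for d in devices]
    total = sum(speeds)
    rows = [int(h_out * s / total) for s in speeds]
    # Truncation can leave a few rows unassigned; give them to the fastest devices.
    leftover = h_out - sum(rows)
    fastest_first = sorted(range(len(devices)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        rows[i] += 1
    return rows


if __name__ == "__main__":
    devices = [
        Device("edge-a", 4e9, 0.5),  # fast but half busy
        Device("edge-b", 2e9, 0.0),
        Device("edge-c", 1e9, 0.0),
    ]
    flops = conv_flops(56, 56, 64, 64, 3)
    for d in devices:
        print(d.name, f"{predict_latency(flops, d):.4f} s for the whole layer")
    print("rows per device:", split_rows(56, devices))
```

In this toy setup, a device with high peak throughput but heavy CPU load receives the same share as an idle device with half the peak throughput, which is the kind of load-dependent behaviour the abstract says existing latency estimates fail to capture.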