Computer science
Inference
Latency (audio)
Distributed computing
Partition (number theory)
Enhanced Data Rates for GSM Evolution
Edge device
Cloud computing
Mobile device
Context (archaeology)
Approximate inference
Artificial intelligence
Telecommunications
Paleontology
Mathematics
Combinatorics
Biology
Operating system
Authors
Run Yang, Yan Li, Hui He, Weizhe Zhang
Identifiers
DOI: 10.1109/ijcnn55064.2022.9892582
Abstract
The collaborative inference approach splits a Deep Neural Network (DNN) model into two parts that run cooperatively on the end device and the cloud server, minimizing inference latency and protecting data privacy, especially in the 5G era. The optimal partition depends on the available network bandwidth. In dynamic mobile networks, however, resource-constrained devices cannot efficiently execute complex model partitioning algorithms to obtain the optimal partition in real time. In this paper, to overcome this challenge, we first formulate model partitioning as a min-cut problem to find the optimal partition. Second, we propose a Collaborative Inference method based on model Compression, named CIC. CIC reduces the complexity of the partitioning algorithm so that it executes efficiently on resource-constrained end devices. CIC generates splitting models offline from the inherent characteristics of the DNN model and the platform resources; because these splitting models are independent of the network environment, they can be reused continuously in the current environment. CIC compresses the problem so effectively that even DNN models with hundreds of layers can be partitioned rapidly on resource-constrained devices. Experimental results show that our method significantly outperforms existing solutions, speeding up partitioning decisions by up to 100x, reducing inference latency by up to 2.6x, and increasing throughput by up to 3.3x in the best case.
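The abstract states that partitioning is formulated as a min-cut problem but gives no construction details. Below is a minimal sketch of one standard encoding, assuming a chain-structured DNN; the layer timings, output sizes, bandwidth value, and the helper name `build_partition_graph` are illustrative assumptions, not details from the paper.

```python
# Min-cut sketch for device/cloud DNN partitioning (illustrative only).
import networkx as nx

# Hypothetical per-layer costs: (name, device latency s, cloud latency s,
# output size MB that must be uploaded if the split falls after the layer).
LAYERS = [
    ("conv1", 0.030, 0.003, 1.50),
    ("conv2", 0.050, 0.005, 0.80),
    ("fc1",   0.020, 0.002, 0.10),
    ("fc2",   0.010, 0.001, 0.01),
]

def build_partition_graph(layers, bandwidth_mbps):
    """Encode device/cloud assignment as an s-t min-cut instance.

    Nodes on the source side of the cut run on the device; nodes on the
    sink side run in the cloud. Cutting s->v charges v's cloud latency,
    cutting v->t charges v's device latency, and cutting a data edge
    u->v charges the time to upload u's output.
    """
    g = nx.DiGraph()
    for i, (name, dev_t, cloud_t, out_mb) in enumerate(layers):
        g.add_edge("s", name, capacity=cloud_t)  # paid if v runs in the cloud
        g.add_edge(name, "t", capacity=dev_t)    # paid if v runs on the device
        if i + 1 < len(layers):
            nxt = layers[i + 1][0]
            upload_t = out_mb * 8.0 / bandwidth_mbps  # MB -> Mb, then / Mbps
            g.add_edge(name, nxt, capacity=upload_t)  # paid if the cut falls here
            # Forbid cloud -> device back-transfers so the device part is a prefix.
            g.add_edge(nxt, name, capacity=float("inf"))
    return g

g = build_partition_graph(LAYERS, bandwidth_mbps=40.0)
latency, (device_side, cloud_side) = nx.minimum_cut(g, "s", "t")
print(f"estimated latency: {latency:.4f}s")
print("run on device:", sorted(device_side - {"s"}))
print("run in cloud :", sorted(cloud_side - {"t"}))
```

Under this encoding the cut value equals the estimated end-to-end latency and the source-side nodes form the device-resident prefix. Per the abstract, CIC's contribution is reducing the complexity of solving this problem so the partitioning decision runs quickly on resource-constrained devices as bandwidth changes.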