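Computer science
Multi-core processor
Field-programmable gate array
Idle
Efficient energy use
Scheduling (production processes)
Parallel computing
Benchmark (surveying)
Embedded system
Porting
Leverage (statistics)
Workload
Symmetric multiprocessor system
Distributed computing
Operating system
Software
Engineering
Economics
Machine learning
Electrical engineering
Geography
Operations management
Geodesy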
Authors
Andrés Rodríguez,Ángeles Navarro,Rafael Asenjo,Francisco Corbera,Rubén Gran Tejero,Darío Suárez Gracia,Jose Nunez‐Yanez
Identifier
DOI:10.1016/j.sysarc.2019.06.006
Abstract
This paper presents a framework for low-cost, low-power heterogeneous multiprocessors that exploits both FPGAs and multicore CPUs, with the overarching goal of giving developers a productive programming model and the runtime support needed to use all available processing resources. FPGA productivity is achieved with a high-level programming model based on OpenCL, the standard for cross-platform parallel heterogeneous programming. In this work, we focus on the parallel_for pattern. As part of the runtime support for this pattern, we leverage a new scheduler that strives to maximize the number of iterations per joule by dynamically and adaptively partitioning the iteration space between the multicore and the accelerator while they work simultaneously. Seven benchmarks are ported to and optimized for a low-cost DE1 board. The results show that the heterogeneous solution improves performance by up to 2.9× and energy efficiency by up to 2.7× compared to the traditional approach of keeping all the CPU cores idle while the accelerator computes the workload. Our results also yield two insights. First, an adaptive scheduler able to find at runtime the right chunk size for each type of application and device configuration is an essential component for these kinds of heterogeneous platforms. Second, device configurations that provide higher throughput do not always achieve better energy efficiency when only the running power (excluding the idle power component) is considered.
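The dynamic partitioning of a parallel_for iteration space described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' scheduler: the function names, the rule that sizes the CPU chunk in proportion to relative throughput, and every rate and power figure below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import heapq


def simulate_dynamic_partition(total_iters, accel_chunk, accel_rate, cpu_rate, n_cores):
    """Simulate handing out chunks of an iteration space to one accelerator
    and n_cores CPU cores. Rates are iterations per second (hypothetical).

    Returns (makespan_seconds, iterations_done_per_device).
    """
    # Assumed heuristic: size the CPU chunk so that one core and the
    # accelerator need about the same time to finish a chunk.
    cpu_chunk = max(1, round(accel_chunk * cpu_rate / accel_rate))
    devices = [("accel", accel_chunk, accel_rate)]
    devices += [(f"cpu{i}", cpu_chunk, cpu_rate) for i in range(n_cores)]

    # Min-heap of (time_when_free, device_index): the device that becomes
    # idle first takes the next chunk of the remaining iteration space.
    ready = [(0.0, i) for i in range(len(devices))]
    heapq.heapify(ready)

    done = {name: 0 for name, _, _ in devices}
    next_iter = 0
    makespan = 0.0
    while next_iter < total_iters:
        free_at, i = heapq.heappop(ready)
        name, chunk, rate = devices[i]
        size = min(chunk, total_iters - next_iter)
        next_iter += size
        done[name] += size
        finish = free_at + size / rate
        makespan = max(makespan, finish)
        heapq.heappush(ready, (finish, i))
    return makespan, done


def iterations_per_joule(total_iters, makespan, running_power_watts):
    """Energy efficiency metric: iterations divided by energy (power x time)."""
    return total_iters / (makespan * running_power_watts)


# Hypothetical workload: 1000 iterations, accelerator 4x faster than one core.
hetero_time, hetero_split = simulate_dynamic_partition(1000, 100, 100.0, 25.0, 2)
accel_time, _ = simulate_dynamic_partition(1000, 100, 100.0, 25.0, 0)
print(hetero_time, hetero_split)  # 7.0 {'accel': 700, 'cpu0': 150, 'cpu1': 150}
print(accel_time)                 # 10.0

# Hypothetical running power: 2 W accelerator only, 4 W with both cores active.
print(iterations_per_joule(1000, accel_time, 2.0))
print(iterations_per_joule(1000, hetero_time, 4.0))
```

With these made-up figures the heterogeneous run finishes sooner (7 s versus 10 s) but delivers fewer iterations per joule, because its assumed running power doubles while its run time drops by less than half. That is one way the abstract's second insight can arise; real outcomes depend on measured rates and power.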