Abstract
Emerging applications utilize numerous Deep Neural Networks (DNNs) to address multiple tasks simultaneously. As these applications continue to expand, there is a growing need for off-chip memory access optimization and for innovative architectures that can adapt to the diverse computation, memory, and communication requirements of various DNN models. To address these challenges, we propose Versa-DNN, a versatile DNN accelerator that provides efficient computation, memory, and communication support for the simultaneous execution of multiple DNNs. Versa-DNN features three unique designs: a flexible off-chip memory access optimization strategy, adaptable communication fabrics, and a communication- and computation-aware scheduling algorithm. The proposed off-chip memory optimization strategy improves performance and energy efficiency by increasing hardware utilization, eliminating excess data duplication, and reducing off-chip memory accesses. The adaptable communication fabrics consist of distributed buffers, processing elements, and a flexible Network-on-Chip (NoC), which can dynamically morph and fission to support the distinct communication and computation needs of simultaneously running DNN models. Furthermore, the proposed scheduling policy manages the simultaneous execution of multiple DNN models with improved performance and energy efficiency. Simulation results using several DNN models show that the proposed Versa-DNN architecture achieves 41%, 238%, and 392% throughput speedups and 30%, 59%, and 63% energy reductions on average across different workloads when compared to state-of-the-art accelerators Planaria, Herald, and AI-MT, respectively.