The era of AI computing brings significant challenges to traditional computer systems. As shown in Fig. 29.1.1, while the computation requirement of AI models increases 750x every two years, memory system capability improves only slowly in terms of both capacity and bandwidth. Many memory-bound applications, such as natural language processing, recommendation systems, graph analytics, graph neural networks, and multi-task online inference, have become the dominant AI applications in modern cloud datacenters. The primary memory technologies that currently power AI systems and applications include on-chip memory (SRAM), 2.5D integrated memory (HBM [1]), and off-chip memory (DDR, LPDDR, or GDDR SDRAM). Although on-chip memory offers lower access energy than off-chip memory, its limited capacity prevents the efficient adoption of large AI models, which must then rely on intensive and costly off-chip memory accesses. In addition, the data-movement energy of off-chip memory solutions (HBM and DRAM) is several orders of magnitude larger than that of on-chip memory, bringing the well-known “memory wall” [2] problem to AI systems. Process-near-memory (PNM) and computing-in-memory (CIM) have become promising candidates to tackle the “memory wall” problem in recent years.