Prof. Prashant Nair: Scaling the Memory Wall

BC420 - Computing Building of EPFL, Ecublens

Towards 3D-DRAM-based Accelerators for Efficient Generative Inference

Generative AI now underpins search, digital assistants, and media applications, making inference cost a first-order design constraint. Unlike traditional compute-bound workloads, large language and speech models are typically limited by memory bandwidth and capacity rather than raw arithmetic throughput. Thus, their inference cost […]
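To see why generation tends to be bandwidth-bound, a standard roofline estimate is useful. The sketch below is illustrative only and is not taken from the talk. It assumes a hypothetical accelerator (1000 TFLOP/s peak compute, 3 TB/s memory bandwidth), a single 4096x4096 FP16 linear layer, and counts only weight traffic. It then compares the layer's arithmetic intensity (FLOPs per byte moved) with the hardware's ridge point. All names and numbers are assumptions chosen for illustration.

```python
def decode_linear_intensity(d_in: int, d_out: int, batch: int = 1,
                            bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one linear layer during decoding.

    Simplifying assumption: only weight bytes are counted; activation and
    KV-cache traffic are ignored.
    """
    flops = 2 * batch * d_in * d_out             # one multiply + one add per weight per token
    weight_bytes = bytes_per_param * d_in * d_out  # weights are read once per step
    return flops / weight_bytes


def roofline_attainable(ai: float, peak_flops: float, mem_bw: float) -> float:
    """Attainable FLOP/s under the roofline model: the lower of the two roofs."""
    return min(peak_flops, ai * mem_bw)


# Hypothetical hardware numbers, for illustration only.
PEAK_FLOPS = 1000e12       # 1000 TFLOP/s
MEM_BW = 3e12              # 3 TB/s
RIDGE = PEAK_FLOPS / MEM_BW  # intensity above which a kernel is compute-bound

if __name__ == "__main__":
    for batch in (1, 8, 64, 512):
        ai = decode_linear_intensity(4096, 4096, batch=batch)
        perf = roofline_attainable(ai, PEAK_FLOPS, MEM_BW)
        bound = "memory" if ai < RIDGE else "compute"
        print(f"batch={batch:4d} AI={ai:7.1f} FLOP/B "
              f"attainable={perf / 1e12:7.1f} TFLOP/s ({bound}-bound)")
```

With one token in flight, each weight is fetched for only two FLOPs. The layer therefore reaches about 3 TFLOP/s of a 1000 TFLOP/s peak, and adding compute would not speed it up. Only larger batches, or more memory bandwidth of the kind stacked 3D-DRAM targets, move it toward the compute roof.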