Prof. Prashant Nair: Scaling the Memory Wall
BC420 - Computing Building of EPFL, Ecublens
Towards 3D-DRAM-based Accelerators for Efficient Generative Inference

Generative AI now underpins search, digital assistants, and media applications, making inference cost a first-order design constraint. Unlike traditional compute-bound workloads, large language and speech models are typically limited by memory bandwidth and capacity rather than raw arithmetic throughput. Thus, their inference cost […]