• Prof. Prashant Nair: Scaling the Memory Wall

    BC420 - Computing Building of EPFL, Ecublens, Switzerland

    Towards 3D-DRAM-based Accelerators for Efficient Generative Inference

    Generative AI now underpins search, digital assistants, and media applications, making inference cost a first-order design constraint. Unlike traditional compute-bound workloads, large language and speech models are typically limited by memory bandwidth and capacity rather than raw arithmetic throughput. Thus, their inference cost […]