Prof. Prashant Nair: Scaling the Memory Wall

Towards 3D-DRAM-based Accelerators for Efficient Generative Inference
Generative AI now underpins search, digital assistants, and media applications, making inference cost a first-order design constraint. Unlike traditional compute-bound workloads, large language and speech models are typically limited by memory bandwidth and capacity rather than raw arithmetic throughput. Their inference cost is therefore driven as much by data movement as by compute, and hinges on the memory system's design. This concern is especially acute during autoregressive decoding, which must repeatedly stream model weights and key–value (KV) caches at high bandwidth and low latency while also providing enough capacity to support long context windows and many concurrent users. To make matters worse, these demands are accelerating: state-of-the-art models now exceed hundreds of billions of parameters, context windows are expanding from 4K to 128K tokens and beyond, and mixture-of-experts designs introduce additional irregularity into memory access patterns. Today's memory technologies thus force difficult trade-offs. SRAM delivers extremely high bandwidth, but at prohibitive area cost and with severely limited capacity. HBM offers higher capacity, but remains constrained in achievable bandwidth and I/O power. Closing this gap will require a fundamental rethinking of how memory is integrated with accelerator logic.
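To make the bandwidth argument concrete, the back-of-envelope sketch below computes the ceiling on single-stream decode throughput when weight and KV-cache streaming dominates. The model size, KV-cache footprint per token, and bandwidth figures are illustrative assumptions, not numbers from the talk.

```python
# Back-of-envelope roofline for autoregressive decoding: each generated token
# must stream the full weight set and the KV cache from memory, so memory
# bandwidth sets an upper bound on tokens/second for a single request.
# All figures below are illustrative assumptions, not numbers from the talk.

GB = 1e9

def max_decode_tokens_per_sec(weight_bytes: float,
                              kv_bytes_per_token: float,
                              context_len: int,
                              mem_bw_bytes_per_sec: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode throughput."""
    bytes_per_step = weight_bytes + kv_bytes_per_token * context_len
    return mem_bw_bytes_per_sec / bytes_per_step

# Hypothetical 70B-parameter model with 8-bit weights and ~160 KB of KV
# cache per token of context (both assumed values).
weights = 70 * GB
kv_per_token = 160e3

for name, bw in [("HBM-class, ~3 TB/s", 3e12),
                 ("stacked-DRAM-class, ~10 TB/s", 10e12)]:
    for ctx in (4_096, 131_072):
        tps = max_decode_tokens_per_sec(weights, kv_per_token, ctx, bw)
        print(f"{name} | context {ctx:>7,}: <= {tps:6.1f} tokens/s")
```

Under these assumed numbers, a bandwidth-bound 70B model cannot exceed a few dozen tokens per second per request regardless of available compute, which is why raising deliverable bandwidth, rather than adding FLOPs, moves the needle.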
In this talk, I will introduce our upcoming memory-centric accelerator, which vertically integrates logic with 3D-stacked DRAM to deliver SRAM-level bandwidth and HBM-class capacity while substantially reducing energy consumption. I will describe the architectural challenges of deploying 3D-DRAM in practice and how we address them through workload-aware channel mapping, optimized power management, topology-preserving redundancy, and thermal-aware reliability mechanisms. Evaluations using models such as Llama-3.1, DeepSeek-V3, Canary, and Whisper show that our accelerator achieves significantly higher throughput and responsiveness than HBM-based alternatives. I will conclude by examining the broader implications for computer architecture, particularly how advanced logic-memory integration through hybrid bonding and multi-high stacking can reshape inference cost structures and enable the next generation of trillion-parameter models.
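The channel-mapping scheme itself is not described in this abstract, so the sketch below is a purely hypothetical illustration of the general idea: a greedy balancer that spreads the expected per-token traffic of weight and KV-cache streams across DRAM channels so that no single channel becomes a bandwidth hotspot. The stream names, traffic figures, and channel count are all invented for illustration and are not the design presented in the talk.

```python
# Hypothetical illustration of workload-aware channel mapping: assign each
# tensor's expected per-token traffic to the currently least-loaded DRAM
# channel. This is a generic greedy load-balancing sketch, not the talk's
# actual mechanism.
import heapq

def map_streams_to_channels(streams: dict[str, float], num_channels: int):
    """streams: name -> expected bytes moved per decode step."""
    # Min-heap of (current load, channel id); place the heaviest streams first
    # so large consumers never pile onto the same channel.
    heap = [(0.0, ch) for ch in range(num_channels)]
    heapq.heapify(heap)
    mapping = {}
    for name, traffic in sorted(streams.items(), key=lambda kv: -kv[1]):
        load, ch = heapq.heappop(heap)
        mapping[name] = ch
        heapq.heappush(heap, (load + traffic, ch))
    return mapping

# Toy workload: attention weights, FFN weights, and per-layer KV caches.
demo = {"ffn_weights": 40e9, "attn_weights": 20e9, "kv_cache_hot": 15e9,
        "kv_cache_cold": 5e9, "embeddings": 2e9}
print(map_streams_to_channels(demo, num_channels=4))
```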
Biography: Prashant J. Nair is the lead architect of the 3D-memory architecture at d-Matrix for their upcoming accelerators. He is also an Associate Professor at the University of British Columbia (UBC), where he leads the Systems and Architectures (STAR) Lab, and an Affiliate Fellow of the Quantum Algorithms Institute. His research focuses on memory architectures and systems. Dr. Nair's recognitions include the 2024 TCCA Young Architect Award (the highest early-career honor in computer architecture), the 2025 DSN Test of Time Award, the HPCA 2023 Best Paper Award, a MICRO 2024 Best Paper nomination, and the HPCA 2025 Distinguished Artifact Award. Over the past decade, he has published more than 40 papers in top-tier venues, and he was inducted into all three halls of fame of computer architecture, ISCA, MICRO, and HPCA, while still an Assistant Professor.
Website: https://prashantnair.bitbucket.io/
Most Recent Co-Lead Project: https://gimletlabs.ai/blog/low-latency-spec-decode-corsair