EcoCloud faculty members and industrial partners have been showcased in a volume of retrospectives on research into computer architecture.

The International Symposium on Computer Architecture (ISCA) is the premier forum for new ideas and experimental results in computer architecture, and this year marks the Symposium’s 50th anniversary.

As part of the celebrations, significant and memorable papers from 1996 to 2020 were selected for a 50th anniversary volume of articles, telling a story of how research at ISCA has progressed over those twenty-five years. Each article was accompanied by a retrospective from the authors.

Published in 2015, ShiDianNao: shifting vision processing closer to the sensor pioneered AI acceleration at the edge. Co-authored by Paolo Ienne, head of the Processor Architecture Laboratory, the paper focuses on image applications, arguably the most important category among recognition and mining applications, and on Convolutional Neural Networks (CNNs), which are the state of the art for these applications. The authors propose a CNN accelerator placed next to a CMOS or CCD sensor. Removing all DRAM accesses and carefully exploiting the specific data access patterns within CNNs allows them to design an accelerator that is 60 times more energy efficient than the previous state-of-the-art neural network accelerator.

A reconfigurable fabric for accelerating large-scale datacenter services, known as the Catapult paper (over 1400 citations), presents a pioneering use of FPGAs to accelerate computation in the cloud. Co-authored by EPFL’s James Larus, head of the Very Large Scale Computing Laboratory, it describes the medium-scale deployment of a composable, reconfigurable fabric to accelerate portions of the Microsoft Bing search engine, improving the computational capabilities, flexibility, power efficiency, and cost of this fundamental service.

Scale-out processors introduces the first generation of cloud-native CPUs, and is authored by Babak Falsafi and members of his Parallel Systems Architecture Lab. Based on a metric of performance density, the authors present a framework to optimize the amount of silicon required in a server CPU to support software services in the datacenter while maximizing throughput and maintaining end-to-end service latency guarantees. This research created the blueprint for an ARM-based server CPU, Cavium ThunderX.

Also included in the volume of retrospectives, and co-authored by Babak Falsafi, is Dead-block prediction & dead-block correlating prefetchers, a paper published in 2001 that aimed to bridge the growing performance gap between processors and memory (known at the time as the "Memory Wall"). It proposes CPU predictors that accurately identify ‘when’ an on-chip memory block becomes evictable and ‘what’ subsequent memory block to prefetch in order to hide long memory access latency. The lasting impact of this work has been to inspire memory streaming technologies, called temporal streaming, that have appeared in products including IBM BlueGene/Q and ARM Neoverse N2.

Memory system characterization of commercial workloads, published in 1998, led the community to revisit server design. Co-authored by Professor Edouard Bugnion, who leads the Data Center Systems Laboratory, it presented a detailed performance study of three important classes of commercial workloads: online transaction processing, decision support systems and Web index search, identifying a set of simplifications that make these workloads more amenable to monitoring and simulation without affecting representative memory system behavior.

Kim Hazelwood, a member of the EcoCloud Industrial Affiliates Program (IAP) Board, is listed as a co-author of Profiling a warehouse-scale computer. This 2015 paper explores microarchitectural optimizations for server processors, especially in the cache hierarchy, and outlines several interesting directions for future warehouse-scale computers.

Doug Burger of Microsoft, also a member of the EcoCloud IAP Board, appears in seven (!) of the ninety-eight papers selected, confirming his reputation as one of the world’s leading active researchers in computer architecture:
  • A configurable cloud-scale DNN processor for real-time AI (2018)
  • A reconfigurable fabric for accelerating large-scale datacenter services (2014) (op. cit.)
  • General-purpose code acceleration with limited-precision analog computation (2014)
  • Dark silicon and the end of multicore scaling (2011) (>2400 citations)
  • Use ECP, not ECC, for hard failures in resistive memories (2010)
  • Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture (2003)
  • Memory bandwidth limitations of future microprocessors (1996)

The final number of papers selected was more than twice the number in the Symposium’s previous retrospective, reflecting the growth of the computer architecture research community over the past 25 years. Close to 5% of the selected research is related to EPFL. The organizers hope this retrospective volume will be an exciting read for older and younger generations of computer architects.

 

  • Author: Tanya Petersen
  • Image: © iStock / EPFL 2023