New cloud computing applications fall into two categories: scale-out workloads and latency-critical high-performance computing (HPC) applications. Although the two types have distinct computing characteristics, they also share important traits. Both require frequent access to memory to manage extremely large working sets that cannot fit in on-chip caches, and both have quality-of-service (QoS) or latency requirements.
Attempts to increase computational power by integrating many cores on a single chip have hit the power wall. Heterogeneous multi-core architectures have helped push back the power wall and mitigate the dark silicon problem, but only to a limited extent. Further significant gains in computational performance and efficiency can only be achieved by overcoming the memory wall. Toward that objective, emerging memory technologies, such as 3D die-stacked memories and non-volatile memory (NVM), are being explored as alternatives to traditional memories.
Based on the common characteristics shared by scale-out and HPC applications, we are exploring novel heterogeneous architectures that offer energy efficiency and high performance. We are investigating heterogeneity at all levels of the computation: micro-architectural heterogeneity (in-order and out-of-order cores), ISA-level heterogeneity (a mix of x86, ARM, and MIPS), functional heterogeneity comprising a mix of CPU, GPU, and accelerator cores, and a deeply heterogeneous architecture combining all of the above.
We are also exploring heterogeneous memory architectures to attack the memory wall. This line of research integrates emerging memory technologies, such as 3D-stacked DRAM like the Hybrid Memory Cube (HMC) and NVMs, alongside existing SRAM and DRAM. Another area of our research focuses on processor-in-memory (PIM) and near-data-computing architectures that reduce the latency between compute and memory nodes.
The main challenges in designing heterogeneous memory architectures are memory management and programmability. Heterogeneous computing architectures combined with heterogeneous memory architectures can help address these challenges by delivering order-of-magnitude improvements in energy efficiency and performance for emerging latency-critical applications.
To perform architectural exploration, we use a simulation framework based on gem5, a cycle-accurate simulator that supports multiple ISAs (x86, ARM, ALPHA, MIPS, POWER, and SPARC) and provides several CPU models, including a simple atomic model and in-order and out-of-order (OoO) models. gem5 supports multiple cache coherence protocols and interconnect models. It also models a range of memory devices, including traditional memories (DDR) and emerging memories (HMC). gem5 supports booting Linux in full-system simulation mode for the ARM, x86, and ALPHA ISAs.
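To illustrate how such an exploration is driven in practice, the sketch below shows representative gem5 command lines that swap CPU and memory models for the same workload. The flag names follow gem5's shipped example scripts (`configs/example/se.py`); exact option names and available models vary by gem5 version, and `./workload` is a placeholder binary.

```shell
# Build the x86 variant of gem5 (the ISA is chosen at build time)
scons build/X86/gem5.opt -j8

# Syscall-emulation run: out-of-order cores, two-level caches, DDR4 memory
build/X86/gem5.opt configs/example/se.py \
    --cpu-type=DerivO3CPU --num-cpus=4 \
    --caches --l2cache \
    --mem-type=DDR4_2400_8x8 --mem-size=8GB \
    --cmd=./workload

# Same workload on in-order cores with an HMC-style stacked-DRAM model,
# allowing a direct comparison of the two design points
build/X86/gem5.opt configs/example/se.py \
    --cpu-type=MinorCPU --num-cpus=4 \
    --caches --l2cache \
    --mem-type=HMC_2500_1x32 --mem-size=4GB \
    --cmd=./workload
```

Each run emits an `m5out/stats.txt` file whose counters (cycles, cache misses, memory latency) can then be compared across the heterogeneous design points.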
To gain confidence that our simulation framework reflects real hardware, we validated the ARM-based gem5 simulation against the Cavium ThunderX, an ARM-based server processor.