New designs for efficient in-memory key-value stores


  Zwaenepoel Willy


A key-value store is a well-suited kind of database for storing and retrieving information, especially for high-traffic websites that must serve content quickly. Key-value stores retrieve values quickly because they have the simplest structure of all NoSQL databases: while a traditional RDBMS must handle complex data relationships, a key-value store only needs to store and retrieve the value linked to a key. Even so, designing key-value stores that are both memory-efficient and high-performance remains an active research topic. It is one of the core areas of research at EPFL’s Operating Systems Laboratory (LABOS), led by Professor Willy Zwaenepoel and his team of research scientists.

The team is working on new designs for efficient in-memory key-value stores. In existing designs, costly I/O operations limit throughput. The researchers at LABOS aim to make key-value stores more efficient by reducing both the frequency and the cost of these I/O operations.

This year, the team has presented two new key-value stores: a causally consistent geo-replicated key-value store called Okapi and a persistent key-value store based on Log-Structured Merge trees called TRIAD.

Okapi is based on two design principles that contribute to its higher performance:

  • It relies on hybrid logical/physical clocks to achieve low latency.
  • It improves resource efficiency and availability at the expense of a negligible increase in update visibility latency.
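
To make the first principle more concrete, here is a minimal sketch of a generic hybrid logical/physical clock, in Python. It follows the commonly described scheme of pairing a physical-time component with a logical counter; it is not Okapi's implementation, and the names `HybridClock`, `tick`, `merge` and `physical_now` are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A timestamp is (physical component, logical counter); tuples compare in that order.
Timestamp = Tuple[int, int]


@dataclass
class HybridClock:
    """Illustrative hybrid clock; names and structure are hypothetical."""
    physical_now: Callable[[], int]  # injected physical clock, e.g. milliseconds
    l: int = 0                       # largest physical time seen so far
    c: int = 0                       # logical counter that breaks ties within l

    def tick(self) -> Timestamp:
        """Timestamp a local event, such as a write accepted by this node."""
        pt = self.physical_now()
        if pt > self.l:
            self.l, self.c = pt, 0   # physical time moved forward: reset counter
        else:
            self.c += 1              # physical time stalled or behind: count up
        return (self.l, self.c)

    def merge(self, remote: Timestamp) -> Timestamp:
        """Timestamp the receipt of a message carrying a remote timestamp."""
        pt = self.physical_now()
        rl, rc = remote
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            new_c = max(self.c, rc) + 1
        elif new_l == self.l:
            new_c = self.c + 1
        elif new_l == rl:
            new_c = rc + 1
        else:
            new_c = 0                # local physical time is strictly ahead
        self.l, self.c = new_l, new_c
        return (self.l, self.c)


# Usage: node b's physical clock lags behind node a's, yet b's timestamp
# for the received update is still ordered after a's.
a = HybridClock(physical_now=lambda: 10)
b = HybridClock(physical_now=lambda: 5)
sent = a.tick()
received = b.merge(sent)
```

The point of the design is that timestamps stay close to physical time while still respecting causality, so a node with a slow clock does not have to wait for its physical clock to catch up before timestamping a dependent update.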

The researchers tested Okapi with different workloads on Amazon AWS. The tests confirmed that Okapi delivers better performance and lower latency than some existing approaches to causal consistency.

TRIAD, on the other hand, leverages three techniques:

  • At the LSM memory component level, TRIAD uses skew in data popularity to avoid frequent I/O operations on the most popular keys.
  • At the storage level, TRIAD reduces management costs by deferring and batching multiple I/O operations.
  • At the commit log level, TRIAD avoids duplicate writes to storage.
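
The first of these techniques can be illustrated with a small Python sketch. It is a toy model of the idea of exploiting popularity skew at the memory component, not TRIAD's code: the class `SkewAwareMemtable`, its parameters `capacity` and `hot_keys`, and the use of an in-memory list as a stand-in for on-disk files are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List


class SkewAwareMemtable:
    """Toy memory component that keeps the most-updated keys in memory on flush.

    Hypothetical illustration; assumes hot_keys < capacity.
    """

    def __init__(self, capacity: int, hot_keys: int) -> None:
        self.capacity = capacity            # entries held before a flush is triggered
        self.hot_keys = hot_keys            # number of popular keys retained in memory
        self.data: Dict[str, str] = {}      # current in-memory entries
        self.updates: Counter = Counter()   # per-key update counts since insertion
        self.flushed: List[Dict[str, str]] = []  # stand-in for files written to disk

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self.updates[key] += 1
        if len(self.data) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # The most frequently updated keys are treated as hot.
        hot = {k for k, _ in self.updates.most_common(self.hot_keys)}
        cold = {k: v for k, v in self.data.items() if k not in hot}
        if cold:
            # Only cold entries are written out, as one sorted batch.
            self.flushed.append(dict(sorted(cold.items())))
        # Hot entries stay in memory, so their next updates cost no storage I/O.
        self.data = {k: v for k, v in self.data.items() if k in hot}
        self.updates = Counter({k: self.updates[k] for k in self.data})


# Usage: "a" is updated three times and stays in memory; b, c and d are flushed.
table = SkewAwareMemtable(capacity=4, hot_keys=1)
for key, value in [("a", "1"), ("a", "2"), ("a", "3"), ("b", "x"), ("c", "y"), ("d", "z")]:
    table.put(key, value)
```

In this toy version the retained hot entries exist only in memory, so a real store would still need its commit log to cover them in case of a crash.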

TRIAD was implemented as an extension of Facebook’s RocksDB. An evaluation with production and synthetic workloads showed throughput improvements of up to 193%, a reduction in write amplification by a factor of up to 4, and a decrease in the amount of I/O by an order of magnitude.
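
For readers unfamiliar with these metrics, the short Python sketch below shows how they are usually computed. Write amplification is taken here in its usual sense, the ratio of bytes written to storage to bytes written by the application. The byte counts are made-up numbers chosen only to illustrate a factor-of-4 reduction; they are not measurements from the TRIAD evaluation, and the function names are hypothetical.

```python
def write_amplification(bytes_written_to_storage: float, bytes_written_by_app: float) -> float:
    """Ratio of bytes that reach storage to bytes the application asked to write."""
    if bytes_written_by_app <= 0:
        raise ValueError("bytes_written_by_app must be positive")
    return bytes_written_to_storage / bytes_written_by_app


def speedup_factor(improvement_percent: float) -> float:
    """Convert a relative improvement in percent into a multiplicative factor."""
    return 1.0 + improvement_percent / 100.0


# Hypothetical numbers: the application writes 10 GB in both runs.
baseline_wa = write_amplification(120, 10)  # storage absorbs 120 GB
improved_wa = write_amplification(30, 10)   # storage absorbs 30 GB
reduction = baseline_wa / improved_wa       # a reduction by a factor of 4

# A throughput improvement of 193% means roughly 2.93 times the baseline throughput.
throughput_factor = speedup_factor(193)
```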

Many data centers depend on in-memory key-value stores, and web applications often use them to cache the results of recurring computations. The ongoing work at EPFL to reduce latency and improve the efficiency of key-value stores could therefore have major implications for real-world applications.
