Dynamically Assembling DRAM Bursts over a Multitude of Random Accesses


  Paolo Ienne


FPGAs implement massively parallel, application-specific compute engines. However, this approach fails when the application is memory bandwidth-bound, which is especially true for applications that perform irregular and narrow memory accesses directly to DRAM. Optimization options are expensive in design time and hard to integrate with accelerators generated by high-level synthesis. Nonblocking caches are widely used on CPUs to reduce the negative impact of misses and thus increase the performance of applications with low cache hit rates; however, they rely on associative lookup to track multiple outstanding misses, which limits their scalability, especially on FPGAs. The result is frequent stalls whenever the application has a very low hit rate.
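To see why associative lookup limits scalability, consider a minimal sketch of the miss status holding registers (MSHRs) of a conventional nonblocking cache. The class and names below are hypothetical illustrations, not part of any real design: every outstanding miss must be compared against the incoming address, which in hardware means one comparator per entry, so area and fan-out grow with the number of entries.

```python
# Hypothetical sketch (not a real cache implementation): associative,
# CAM-style MSHR lookup. Each entry needs its own comparator in hardware,
# so the structure scales poorly beyond a few dozen outstanding misses.

class AssociativeMSHRs:
    def __init__(self, num_entries):
        # Each slot holds the cache-line tag of an outstanding miss, or None.
        self.entries = [None] * num_entries

    def lookup(self, tag):
        # All entries are compared in parallel in hardware;
        # this is the part whose cost grows linearly with capacity.
        return any(e == tag for e in self.entries)

    def allocate(self, tag):
        for i, e in enumerate(self.entries):
            if e is None:
                self.entries[i] = tag
                return True
        return False  # all MSHRs busy: the cache must stall
```

With only a handful of entries, a burst of distinct misses exhausts the MSHRs and every further access stalls, which is exactly the failure mode described above for low-hit-rate workloads.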

In this project, we show that by handling thousands of outstanding misses without stalling, we can massively increase memory-level parallelism, which significantly speeds up irregular, memory-bound, latency-insensitive applications. By storing miss information in cuckoo hash tables in block RAM instead of in associative memory, we show how a nonblocking cache can be modified to support up to three orders of magnitude more outstanding misses. In addition, unlike a traditional nonblocking cache, DynaBurst issues bursts of variable length on the memory side to further increase the available bandwidth. When spatial locality allows, it lengthens bursts to exploit more of a DRAM row without being limited by the controller width; when spatial locality is insufficient, it keeps bursts short to minimize contention in the controller.
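The key idea of hashed miss tracking can be sketched in a few lines. The structure below is an illustrative approximation, not the DynaBurst RTL; all names and parameters (table count, size, displacement bound) are hypothetical. Each cuckoo table is a plain indexed memory, so a lookup costs one read per table rather than a comparison against every entry, and capacity can grow with block RAM depth instead of comparator count.

```python
# Illustrative sketch (hypothetical, not the DynaBurst implementation):
# miss information kept in cuckoo hash tables in block RAM. A lookup is
# NUM_TABLES indexed reads; capacity scales with RAM depth, not CAM size.

NUM_TABLES = 2      # number of parallel hash tables
TABLE_SIZE = 1024   # entries per table (one block RAM each, conceptually)
MAX_KICKS = 16      # bound on cuckoo displacement chains

def _hash(tag, i):
    # Per-table hash; a hardware design would use independent hash functions.
    return (tag * (0x9E3779B1 + i * 0x85EBCA77)) % TABLE_SIZE

class HashedMissTable:
    def __init__(self):
        # index -> (tag, list of pending requests for that cache line)
        self.tables = [dict() for _ in range(NUM_TABLES)]

    def lookup(self, tag):
        for i, t in enumerate(self.tables):
            e = t.get(_hash(tag, i))
            if e is not None and e[0] == tag:
                return e
        return None

    def insert(self, tag, request):
        hit = self.lookup(tag)
        if hit is not None:              # secondary miss: append the request
            hit[1].append(request)
            return True
        entry = (tag, [request])         # primary miss: allocate a new entry
        i = 0
        for _ in range(MAX_KICKS):
            idx = _hash(entry[0], i)
            if idx not in self.tables[i]:
                self.tables[i][idx] = entry
                return True
            # Slot occupied: evict its occupant and retry it in the next table.
            entry, self.tables[i][idx] = self.tables[i][idx], entry
            i = (i + 1) % NUM_TABLES
        return False  # insertion failed; a real design would apply back-pressure
```

Because only a bounded number of displacements is allowed, lookups and insertions stay cheap even with thousands of entries, which is what makes stall-free handling of thousands of outstanding misses plausible.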

Our research shows that DynaBurst provides new Pareto-optimal and Pareto-dominant design points in the area-delay space of throughput-oriented memory systems. Furthermore, burst support is required for miss-optimized memory systems to be beneficial behind external memory interfaces with multiple narrow ports, and it can further boost read throughput behind a single wide memory port.

Our memory system can be downloaded as an open-source project.

Suggested Readings:

Paper at 29th International Conference on Field Programmable Logic and Applications (FPL)