New and effective strategies for software and hardware acceleration


Team

  Stojilovic Mirjana


 

Large data centres struggle to support workloads such as heavy data analytics, deep neural networks, and 4K video streaming. With the slowdown of transistor scaling limiting further performance gains, it is critical to look for alternatives. In this context, field-programmable gate arrays (FPGAs) are gaining traction. FPGAs work well in modern circuit designs because of their low non-recurring engineering costs, in-field reprogrammability, and short turnaround time. They offer malleability and energy efficiency, which have prompted major cloud providers to integrate FPGAs in their data centres. However, one of the main bottlenecks is the time required to compile an industrial-scale design for an FPGA.

Researchers have attempted to accelerate FPGA compilation through parallelism, but an optimal solution has yet to be found. This project addresses these challenges by exploring new and effective strategies for software and hardware acceleration of the main computational bottlenecks: FPGA placement and routing.

In research related to this project, we propose porting FPGA routing to a CPU+FPGA platform. Motivated by the approaches used in FPGA-accelerated graph processing, we propose and implement three acceleration strategies: (1) reducing the number of expensive random memory accesses, (2) parallelizing and pipelining the computation, and (3) using efficient hardware priority queues.

In another related work, we present a deterministic and parallel implementation of the VPR routability-driven router for FPGAs, where we consider two parallelization strategies: (1) routing multiple nets in parallel and (2) routing one net at a time, while parallelizing the Maze Expansion step.
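To make the Maze Expansion step concrete, the following is a minimal sequential sketch of a priority-queue-driven wavefront expansion for a single net. It is not the VPR implementation or the code from the publications below: the function name `maze_expand`, the dictionary-based routing graph, and the per-node costs are all assumptions chosen for illustration.

```python
import heapq


def maze_expand(graph, cost, source, sinks):
    """Expand a wavefront from `source` until the cheapest sink is reached.

    graph : dict mapping each node to a list of its successor nodes
            (hypothetical stand-in for a routing-resource graph)
    cost  : dict mapping each node to a non-negative cost of using it
    Returns the cheapest path as a list of nodes, or None if no sink
    is reachable.
    """
    sinks = set(sinks)
    best = {source: 0.0}   # cheapest known cost to reach each node
    prev = {source: None}  # back-pointers used to rebuild the path
    heap = [(0.0, source)]

    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node]:
            continue  # stale heap entry, a cheaper one was already expanded
        if node in sinks:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            new_dist = dist + cost[nxt]
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                prev[nxt] = node
                heapq.heappush(heap, (new_dist, nxt))
    return None


# Usage on a tiny hypothetical graph: two candidate routes from "s" to "t".
graph = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
cost = {"s": 0, "a": 5, "b": 1, "t": 1}
path = maze_expand(graph, cost, "s", ["t"])
```

In this sketch, ties between equal-cost heap entries are broken by node name, so repeated runs return the same path. Keeping that property when the neighbour loop or several nets are processed concurrently is the harder problem that a deterministic parallel router has to solve.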

Although FPGAs have several advantages over ASICs, their higher power consumption is a major drawback. To address this problem, we propose a machine learning technique to design power gating regions in the FPGA routing network. Experiments show that our proposed clustering algorithms outperform the state of the art.

Related publications

  1. Gessler, P. Brisk, and M. Stojilović, A Shared-Memory Parallel Implementation of the RePlAce Global Cell Placer. The 33rd International Conference on VLSI Design and 19th International Conference on Embedded Systems (VLSID), Bangalore, India, January 4 – 8, 2020.
  2. Zeinab, H. Asadi, and M. Stojilović, A Machine Learning Approach for Power Gating the FPGA Routing Network. 2019 International Conference on Field-Programmable Technology (ICFPT), Tianjin, China, December 9 – 13, 2019.
  3. Korolija and M. Stojilović, FPGA-Assisted Deterministic Routing for FPGAs. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, May 20 – 24, 2019.
  4. Moctar, M. Stojilović, and P. Brisk, Deterministic Parallel Routing for FPGAs based on Galois Parallel Execution Model. The 28th International Conference on Field Programmable Logic and Applications (FPL), Dublin, Ireland, August 26 – 31, 2018.
  5. Stojilović, Parallel FPGA routing: Survey and challenges. The 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, September 4 – 8, 2017.