THESIS
2019
xiii, 118 pages : illustrations ; 30 cm
Abstract
Heterogeneous computing is a promising direction to address the challenges of performance
and power walls in today’s high-performance computing. For this purpose, the CPU-FPGA system
is especially promising due to the high flexibility of FPGA, which enables customization
for various computing tasks to boost performance and energy efficiency. Nowadays, tightly-coupled
CPU-FPGA systems with shared cache hierarchy (like Intel HARP and IBM POWER
with CAPI) have been proposed to enhance the communication efficiency between the CPU and
FPGA and simplify the programming model. In such systems, multi-core CPUs and the FPGA
coherently share the same cache system and an FPGA cache is attached to the FPGA for quick
memory access. Such emerging architectures bring new challenges when designing collaborative
CPU-FPGA systems. In this thesis, we address the challenges in emerging CPU-FPGA
systems from various perspectives. First, we develop a simulation framework for CPU-FPGA
systems to aid the design evaluation. It supports fast architectural exploration with respect to the
number of cores, number of accelerated units on the FPGA, and different cache hierarchies between
the CPU and FPGA. Various performance metrics are returned for the performance analysis
and architectural configuration optimization. Then, motivated by the fact that the behavior
of the FPGA cache often dominates the performance in emerging shared cache CPU-FPGA
systems, we design two cache management approaches to enhance the FPGA cache utilization,
targeting two different scenarios. The first relies on cache bypassing to improve the FPGA cache hit rate for a single accelerated unit; the second alleviates cache contention among multiple
accelerated units by combining cache partitioning and cache bypassing. These two
approaches rely on static analysis of the applications and dynamic control guided by such static
analysis. Finally, targeting the recently released Intel HARP2 CPU-FPGA system, which has
three bus links between the CPU and FPGA (one QPI bus with an attached FPGA cache and
two PCIe buses), we develop an access management framework that selects the bus link for each
access using a hybrid of static and dynamic decisions. The framework adaptively steers memory
accesses to the preferred link to enhance the utilization of all links and boost the FPGA cache
reuse benefit. It also solves the data inconsistency problem caused by the multiple links. A
complete set of software services and hardware IPs for the framework is provided on the real
HARP2 system. In summary, this thesis presents a deep, multi-dimensional study
of emerging CPU-FPGA systems, from architectural exploration to performance optimization,
from simulation environment development to real system design, from algorithm level to the
hardware logic level, and covers various types of emerging CPU-FPGA systems.
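
As an illustration of why cache bypassing can raise the FPGA cache hit rate, the following is a minimal sketch and not the method developed in the thesis. It assumes a fully associative LRU cache, a hypothetical access trace that interleaves a small reused working set with streaming data, and a bypass set that would, in the spirit of the abstract, come from static analysis. The names `hit_rate`, `reused`, and `streaming` are invented for this example.

```python
from collections import OrderedDict

def hit_rate(trace, capacity, bypass=frozenset()):
    """Simulate a fully associative LRU cache over an address trace.

    Addresses listed in `bypass` are never allocated in the cache
    (assumption: a static analysis has marked them as having no reuse).
    """
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # refresh LRU position on a hit
            hits += 1
        elif addr not in bypass:
            cache[addr] = True             # allocate on a miss
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)

# Hypothetical trace: four reused addresses interleaved with streaming
# addresses that are each touched exactly once.
reused = [0, 1, 2, 3]
trace = []
stream_addr = 100
for _ in range(50):
    for addr in reused:
        trace.append(addr)
        trace.append(stream_addr)
        stream_addr += 1

streaming = frozenset(a for a in trace if a >= 100)

baseline = hit_rate(trace, capacity=4)
with_bypass = hit_rate(trace, capacity=4, bypass=streaming)
print(f"baseline hit rate:    {baseline:.2f}")
print(f"hit rate with bypass: {with_bypass:.2f}")
```

In this toy trace the streaming data evicts the reused lines before they are touched again, so the baseline cache never hits, while bypassing the streaming addresses lets the reused working set stay resident.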