THESIS
2019
x, 62 pages : illustrations (some color) ; 30 cm
Abstract
One of the bottlenecks of Field-Programmable Gate Array (FPGA) based acceleration is the
hierarchy and management scheme of the FPGA memory system, which consists of main memory,
caches, and Block-RAMs.
To perform the design space exploration that leads to architecture and design optimization,
system-level simulation that precisely captures the interaction among CPUs, FPGA accelerators,
and the memory system is essential. We develop PAAS (Processor Accelerator Architecture
Simulator), a system-level simulator that enables cycle-accurate full-system simulation
of CPU-accelerator heterogeneous systems. PAAS easily supports flexible architectural
configurations, such as different on-chip interconnection topologies and memory
hierarchies. As an example of the research enabled by PAAS, we first evaluate
the impact of various memory hierarchies on CPU-FPGA systems. Furthermore, we
propose and investigate a cache-partitioning scheme for improving the performance of
shared-cache based CPU-FPGA systems.
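
The abstract does not detail the partitioning policy itself; as a rough illustration only, the C
sketch below shows one common form, way-based partitioning of a shared set-associative cache
between CPU and FPGA requesters. All identifiers and the 5/3 way split (NUM_WAYS, CPU_WAYS,
may_fill_way) are hypothetical and are not taken from PAAS.

    /* Hypothetical sketch of way-based partitioning of a shared set-associative
     * cache between a CPU and an FPGA accelerator. The actual PAAS policy is not
     * specified in this abstract; the names and the way split are assumptions. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_WAYS  8   /* associativity of the shared cache (assumed) */
    #define CPU_WAYS  5   /* ways reserved for CPU fills (assumed split) */

    typedef enum { REQ_CPU, REQ_FPGA } requester_t;

    /* On a miss, a new line may only be filled into ways owned by the
     * requester; hits are still served from any way. */
    static bool may_fill_way(requester_t who, int way)
    {
        if (who == REQ_CPU)
            return way < CPU_WAYS;      /* ways 0..CPU_WAYS-1 belong to the CPU */
        return way >= CPU_WAYS;         /* remaining ways belong to the FPGA    */
    }

    int main(void)
    {
        for (int way = 0; way < NUM_WAYS; ++way)
            printf("way %d: CPU=%d FPGA=%d\n", way,
                   may_fill_way(REQ_CPU, way), may_fill_way(REQ_FPGA, way));
        return 0;
    }
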
Apart from designing a proper memory hierarchy and cache management policy, making
full use of the limited Block-RAM resources on an FPGA is also critical for acceleration. High-Level
Synthesis (HLS) tools have been proposed to simplify the FPGA-based
design process, but they do not consider dynamic memory allocation constructs in high-level
programming languages like C and limit themselves to static memory allocation.
We propose a dynamic memory allocation and management scheme, called Hi-DMM, for
inclusion in commercial HLS design flows such as Xilinx VivadoHLS. Hi-DMM performs
automatic source-to-source transformation of user C code into C source code that incorporates
the dynamic memory allocator and an optimized management scheme. Relying on buddy-tree-based
allocation schemes and efficient hardware implementations of the allocators, Hi-DMM achieves a
4x speed-up in memory allocation at different granularities compared to
previous work. Experimental results show that dynamic allocation of FPGA
memory resources can be achieved at much lower latency with minimal resource overhead,
paving the way for the synthesis of dynamic memory constructs in commercial HLS
flows.
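
To make the buddy-tree idea concrete, the following is a minimal software sketch of
power-of-two buddy allocation over a small word-addressed pool. It is not Hi-DMM's generated
hardware allocator; freeing is omitted for brevity, and all names and sizes (POOL_WORDS,
longest, buddy_alloc) are assumptions made for the example.

    /* Minimal buddy-tree allocation sketch: each tree node records the largest
     * free block in its subtree, so an allocation descends from the root to a
     * node of the requested power-of-two size. Illustrative only. */
    #include <stdio.h>

    #define POOL_WORDS 16                     /* pool size, power of two        */
    static unsigned longest[2 * POOL_WORDS];  /* per-node largest free block    */

    static unsigned up_pow2(unsigned x)
    {
        unsigned p = 1;
        while (p < x) p <<= 1;
        return p;
    }

    static void buddy_init(void)
    {
        unsigned size = 2 * POOL_WORDS;
        for (unsigned i = 1; i < 2 * POOL_WORDS; ++i) {
            if ((i & (i - 1)) == 0) size >>= 1;   /* next tree level            */
            longest[i] = size;
        }
    }

    /* Returns the word offset of a free block of >= want words, or -1. */
    static int buddy_alloc(unsigned want)
    {
        unsigned size = up_pow2(want ? want : 1);
        if (longest[1] < size) return -1;         /* no block large enough      */

        unsigned node = 1, node_size = POOL_WORDS;
        while (node_size != size) {               /* descend toward a fit       */
            node_size >>= 1;
            node = (longest[2 * node] >= size) ? 2 * node : 2 * node + 1;
        }
        longest[node] = 0;                        /* mark the block as used     */
        int offset = (int)(node * node_size) - POOL_WORDS;

        while (node > 1) {                        /* update ancestor summaries  */
            node >>= 1;
            unsigned l = longest[2 * node], r = longest[2 * node + 1];
            longest[node] = l > r ? l : r;
        }
        return offset;
    }

    int main(void)
    {
        buddy_init();
        printf("alloc 4 -> offset %d\n", buddy_alloc(4));  /* expected 0  */
        printf("alloc 2 -> offset %d\n", buddy_alloc(2));  /* expected 4  */
        printf("alloc 8 -> offset %d\n", buddy_alloc(8));  /* expected 8  */
        return 0;
    }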