THESIS
2017
xiv, 100 pages : illustrations ; 30 cm
Abstract
Alongside the trend of automated software and hardware applications taking over ever more aspects of our lives, the underlying computation platforms have shown that no single architecture can serve all workloads well. The philosophy behind heterogeneous computing is to combine computation resources with different strengths so that, together, they surpass traditional homogeneous computing systems in performance, power, cost, and other metrics. Field Programmable Gate Array (FPGA) based heterogeneous systems, for example, inherit the excellent performance and power efficiency of FPGAs while maintaining high flexibility and scalability.
In typical CPU-FPGA heterogeneous systems, it is common practice for the CPU to offload computation-intensive tasks to the FPGA accelerator. Because the CPU multi-tasks, multiple tasks often need to be accelerated in parallel. Compared with traditional single-context FPGAs, multi-context FPGAs store multiple configurations on-chip, enabling faster context switching between tasks. We propose a multi-context accelerated heterogeneous system that supports simultaneous multi-task acceleration. We develop static and dynamic placement and scheduling algorithms for hardware tasks, the first such solutions for multi-context FPGAs, to improve FPGA resource utilization and runtime task performance.
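To illustrate the flavor of the static placement problem, the sketch below is a hypothetical first-fit-decreasing placer, not the thesis's algorithm: each on-chip context is treated as a fixed logic-resource budget, and each hardware task is placed into the first context with enough free resources. The task names and resource numbers are invented for illustration.

```python
# Hypothetical sketch (not the thesis's algorithm): first-fit-decreasing
# static placement of hardware tasks onto a multi-context FPGA.
def place_tasks(tasks, num_contexts, capacity):
    """tasks: list of (name, resource_demand).
    Returns {context_index: [task names]} or raises if a task fits nowhere."""
    free = [capacity] * num_contexts
    placement = {c: [] for c in range(num_contexts)}
    # Placing larger tasks first tends to improve utilization
    # over an arbitrary placement order.
    for name, demand in sorted(tasks, key=lambda t: -t[1]):
        for c in range(num_contexts):
            if free[c] >= demand:
                free[c] -= demand
                placement[c].append(name)
                break
        else:
            raise ValueError(f"task {name} does not fit in any context")
    return placement

if __name__ == "__main__":
    tasks = [("fft", 40), ("aes", 25), ("matmul", 60), ("sort", 30)]
    print(place_tasks(tasks, num_contexts=2, capacity=100))
```

A real placer would also weigh inter-task communication and reconfiguration latency; this sketch only captures the capacity-packing core of the problem.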
On top of a commercial FPGA-based heterogeneous system, we investigate Deep Neural Network (DNN) applications and propose an end-to-end mapping flow, FP-DNN, that takes a DNN described in TensorFlow Python and automatically generates a hardware implementation on the CPU-FPGA heterogeneous system. Compared with shallow neural networks, DNN applications are both computation-intensive and memory-intensive, which makes efficient automatic deployment challenging. Our mapping flow addresses performance, power, and flexibility simultaneously and is the first to cover sophisticated DNNs such as ResNet and Inception with state-of-the-art performance.
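As a rough illustration of the kind of analysis a mapping flow performs before generating hardware, the sketch below (an assumed simplification, not FP-DNN's actual flow) walks a layer-by-layer DNN description and tallies per-layer multiply-accumulate (MAC) and weight counts, the quantities that drive computation- and memory-intensity estimates.

```python
# Illustrative sketch only (not FP-DNN's actual flow): tally MACs and
# weights for a simplified layer-by-layer DNN description.
def conv_cost(in_ch, out_ch, k, out_h, out_w):
    # A k x k convolution performs in_ch * k * k MACs per output
    # element, for out_ch * out_h * out_w output elements.
    macs = in_ch * out_ch * k * k * out_h * out_w
    weights = in_ch * out_ch * k * k
    return macs, weights

def fc_cost(in_dim, out_dim):
    # A fully connected layer: one MAC per weight.
    return in_dim * out_dim, in_dim * out_dim

def analyze(layers):
    """layers: list of ('conv', in_ch, out_ch, k, out_h, out_w)
    or ('fc', in_dim, out_dim). Returns total (MACs, weights)."""
    total_macs = total_weights = 0
    for layer in layers:
        if layer[0] == "conv":
            m, w = conv_cost(*layer[1:])
        else:
            m, w = fc_cost(*layer[1:])
        total_macs += m
        total_weights += w
    return total_macs, total_weights
```

A high MAC-to-weight ratio suggests a compute-bound layer that benefits from FPGA acceleration, while weight-heavy layers stress off-chip memory bandwidth instead.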
We also investigate simulation platforms for FPGA-accelerated heterogeneous computing systems, including FPGA power estimation and heterogeneous architectural simulation. These simulation platforms are validated against vendor tools or real measurements, providing researchers in related fields with reliable tools for exploration and validation.
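One common way to express such a validation, sketched below under assumed methodology rather than the thesis's own, is to check that every power estimate falls within a relative-error tolerance of the corresponding vendor-tool or measured value.

```python
# Minimal sketch (assumed validation methodology, not the thesis's):
# accept a power model if every estimate is within a relative-error
# tolerance of its reference measurement.
def relative_error(estimated, measured):
    return abs(estimated - measured) / measured

def validate(pairs, tolerance=0.10):
    """pairs: list of (estimated_watts, measured_watts).
    True iff all estimates are within `tolerance` relative error."""
    return all(relative_error(e, m) <= tolerance for e, m in pairs)
```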