THESIS
2022
1 online resource (x, 40 pages) : illustrations (chiefly color)
Abstract
As the computational complexity of algorithms increases, traditional CPU and GPU architectures cannot meet performance and energy requirements due to frequent data movement between processors and off-chip memory. Application-specific accelerator architectures have become popular in both data centers and edge devices because of their high energy efficiency and parallelism. In particular, the systolic array is widely used as a matrix multiplication accelerator in machine learning and mathematical applications, because matrix operations are the bottleneck of many algorithms executed in hardware. As the matrix accelerator, the systolic array usually has the highest parallelism and occupies far more chip area than other components, which causes dramatic fluctuations in system power consumption while the accelerator is working. Because the power delivery network (PDN) is non-ideal, these dramatic power fluctuations introduce large voltage noise and degrade system reliability. Hence, voltage noise is a critical issue for PDN and system architecture designers. Typically, a large voltage guardband is allocated when designing the power delivery network to tolerate real-world worst-case voltage variation. However, a large voltage guardband reduces energy efficiency, so reducing the guardband is a promising opportunity to improve the energy efficiency of systolic array-based accelerators. Unlike CPU and GPU architectures, the power fluctuation of a systolic array-based accelerator strongly depends on the data it processes. Hence, the main challenge in improving the systolic array's energy efficiency is accurately detecting and mitigating dramatic power peaks with little energy and performance overhead.
To accurately detect and mitigate the power peaks that cause large voltage drops, we propose a data signal-based power peak prediction and mitigation method for systolic array-based DNN accelerators. Our method predicts power variation from the temporal and spatial distribution of zero and non-zero values in the input data as the data are written into memory. Built on this power prediction model, the power peak detector can consider both past and future power variation when judging whether a power peak should be mitigated, minimizing performance loss compared to an estimation-based detector. For power peak mitigation, the controller partitions a large power rise edge into two smaller rise edges by inserting bubble operations at regular intervals, reducing the large voltage drop.
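To make the flow concrete, the sketch below (in Python) illustrates one way the prediction, detection, and mitigation steps could fit together. All names, the rise-edge threshold, and the power model are illustrative assumptions, not the thesis implementation; per-cycle power is modeled as proportional to the number of non-zero operands streamed into the array.

    # Toy power model (assumption): per-cycle power grows with the number
    # of non-zero operands entering the systolic array that cycle.
    def predict_power(tiles, p_idle=1.0, p_mac=0.5):
        return [p_idle + p_mac * sum(v != 0 for v in tile) for tile in tiles]

    # The trace is known before execution (data are inspected when written
    # to memory), so the detector can look at both past and future rise
    # edges, unlike a runtime estimation-based detector that only sees the
    # past.
    def detect_steep_rises(power, max_rise):
        return [t for t in range(1, len(power))
                if power[t] - power[t - 1] > max_rise]

    # Mitigation: split each large rise edge into two smaller edges by
    # inserting a bubble (no-op) cycle just before the steep rise.
    def insert_bubbles(schedule, steep_rises):
        flagged, out = set(steep_rises), []
        for t, op in enumerate(schedule):
            if t in flagged:
                out.append("BUBBLE")
            out.append(op)
        return out

The intuition behind the bubble, as described above, is that delaying part of the ramp-up by a cycle turns one steep current step into two smaller steps, which the PDN can absorb with less droop.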
In our experiments, we implement a 4 × 16 × 4 output-stationary systolic array to evaluate resource overhead, and we simulate its power consumption in a PDN equivalent circuit to evaluate voltage noise. We use the feature and weight data of 10 layers from the ssd_vgg, ResNet, and BERT networks as simulation data. The results show that our proposed mitigation method reduces worst-case voltage drops by about 12% and the voltage guardband by 28 mV. Our lightweight voltage noise mitigation method introduces only about 1% resource overhead and less than 3% performance loss on average. We also analyze the performance overhead of the prediction-based and estimation-based detectors: the estimation-based detector adds more than 20% execution time in some layers, whereas the maximum delay overhead of the prediction-based method in our experiments is 7.5%.
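As a rough illustration of how a power trace translates into voltage noise, the sketch below simulates a lumped RLC equivalent circuit of a PDN using semi-implicit Euler integration. The element values, time step, and load step are placeholders chosen only to give a stable, readable example; the thesis's actual PDN model and parameters are not reproduced here.

    # Toy lumped PDN model (assumption): supply -> series R and L -> on-die
    # node with decoupling capacitance C feeding the load current.
    def simulate_pdn(i_load, vdd=1.0, R=5e-3, L=1e-10, C=1e-7, dt=1e-10):
        v_die, i_ind, trace = vdd, 0.0, []
        for i in i_load:
            # Semi-implicit Euler: update the inductor current first, then
            # the on-die voltage, which keeps the LC oscillation stable.
            i_ind += (vdd - v_die - R * i_ind) / L * dt
            v_die += (i_ind - i) / C * dt
            trace.append(v_die)
        return trace

    # Example: a load-current step from near-idle to full activity.
    step = [0.1] * 200 + [2.0] * 800             # current per step (A)
    worst_droop = 1.0 - min(simulate_pdn(step))  # worst-case drop (V)

Feeding a predicted power trace (converted to current at the nominal voltage) through such a model is one simple way to compare worst-case droop with and without bubble insertion.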