THESIS
2023
1 online resource (xi, 43 pages) : illustrations (some color)
Abstract
Combinatorial optimizations are significantly important to a variety of engineering, scientific,
and marketing problems, such as VLSI circuit layout design, machine learning, logistics,
and financial markets. However, most of the combinatorial problems are extremely
challenging for the complexity of those algorithms tends to be NP-hardness, which means
that we cannot find an accurate solution within polynomial computation time. In order
to effectively solve these problems, many heuristic methods have been proposed like simulated
annealing, quantum annealing, quantum bifurcation, and simulated bifurcation.
Because researchers have identified that many combinatorial problems can be mapped
to the so-called Ising problem which is established from thermodynamics and statistical
physics, vari...[
Read more ]
Combinatorial optimizations are significantly important to a variety of engineering, scientific,
and marketing problems, such as VLSI circuit layout design, machine learning, logistics,
and financial markets. However, most of the combinatorial problems are extremely
challenging for the complexity of those algorithms tends to be NP-hardness, which means
that we cannot find an accurate solution within polynomial computation time. In order
to effectively solve these problems, many heuristic methods have been proposed like simulated
annealing, quantum annealing, quantum bifurcation, and simulated bifurcation.
Because researchers have identified that many combinatorial problems can be mapped
to the so-called Ising problem which is established from thermodynamics and statistical
physics, various Ising machines have sprung up, and one of the cutting-edge Ising
machines is called a coherent Ising machine(CIM), which is based on the optical bifurcation
phenomenon. Since the requirements for quantum bifurcation machines and CIM
are much more complicated, for example, we need refrigerators and lasers for quantum
machines and CIM, a recently proposed simulated bifurcation machine (SBM) aroused
much attention for its high degree of parallelism and good performance to find the approximate
solution to most scenarios which need to find the optimal solution instantaneously.
Considering the practicality of deployment and the high level of configurability
and parallelism of FPGA, therefore, we are determined to realize a FPGA-based simulated bifurcation machine(SBM) that supports a large sparse dataset.
For a large sparse dataset, to begin with, the size of all the data usually amounts to millions
of floating point numbers. Since the representation of a floating-point number, be it
a float or double, at least needs 4 bytes to represent, we particularly quantized the dataset
to only 1 byte without losing the accuracy of the SBM algorithm. From our experiments, if
the initial quantized position/momentum value and some parameters are appropriately
tuned, we can even get a better result compared with the floating-point number. It turns
out that the SB algorithm is sensitive to the initial conditions and quantization is also fit
for this algorithm. In addition, considering the sparsity of the dataset, in which there
are many zeros padding within the matrix. People often compressed the dataset using
compressed sparse row(CSR) or compressed sparse column(CSC) format. However, considering
the convenience of hardware design targeting for ASIC or FPGA on which we describe
our circuits using a kind of hardware description language(HDL), such as Verilog,
VHDL, or System Verilog, we use the coordinate format(COO) to compress our dataset.
By using this kind of representation, we can easily address the sparse matrix representation
without any additional computation module for the conversion from CSR to index
our effective values. The reason for that, we will discuss it in the introduction part. Regarding
the circuit design for SBM, since the major and most time-consuming operation in
this algorithm is sparse matrix dense vector multiplication(SPMV), we draw the inspiration
from the GAS computation model for graph processing and encode and compress the
sparse matrix values as edges and for the numbers in the vector, we quantized the floating
point values to Int8 numbers as vertexes. First, we dispatch the edges and vertexes data
to our PEs(Processing Elements) through AXI4 protocal, within our PEs, then those PEs
will fetch the respective edges and vertexes to do the computation in parallel and after
we get the SPMV results, we can do the time evolution(TE) part for the SBM algorithm.
During the time evolution period, we use the ping-pong buffer to make the computation
in a pipelined fashion such that we can continuously dispatch our vertexes and edges to
make the PEs run without idle states. The concrete circuits will be described in more detail
in a later chapter. In our experiments, by means of our quantization techniques, we can
have a 10x speed up without any accuracy loss compared with our double values using
the PyTorch framework and we also implement our circuit design using ZCU102 in the
condition of 128 bits per cycle from DDR4 which is under the control of PS(Processing System) part.
Post a Comment