THESIS
2022
1 online resource (x, 33 pages) : illustrations (some color)
Abstract
Similarity search is a basic operation in database systems and is widely used in industrial
applications to handle complex data such as images and user information, which are commonly
represented by numerical feature vectors. This thesis studies how to make better use of
GPUs for this task. We decompose similarity search into two phases, distance
calculation and k-selection, and for each phase analyze its bottlenecks on GPUs and the
available solutions. For each phase, we explore several mainstream solutions and re-implement most
of them efficiently. Additionally, we propose and implement several new optimizations
to accelerate similarity search on GPUs: SMML-S and SMML-L, two matrix multiplication kernel designs,
and BucketSelect-Opt, a k-selection method. We
conduct extensive experiments in different settings to investigate the performance of the existing
methods and of our proposed methods. The results show that our proposed methods perform
satisfactorily in their target domains. Furthermore, based on these experimental results,
we provide guidelines for choosing the right strategy in each phase for a given
situation.
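
The two-phase decomposition described in the abstract can be illustrated with a minimal CPU sketch. This is not the thesis's GPU code and does not reproduce SMML-S, SMML-L, or BucketSelect-Opt; the function names are hypothetical, squared Euclidean distance is an assumed metric, and a heap-based selection stands in for a GPU k-selection method. The sketch only shows why distance calculation reduces largely to a matrix multiplication and how k-selection follows it.

```python
import heapq


def squared_l2_distances(queries, database):
    """Phase 1 (distance calculation), assuming squared Euclidean distance.

    Uses ||q - x||^2 = ||q||^2 + ||x||^2 - 2 * (q . x). The inner-product
    term over all query/database pairs is a matrix multiplication, which is
    the part a GPU kernel would accelerate.
    """
    db_norms = [sum(v * v for v in x) for x in database]
    rows = []
    for q in queries:
        q_norm = sum(v * v for v in q)
        rows.append([
            q_norm + x_norm - 2.0 * sum(a * b for a, b in zip(q, x))
            for x, x_norm in zip(database, db_norms)
        ])
    return rows


def k_select(distance_row, k):
    """Phase 2 (k-selection): the k smallest (distance, index) pairs."""
    return heapq.nsmallest(k, ((d, i) for i, d in enumerate(distance_row)))


def similarity_search(queries, database, k):
    """Run both phases and return, per query, its k nearest neighbours."""
    return [k_select(row, k) for row in squared_l2_distances(queries, database)]
```

For example, `similarity_search([[1.0, 0.0]], [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]], 2)` returns the two nearest database vectors for the single query, as (distance, index) pairs ordered by increasing distance.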