THESIS
2017
xiii, 97 pages : illustrations ; 30 cm
Abstract
In astronomy, cross-match is a central operation to integrate multi-wavelength information by
identifying celestial objects across multiple catalogs. With the rapid increase in data volume
from space- and ground-based surveys, it has become essential to process large astronomical
catalogs efficiently. In this thesis, we study how to accelerate the cross-match of billion-record
catalogs on a cluster of heterogeneous computers with both CPUs and GPUs.
Specifically, we present two cross-match algorithms, namely IB-CM (Index-Based Cross-Match) and MASJ-CM (Multi-Assignment Single-Join Cross-Match), and study the performance
impact of indexing methods as well as design choices and optimizations of both algorithms
for a heterogeneous computer cluster. We implemented these algorithms to fully
utilize the computation and communication resources of the cluster, and compared them with implementations on Spark and SpatialHadoop, two popular distributed computing platforms. Our evaluations
on real-world astronomical catalogs show that our native implementations were orders of
magnitude faster than those on Spark or SpatialHadoop and that self-matching billion-record
catalogs on a six-node cluster finished in under five minutes.