Scalable hash ripple join on Spark

HKUST Electronic Theses

by Hao Liu

THESIS 2014

M.Phil. Fok Ying Tung Graduate School Innovative Technologies Leadership

ix, 52 pages : illustrations ; 30 cm

Abstract

Hash ripple join is an online aggregation algorithm that rapidly produces approximate join results whose accuracy improves as the join progresses and which converge to the exact result when the join finishes. Luo et al. proposed a parallel hash ripple join (PHRJ) that runs in a distributed setting. However, PHRJ has two drawbacks when handling large-scale data: 1) PHRJ updates the approximate results at a fine granularity, which incurs extra communication cost in a distributed environment; 2) when the data does not fit in memory, PHRJ cannot provide unbiased approximate results.
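The ripple-join idea described above can be illustrated with a small single-machine Python sketch that estimates COUNT(*) of an equi-join. It is an illustration under stated assumptions, not PHRJ and not the thesis's Spark implementation. The function name `hash_ripple_join_count` and its `batch_size` parameter are hypothetical. Both inputs are assumed to be already in random order, and each hash table stores only per-key counts because only the join size is estimated (a full join would store the tuples). Each round reads from one input, inserts into that input's hash table, and probes the other input's table. The running match count is then scaled by (N_left / seen_left) * (N_right / seen_right).

```python
from typing import Callable, Dict, Hashable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def hash_ripple_join_count(
    left: Sequence[T],
    right: Sequence[U],
    left_key: Callable[[T], Hashable],
    right_key: Callable[[U], Hashable],
    batch_size: int = 1,  # hypothetical knob: tuples read per side per round
) -> Iterator[Tuple[int, int, int, float]]:
    """Yield (seen_left, seen_right, matches, estimated_join_size) after each round.

    Assumption: inputs are in random order, so every prefix is a uniform
    random sample; otherwise the running estimate can be biased.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    left_table: Dict[Hashable, int] = {}   # key -> number of left tuples seen
    right_table: Dict[Hashable, int] = {}  # key -> number of right tuples seen
    i = j = 0
    matches = 0
    while i < len(left) or j < len(right):
        # Read a batch from the left input: insert into its table, probe the right table.
        for t in left[i:i + batch_size]:
            k = left_key(t)
            left_table[k] = left_table.get(k, 0) + 1
            matches += right_table.get(k, 0)
        i = min(i + batch_size, len(left))
        # Read a batch from the right input: insert into its table, probe the left table.
        for t in right[j:j + batch_size]:
            k = right_key(t)
            right_table[k] = right_table.get(k, 0) + 1
            matches += left_table.get(k, 0)
        j = min(j + batch_size, len(right))
        # Scale the matches found in the seen prefixes up to the full inputs.
        if i > 0 and j > 0:
            estimate = matches * (len(left) / i) * (len(right) / j)
        else:
            estimate = 0.0
        yield i, j, matches, estimate


if __name__ == "__main__":
    ident = lambda x: x
    for step in hash_ripple_join_count([1, 2, 2, 3], [2, 2, 3, 4], ident, ident):
        print(step)
```

For example, running the sketch on `[1, 2, 2, 3]` and `[2, 2, 3, 4]` yields four rounds whose last entry is `(4, 4, 5, 5.0)`, the exact join size. With `batch_size=1` the estimate is refreshed after every tuple from each side, loosely mirroring the fine-grained updates attributed to PHRJ. A larger `batch_size` refreshes it once per batch, which in a distributed setting would mean fewer result updates to communicate, at the cost of less frequent estimates.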

In this thesis, a scalable hash ripple join is proposed that 1) runs on a distributed framework that processes distributed data at a coarse granularity to speed up the join; 2) continuously gives unbiased and consis...
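The unbiasedness mentioned above rests on a standard sampling argument. Suppose k_R of the N_R tuples of R and k_S of the N_S tuples of S are drawn as independent uniform random samples without replacement. Then each joining pair appears in both samples with probability (k_R/N_R)(k_S/N_S). Scaling the sample join size by (N_R/k_R)(N_S/k_S) therefore gives an unbiased estimate of |R ⋈ S|. The Python sketch below checks this empirically by averaging many estimates. It is a generic illustration, not the estimator from the thesis, and the function names and synthetic data are hypothetical. The guarantee depends on the samples being uniformly random, so reading tuples in a non-random order can bias the estimate.

```python
import random
from typing import Hashable, List


def exact_join_size(left: List[Hashable], right: List[Hashable]) -> int:
    """Exact equi-join size, joining on the values themselves (hash build + probe)."""
    counts = {}
    for x in right:
        counts[x] = counts.get(x, 0) + 1
    return sum(counts.get(x, 0) for x in left)


def scaled_estimate(left: List[Hashable], right: List[Hashable],
                    k_left: int, k_right: int, rng: random.Random) -> float:
    # Uniform random samples without replacement (a random-order prefix is one such sample).
    sample_left = rng.sample(left, k_left)
    sample_right = rng.sample(right, k_right)
    sample_matches = exact_join_size(sample_left, sample_right)
    # Each joining pair survives with probability (k_left/N_left) * (k_right/N_right);
    # dividing by that probability makes the estimate unbiased.
    return sample_matches * (len(left) / k_left) * (len(right) / k_right)


def mean_estimate(left: List[Hashable], right: List[Hashable],
                  k_left: int, k_right: int, trials: int, seed: int = 0) -> float:
    """Average of many independent scaled estimates (should approach the exact size)."""
    rng = random.Random(seed)
    total = sum(scaled_estimate(left, right, k_left, k_right, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    left = [i % 5 for i in range(50)]   # hypothetical synthetic relation R
    right = [i % 7 for i in range(70)]  # hypothetical synthetic relation S
    print("exact:", exact_join_size(left, right))
    print("mean of estimates:", mean_estimate(left, right, 10, 10, trials=4000))
```

Any single estimate from 10-tuple samples can be far from the exact value of 500. The average over thousands of trials, however, settles close to it, which is what unbiasedness predicts.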

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Fok Ying Tung Graduate School Innovative Technologies Leadership
Authors: Liu, Hao
Subjects: Database management; Information storage and retrieval systems; Data processing
Language: English
Call number: Thesis FITL 2014 LiuH
DOI: 10.14711/thesis-b1333613
