Computing summaries over distributed data

by Zengfeng Huang

THESIS 2013

Ph.D. Computer Science and Engineering

ix, 97 pages : illustrations ; 30 cm

Abstract

Consider a distributed system with k nodes, where each node holds a part of the data. The goal is to extract some useful information from the entire data set or to compute some functions over the data. We are interested in designing communication-efficient algorithms and also characterizing the communication complexity for various problems. We consider both a flat network structure and more complicated tree networks.

In this thesis, we study some of the most important statistical summaries of the underlying data, in particular item frequencies, heavy hitters, quantiles, and ε-approximations, which are extensively studied in databases, machine learning, computational geometry, and data mining. We provide general techniques for both designing efficient algorithms and proving communica...

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Computer Science and Engineering
Supervisors: Yi, Ke
Authors: Huang, Zengfeng
Subjects: Electronic data processing; Distributed processing; Mathematical models; Big data; Data processing
Language: English
Call number: Thesis CSED 2013 Huang
DOI: 10.14711/thesis-b1250313
