Storage optimization for large wide tables in Hadoop

by Wei Li

THESIS 2015

M.Phil. Computer Science and Engineering

x, 43 pages : illustrations ; 30 cm

Abstract

Recent advances in data warehousing technologies are enabling the storage and processing of extremely large data sets. In view of this opportunity, the leading cross-bank settlement institute in China is looking for more business intelligence in its large-volume historical transaction data, accumulated over more than 10 years. Although Hive, a mature open-source data warehousing solution built on top of Hadoop, is being adopted in production, the efficiency of data storage and processing is suboptimal due to the lack of advanced customization and optimization of the system. Specifically, overlapping fractions of the original data set are materialized to different tables, introducing inter-table redundancy. Additions and changes on columns inside a table...

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Computer Science and Engineering
Authors: Li, Wei
Subjects: Apache Hadoop; Big data; Data warehousing; Electronic data processing -- Distributed processing
Language: English
Call number: Thesis CSED 2015 LiW
DOI: 10.14711/thesis-b1487570
