THESIS
2013
xii, 68 pages : illustrations ; 30 cm
Abstract
MapReduce is a processing paradigm invented by Google that has been widely used for
both high performance computing and big data processing. Though the programming
model is simple, it is very challenging to tune the performance of a MapReduce job.
The main reason is that a job usually has many configuration parameters that must be tuned
carefully according to both the data distribution and the execution environment (e.g., cluster
size and failure rate). It is therefore desirable to estimate the execution time of a job
efficiently, given the input data and the settings, without actually executing it. For this purpose, we can use
either a theoretical model or a simulator. However, due to the complex internal execution
flow and the various types of failure that are hard to model, it is extremely difficult to build
theoretical models that analyze the performance of MapReduce jobs. Moreover, to the best of
our knowledge, none of the existing MapReduce simulators can accurately simulate system
behavior in the presence of unpredictable failures and skewed input data.
In this thesis, we propose a new MapReduce performance tuner called MRTune. Compared
with other approaches to runtime estimation, MRTune explicitly accounts for system failures
and data skew, which makes it very useful for tuning the parameters of real
MapReduce jobs. It can also be used to study the influence of skewed data on MapReduce
applications. The model of MRTune simulates the basic components of a MapReduce
application using discrete event simulation in order to guarantee efficiency, and the input specification of MRTune imports the information describing the data distribution.
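To make the simulation approach concrete, the sketch below shows a minimal discrete event loop for a map phase followed by a reduce phase, with tasks running in waves over a fixed number of slots. All names (run_wave, simulate_job) and the folding of shuffle cost into the reduce times are illustrative assumptions, not MRTune's actual design.

```python
import heapq

def run_wave(task_times, slots, start_time):
    """Run tasks over a fixed number of slots with a tiny event queue and
    return the time at which the last task finishes."""
    pending = list(task_times)
    events = []                        # min-heap of task completion times
    clock = start_time
    for _ in range(min(slots, len(pending))):
        heapq.heappush(events, clock + pending.pop())
    while events:
        clock = heapq.heappop(events)  # advance to the next completion event
        if pending:                    # a slot has freed up: start another task
            heapq.heappush(events, clock + pending.pop())
    return clock

def simulate_job(map_times, reduce_times, map_slots, reduce_slots):
    # Maps run first; reduces start only after the whole map phase
    # (shuffle cost is folded into reduce_times here for brevity).
    maps_done = run_wave(map_times, map_slots, 0.0)
    return run_wave(reduce_times, reduce_slots, maps_done)

# Example: 8 maps of 10 s on 4 map slots, then 3 reduces of 20 s on 2 slots.
print(simulate_job([10.0] * 8, [20.0] * 3, 4, 2))   # -> 60.0
```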
We validated MRTune with inputs under both even and Zipfian distributions, implementing
Word Count and Inverted Indexing to test its accuracy on a simple benchmark and a more
complicated job. We also used MRTune for case studies of data skew and failures.
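As an illustration of such skewed test input, a Zipfian key stream could be generated along the following lines; the vocabulary size, skew exponent, and record count are placeholders rather than the settings used in the experiments.

```python
import random

def zipf_keys(num_keys, skew, n_records, seed=0):
    """Draw n_records keys from a Zipf-like distribution: key of rank r gets
    probability proportional to 1 / r**skew, so a few keys dominate."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** skew for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=n_records)

# Placeholder sizes: 10,000 records drawn over 1,000 distinct keys, skew 1.0.
records = zipf_keys(num_keys=1_000, skew=1.0, n_records=10_000)
```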
In previous research on skew handling, most formulations of the partitioning skew problem
assume that the number of keys and the number of reducers are close, or at least of the same
order of magnitude. We observed that partitioning skew is greatly mitigated when the
number of keys is much larger than the number of reducers, which is often the case in real applications.
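This observation can be reproduced with a toy partitioning experiment: draw Zipfian keys, partition them modulo the number of reducers (standing in for Hadoop's default hash partitioner), and measure how far the busiest reducer exceeds the average load. The key counts and skew value below are illustrative only.

```python
import random
from collections import Counter

def max_to_avg_load(num_keys, num_reducers, skew=1.0, n_records=100_000, seed=0):
    """Partition a Zipfian key stream across reducers (key % num_reducers,
    standing in for a hash partitioner) and return max load / average load."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** skew for rank in range(1, num_keys + 1)]
    keys = rng.choices(range(num_keys), weights=weights, k=n_records)
    loads = Counter(k % num_reducers for k in keys)
    return max(loads.values()) / (n_records / num_reducers)

# Imbalance shrinks as the number of distinct keys grows far past 16 reducers.
for num_keys in (20, 200, 2_000, 200_000):
    print(num_keys, round(max_to_avg_load(num_keys, num_reducers=16), 2))
```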
Moreover, in most cases users will set the number of reduce tasks equal to the
number of reduce slots in the cluster. However, if the number of reduce slots is too small, the
influence of a failure is huge. We observed that in this case the influence of failures
decreases as the number of reduce tasks is raised.
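A back-of-the-envelope version of this effect, under the idealised assumption that equal-sized reduce tasks run in waves and a single failed task is simply rerun at the end, might look like the following sketch; the slot count and work amounts are hypothetical.

```python
import math

def reduce_phase_time(total_work, num_tasks, num_slots, failed_tasks=0):
    """Idealised reduce phase: equal-sized tasks run in waves over the slots,
    and each failed task is rerun once, adding to the number of task waves."""
    task_time = total_work / num_tasks
    waves = math.ceil((num_tasks + failed_tasks) / num_slots)
    return waves * task_time

# Relative slowdown caused by one failed reduce task on a 16-slot cluster,
# for the same total work split into more and more reduce tasks.
for num_tasks in (16, 32, 64, 128):
    base = reduce_phase_time(1000.0, num_tasks, 16)
    hit = reduce_phase_time(1000.0, num_tasks, 16, failed_tasks=1)
    print(num_tasks, f"{(hit - base) / base:.0%}")
```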