Fast and accurate statistical simulation of shared-memory applications on multicore systems

HKUST Electronic Theses

by Rafael Kioji Vivas Maeda

THESIS 2019

Ph.D. Electronic and Computer Engineering

xiv, 111 pages : illustrations ; 30 cm

Abstract

The default method to study application-architecture interactions is cycle-accurate simulation. Statistical simulation is an alternative method that approaches these interactions from a different angle than time. It has been demonstrated that statistical simulation offers new possibilities to substantially speed up simulation. The common way to build a statistical simulator is to use the reuse distance (RD) memory locality model. Unfortunately, the RD model can capture only a single locality granularity, such as cache-line locality. This limitation leads to considerably high error when evaluating multi-level caches. In addition, RD alone is suitable only for modeling single-core applications. As a result, existing statistical simulators lack effective memory locality models for multiprocessor applications and often neglect data sharing between threads. Moreover, the typical method to speed up statistical simulation is to blindly reduce the length of the trace to be synthesized. While this gives good control over the speedup, it leaves the simulation error unbounded. In this thesis, we address these issues. We first introduce a generalization of RD that can capture the locality seen at multiple granularities. We refer to it as hierarchical reuse distance (HRD). Our results show that HRD is 4X more accurate than RD when simulating single-core systems with multi-level caches. HRD also converges three orders of magnitude faster than RD. The second contribution is a novel s̲h̲a̲ring-l̲o̲cality m̲odel (Shalom). Shalom can capture and reproduce data sharing in multithreaded applications. The third contribution is a method to bound the statistical simulation error for a particular metric while maximizing the speedup. We achieve this by monitoring the convergence of the statistical synthesis. We name it c̲o̲n̲vergence-d̲e̲termiṉistic s̱imulation (Condens).
In a set of experiments, the combination of Shalom and Condens is on average 234X faster than cycle-accurate simulation, with a simulation error of 15.4%. Our approach is also 48X faster than state-of-the-art sampling simulation at the same accuracy level. Compared to statistical simulators that ignore sharing, our technique is 3X more accurate for performance metrics and 5X more accurate for cache miss estimates.
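
To illustrate the single-granularity limitation the abstract describes, the following is a minimal sketch of the textbook reuse distance computation (the number of distinct blocks touched between two consecutive accesses to the same block). It is not the thesis's HRD, Shalom, or Condens implementation; the function name `reuse_distances`, the `block_bytes` parameter, and the example trace are assumptions made for illustration only.

```python
def reuse_distances(addresses, block_bytes=64):
    """Return one reuse distance per access, at a single granularity.

    The distance is the number of distinct blocks accessed since the
    previous access to the same block; None marks a first-time access.
    Quadratic in trace length: written for clarity, not speed.
    """
    blocks = [addr // block_bytes for addr in addresses]
    last_seen = {}   # block id -> index of its most recent access
    distances = []
    for i, block in enumerate(blocks):
        if block in last_seen:
            between = blocks[last_seen[block] + 1:i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[block] = i
    return distances


# Hypothetical byte-address trace, used only for illustration.
trace = [0, 64, 128, 0, 72, 4096, 130]

line_rd = reuse_distances(trace, block_bytes=64)     # cache-line granularity
page_rd = reuse_distances(trace, block_bytes=4096)   # page granularity
print(line_rd)
print(page_rd)
```

The same trace yields different distance profiles at the two block sizes, so a profile collected at one granularity does not by itself describe the locality seen at another.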

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Electronic and Computer Engineering
Supervisors: Xu, Jiang
Authors: Vivas Maeda, Rafael Kioji
Subjects: Computer architecture; Evaluation; Statistical methods
Language: English
Call number: Thesis ECED 2019 VivasM
DOI: 10.14711/thesis-991012762568003412
