THESIS
2023
1 online resource (ix, 55 pages) : color illustrations
Abstract
It is well known that sorting N elements in the external memory model requires at least sort(N) = 2N/B ⌈log(N/B)/log(M/B)⌉ I/Os, where M is the internal memory size and B is the block size, and this is attained precisely by external merge sort.
In recent years, due to growing concerns on privacy breaches on outsourced data, there is much interest in making algorithms data-oblivious, i.e., the access pattern to external memory should be independent of the input data. Unfortunately, merge sort is fundamentally difficult to be made oblivious, so prior work by Goodrich [24] chose to make distribution sort oblivious, but it incurs a large constant factor slowdown, with
an I/O cost at least 40 · sort(N). A recent algorithm called bucket sort [4, 39] took a
different approach, reducing it to...[
Read more ]
It is well known that sorting N elements in the external memory model requires at least sort(N) = 2N/B ⌈log(N/B)/log(M/B)⌉ I/Os, where M is the internal memory size and B is the block size, and this is attained precisely by external merge sort.
In recent years, due to growing concerns on privacy breaches on outsourced data, there is much interest in making algorithms data-oblivious, i.e., the access pattern to external memory should be independent of the input data. Unfortunately, merge sort is fundamentally difficult to be made oblivious, so prior work by Goodrich [24] chose to make distribution sort oblivious, but it incurs a large constant factor slowdown, with
an I/O cost at least 40 · sort(N). A recent algorithm called bucket sort [4, 39] took a
different approach, reducing it to 3 · sort(N). In this paper, we show that distribution
sort can be made oblivious while attaining an I/O cost of (1 + o(1)) · sort(N), provided that M/B is greater than all polylogarithms of N, thus closing the gap between oblivious and non-oblivious external sort to within lower order terms. The algorithm is also simple and practical; our experimental results show that its I/O cost is indeed very close to the non-oblivious lower bound, especially when M is sufficiently large.
Post a Comment