THESIS
2013
xiv, 129 p. : ill. ; 30 cm
Abstract
Flash solid state drives (SSDs), or flash disks, are a type of persistent storage device
with the potential to replace magnetic disks. They outperform magnetic disks on access
speed, bandwidth, shock resistance, and power efficiency. As their capacity increases
and prices decrease, flash disks are considered for the storage of database systems.
Due to the differences between flash SSDs and magnetic disks, traditional data management
techniques designed for magnetic disks need to be re-examined for flash disks. In
particular, the flash memory used in flash disks has an asymmetry between read and
write speeds: reads, whether random or sequential, are much faster than writes.
This thesis studies the performance of transactional workloads on flash disks and designs efficient storage schemes for them. Specifically, we begin with the performance
study of the TPC-C workload on flash SSDs. Overall, the flash SSDs outperform
the magnetic disk by up to an order of magnitude. Moreover, the I/O performance
of the SSDs is dominated by random writes, whereas that of the magnetic
disk is dominated by random reads. Additionally, both minimising logging and adopting MVCC
(Multi-Version Concurrency Control) instead of 2PL (Two-Phase Locking) help improve
the performance on flash SSDs.
Observing the dominance of random writes in flash SSDs under TPC-C workloads,
we propose a new database storage layout, called Partitioned Logging (PTL). In PTL,
we replace data writes with logging to eliminate random page writes, and put data
and logs into separate blocks. Moreover, we group data blocks into partitions so that
updates on each partition are appended as log entries to one log block. This way, we
can tune the partition size to balance the read and write performance based on the
hardware and workload characteristics. The results show a considerable improvement
over both the traditional storage layout and a leading flash-based database storage scheme.
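The PTL layout described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class names (`LogBlock`, `PartitionedLogStore`), the log entry format `(page_id, offset, payload)`, and the rule of merging a partition's log block into its data pages once the block fills up are hypothetical assumptions chosen for illustration. The sketch shows the two properties stated in the text: ordinary updates are appended as log entries instead of being written to data pages, and all data pages in one partition share a single log block.

```python
from dataclasses import dataclass, field


@dataclass
class LogBlock:
    """One flash block reserved for the log entries of a single partition."""
    capacity: int  # max number of log entries per block (assumed unit)
    entries: list = field(default_factory=list)  # (page_id, offset, payload)

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity


class PartitionedLogStore:
    """Hypothetical PTL-style layout: updates go to the log block of the
    page's partition; data pages are rewritten only when a log block is merged."""

    def __init__(self, num_pages: int, page_size: int,
                 partition_size: int, log_block_capacity: int):
        self.page_size = page_size
        self.partition_size = partition_size  # data pages per partition (tunable)
        self.log_block_capacity = log_block_capacity
        self.data = [bytearray(page_size) for _ in range(num_pages)]
        num_partitions = -(-num_pages // partition_size)  # ceiling division
        self.logs = [LogBlock(log_block_capacity) for _ in range(num_partitions)]
        self.page_writes = 0  # in-place data page writes (only during merges)

    def partition_of(self, page_id: int) -> int:
        return page_id // self.partition_size

    def update(self, page_id: int, offset: int, payload: bytes) -> None:
        if offset < 0 or offset + len(payload) > self.page_size:
            raise ValueError("update does not fit inside the page")
        part = self.partition_of(page_id)
        if self.logs[part].is_full():
            self._merge(part)  # assumption: merge when the log block fills up
        self.logs[part].entries.append((page_id, offset, bytes(payload)))

    def read(self, page_id: int) -> bytes:
        # A read costs the data page plus a scan of the partition's log block;
        # a larger partition means more unrelated entries to skip.
        page = bytearray(self.data[page_id])
        for pid, off, payload in self.logs[self.partition_of(page_id)].entries:
            if pid == page_id:  # replay this page's entries in append order
                page[off:off + len(payload)] = payload
        return bytes(page)

    def _merge(self, part: int) -> None:
        # Apply the log block to its data pages, then start a fresh log block.
        touched = set()
        for pid, off, payload in self.logs[part].entries:
            self.data[pid][off:off + len(payload)] = payload
            touched.add(pid)
        self.page_writes += len(touched)
        self.logs[part] = LogBlock(self.log_block_capacity)
```

In this sketch, a larger `partition_size` shares one log block among more pages, so each read must skip more unrelated entries, while a smaller one shortens the log each read scans but needs more log blocks. This mirrors the read/write balance the text describes; the thesis's actual cost model and tuning procedure are not reproduced here.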
Finally, to solve the redundant I/O problem and eliminate the merge operations that
are essential in all other log-structured approaches, we propose FlashTKV, which
adopts a purely sequential storage format in which all data and transactional information
are stored as log records. Furthermore, FlashTKV supports MVCC efficiently on this
sequential storage. Our results show that FlashTKV improves transaction throughput by
70% over two well-known KV-stores under TPC-C workloads on flash SSDs.
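A purely sequential store in which both data and transactional information are log records, with MVCC on top, can be sketched in Python as follows. This is a hypothetical illustration, not FlashTKV itself: the `Record` format, the in-memory version index, and the first-committer-wins conflict rule are assumptions, and a Python list stands in for the append-only flash device. The point it shows is that writes, commits, and aborts are all appends, and snapshot reads are answered by picking the newest committed version no later than the reader's snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    kind: str                       # "put", "commit" or "abort" (assumed kinds)
    txn_id: int
    key: Optional[str] = None
    value: Optional[str] = None
    commit_ts: Optional[int] = None


class SequentialMVCCStore:
    """Hypothetical append-only KV store: every change is a log record."""

    def __init__(self):
        self.log = []      # the sequential "device"; never updated in place
        self.index = {}    # key -> [(commit_ts, log position)], oldest first
        self.active = {}   # txn_id -> (snapshot_ts, [(key, log position)])
        self.next_txn = 1
        self.clock = 0     # logical commit timestamp

    def begin(self) -> int:
        txn = self.next_txn
        self.next_txn += 1
        self.active[txn] = (self.clock, [])  # snapshot = latest committed ts
        return txn

    def put(self, txn: int, key: str, value: str) -> None:
        self.log.append(Record("put", txn, key, value))
        self.active[txn][1].append((key, len(self.log) - 1))

    def get(self, txn: int, key: str) -> Optional[str]:
        snapshot, writes = self.active[txn]
        for k, pos in reversed(writes):  # a transaction sees its own writes
            if k == key:
                return self.log[pos].value
        for ts, pos in reversed(self.index.get(key, [])):
            if ts <= snapshot:  # newest version visible to this snapshot
                return self.log[pos].value
        return None

    def commit(self, txn: int) -> bool:
        snapshot, writes = self.active.pop(txn)
        # First-committer-wins: abort if a key we wrote gained a newer version.
        for key, _ in writes:
            versions = self.index.get(key)
            if versions and versions[-1][0] > snapshot:
                self.log.append(Record("abort", txn))
                return False
        self.clock += 1
        self.log.append(Record("commit", txn, commit_ts=self.clock))
        for key, pos in writes:
            self.index.setdefault(key, []).append((self.clock, pos))
        return True
```

Because commit and abort decisions are themselves log records, the index could in principle be rebuilt by scanning the log after a crash; that recovery path, and how FlashTKV actually indexes and garbage-collects versions, are not shown here.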