THESIS
2012
50 p. : ill. ; 30 cm
Abstract
A modern sequencing instrument can generate hundreds of millions of short reads of genomic data every day. As a result, there is an urgent need for fast algorithms that can efficiently handle, store, compress, access, and decompress genomic data. This thesis focuses on specialized compression schemes that can quickly compress and decompress large-scale genomic data. We develop lightweight compression schemes for FASTQ/FASTA format data, as well as schemes designed specifically for sequence alignment output data. Furthermore, we leverage the massively parallel architecture, high density of arithmetic logic units, and superior memory bandwidth of the Graphics Processing Unit (GPU) to significantly accelerate compression and decompression. We demonstrate that our GPU-powered custom compression schemes achieve a compression ratio similar to or better than that of general-purpose compression algorithms on sequence data, while compressing 20 times faster. Finally, we integrate our compression techniques into state-of-the-art alignment tools and improve their overall speed by an order of magnitude by reducing I/O cost.
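The abstract does not describe the compression schemes themselves. As a rough illustration of what a lightweight scheme for sequence data can look like, the sketch below packs each nucleotide into 2 bits, a common textbook technique. It is not the method developed in the thesis; the names `pack` and `unpack` are hypothetical, and the sketch assumes reads contain only A, C, G, and T (no N bases, no quality scores).

```python
# Illustrative only: 2-bit nucleotide packing, not the scheme from the thesis.
# Assumes the input contains only the characters A, C, G, and T.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))


read = "GATTACAGATTACA"
packed = pack(read)
restored = unpack(packed, len(read))
```

Because each base is encoded independently of its neighbours, a scheme of this kind could plausibly be split across many parallel threads, which is the sort of property that makes GPU acceleration attractive.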