THESIS
2012
xv, 117 p. : ill. ; 30 cm
Abstract
Genome sequence analysis is central to today’s genomics research, and sequence alignment and single-nucleotide polymorphism (SNP) detection are two of its fundamental tasks. Sequence alignment, in particular short read alignment, matches DNA fragments generated by second-generation sequencers to a reference sequence. SNP detection then identifies single-nucleotide variations between the aligned reads and the reference sequence. Because these tasks process millions to billions of base pairs of genomic data and are computationally intensive, we accelerate the analysis system by (1) improving the I/O, memory access, and computation of each task; and (2) tightly integrating the two tasks to reduce redundancy.
We have explored the use of a graphics processing unit (GPU) as a hardware accelerator to speed up the in-memory computation of both tasks. For sequence alignment, we propose a filtering-verification algorithm that efficiently utilizes a GPU’s hardware resources. For SNP detection, we have designed a sparse data representation format that improves GPU memory access. Finally, we adopt a partition-based alignment storage layout and customized data compression techniques to reduce the I/O cost and improve the overall speed of the system. Experimental results show that our system accelerates genome sequence analysis by an order of magnitude over state-of-the-art CPU-based analysis.
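The general filtering-verification idea for short read alignment can be illustrated with a minimal CPU-only sketch. This is not the thesis's GPU algorithm: the k-mer index, the non-overlapping seeding scheme, the Hamming-distance check, and the names `build_kmer_index`, `hamming`, and `align_read` are all assumptions chosen for illustration. Filtering uses exact seed matches to propose candidate positions, and verification compares the full read against the reference only at those candidates.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, k, max_mismatches):
    """Return (start, mismatches) for each accepted placement of the read.

    Hypothetical illustration only. By the pigeonhole principle, a read
    split into at least max_mismatches + 1 non-overlapping seeds must have
    one seed that matches exactly at any placement within the mismatch
    budget, so the filter misses nothing under that condition.
    """
    # Filtering: every exact seed hit proposes one candidate start position.
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Verification: compare the whole read at each surviving candidate.
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = hamming(read, window)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, 4)
# The read is reference[8:16] with one substituted base.
print(align_read("TAGCCTAT", reference, index, 4, 1))
```

In this sketch the verification step does the expensive comparison only for positions that pass the filter, which is the property that makes the pattern attractive for data-parallel hardware. How the thesis maps these two phases onto a GPU is not stated in the abstract.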