THESIS
2014
xii, 61 pages : illustrations ; 30 cm
Abstract
In genome-wide association studies (GWAS), some single-nucleotide polymorphism (SNP)
pairs are significantly associated with diseases through the combination of their main
effects and interactions. This effect is referred to as associations allowing for
interactions [1]. A fast method for detecting such pairs has been proposed [2]. The method
is based on a likelihood ratio test with the assumption that the test statistic follows a
chi-square distribution. Many SNP pairs with significant associations allowing for
interactions have been detected using this method. However, the chi-square approximation
requires the expected value in each cell of the contingency table to be at least 5. This
assumption is violated in some of the identified SNP pairs, and in these cases a likelihood
ratio test may no longer be applicable. Because a permutation test is nonparametric, it is
an ideal approach to double-checking the p-values calculated in a likelihood ratio test.
The p-values of SNP pairs having significant associations with disease are always extremely
small, so permutation tests in genome-wide association studies are computationally
demanding: a huge number of permutations is needed to achieve a correspondingly high
resolution for the p-value. To investigate whether the p-values from likelihood ratio tests
are reliable, a fast tool that can carry out a large number of permutations is desirable.
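To make the resolution argument concrete, the following is a minimal CPU sketch of a permutation test for one SNP pair, not the GPU tool described in this thesis. It assumes each sample's two genotypes have been combined into a single code from 0 to 8 (for example 3 * g1 + g2), that case/control labels are 0 or 1, and that the p-value is estimated with the common (hits + 1) / (n_perm + 1) convention; the function names are hypothetical.

```python
import math
import random


def g_statistic(genotypes, labels, n_cells=9):
    """Likelihood ratio (G) statistic for a 2 x n_cells case/control table."""
    counts = [[0] * n_cells for _ in range(2)]
    for g, y in zip(genotypes, labels):
        counts[y][g] += 1
    n = len(labels)
    row = [sum(r) for r in counts]
    col = [counts[0][j] + counts[1][j] for j in range(n_cells)]
    g_stat = 0.0
    for i in range(2):
        for j in range(n_cells):
            obs = counts[i][j]
            if obs > 0:  # empty cells contribute nothing to the sum
                expected = row[i] * col[j] / n
                g_stat += 2.0 * obs * math.log(obs / expected)
    return g_stat


def permutation_p_value(genotypes, labels, n_perm, seed=0):
    """Estimate the p-value by shuffling the case/control labels n_perm times."""
    rng = random.Random(seed)
    observed = g_statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if g_statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Assumed estimator: the smallest reportable value is 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

Under this estimator the smallest p-value that can be reported is 1 / (n_perm + 1), so confirming a p-value near 10^-12 needs on the order of 10^12 permutations, which is why a plain loop like this one is impractical at that scale.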
In this thesis, we first presented a fast permutation tool based on graphics processing
units (GPUs) with highly reliable p-value estimation. We designed a memory layout scheme
dedicated to concurrent permutation, and used the properties of the different memory types
in GPUs to optimize the efficiency of the tool. We also proposed an algorithm to test
multiple SNP pairs in each permutation iteration, which greatly improved the efficiency of
testing the identified SNP pairs. Our tool completed 10^7 permutations for a single SNP
pair from the Wellcome Trust Case Control Consortium (WTCCC, [3]) genome data within
1 minute on a single Nvidia Tesla M2090 device, while the same task took 60 minutes on a
single Intel Xeon E5-2650 CPU. More importantly, when simultaneously testing 256 SNP pairs
with 10^7 permutations, our tool took only 5 minutes, while the CPU program took 10 hours.
Secondly, we used this tool to run permutation tests on simulated data sets to determine
the eligibility condition of likelihood ratio tests. We found that the p-values from
likelihood ratio tests have a relative error of more than 100% when more than 8 cells
in the contingency table have an expected count of less than 5, or when any cell of the
contingency table has an expected count of zero.
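The eligibility condition above can be checked directly from a contingency table. The sketch below is illustrative only: it assumes the table is given as a list of rows of observed counts with at least one sample, computes expected counts under independence as (row total * column total) / grand total, and uses hypothetical function names and default thresholds taken from the finding stated above.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    return [[r * c / n for c in col] for r in row]


def lrt_looks_unreliable(table, min_expected=5, max_small_cells=8):
    """Flag tables that violate the eligibility condition reported above."""
    expected = [e for r in expected_counts(table) for e in r]
    small = sum(1 for e in expected if e < min_expected)
    has_zero = any(e == 0 for e in expected)
    return has_zero or small > max_small_cells
```

A table flagged by this check is a candidate for a permutation test rather than the chi-square approximation.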
Finally, we permuted the WTCCC data sets. By permuting on a GPU cluster consisting of
40 nodes, we completed 10^12 permutations in 1 week for all 280 SNP pairs reported with
p-values smaller than 10^-12 in the WTCCC data sets. We found two pairs whose permutation
test p-values were larger than the significance threshold. Both of these pairs have cells
in their contingency tables with a zero expected count. Meanwhile, we verified that there
is no significant error in the likelihood ratio test p-values of the other SNP pairs at a
resolution of 10^-12. This result is consistent with the experiments on simulated data
sets. When the eligibility condition of a likelihood ratio test is violated, a permutation
test should be used instead.