Permutation tests for genome-wide association studies

HKUST Electronic Theses


by Guangyuan Yang

THESIS 2014

M.Phil. Electronic and Computer Engineering

xii, 61 pages : illustrations ; 30 cm

Abstract

In genome-wide association studies (GWAS), some single-nucleotide polymorphism (SNP) pairs are significantly associated with disease through the combination of their main effects and interactions. This effect is referred to as association allowing for interactions [1]. A fast method for detecting such pairs has been proposed [2]. The method is based on a likelihood ratio test and assumes that the test statistic follows a chi-square distribution. Many SNP pairs with significant associations allowing for interactions have been detected using this method. However, the chi-square approximation requires the expected count in each cell of the contingency table to be at least 5. This assumption is violated for some of the identified SNP pairs, in which case the likelihood ratio test may no longer be applicable. Because it is nonparametric, a permutation test is an ideal way to double-check the p-values calculated by a likelihood ratio test. The p-values of SNP pairs significantly associated with disease are extremely small, so permutation testing in GWAS is computationally demanding: a huge number of permutations is needed to achieve a correspondingly fine resolution for the p-value. To investigate whether the p-values from likelihood ratio tests are reliable, a fast tool that can carry out a large number of permutations is desirable.
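The idea of checking a likelihood ratio statistic by permutation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the function names, the coding of the joint genotype of a SNP pair as a cell index 0..8, and the (hits + 1) / (n_perm + 1) estimator are all choices made for this sketch.

```python
import math
import random

def g_statistic(cells, labels, n_cells=9):
    """Likelihood ratio (G) statistic for an n_cells x 2 contingency table.

    cells[i]  : joint genotype of sample i for the SNP pair, coded 0..n_cells-1
                (assumed coding for this sketch)
    labels[i] : 1 for a case, 0 for a control
    """
    counts = [[0, 0] for _ in range(n_cells)]
    for c, y in zip(cells, labels):
        counts[c][y] += 1
    n = len(labels)
    col = [sum(row[j] for row in counts) for j in (0, 1)]
    g = 0.0
    for row in counts:
        r = row[0] + row[1]
        for j in (0, 1):
            if row[j] > 0:  # a term with observed count 0 contributes 0
                expected = r * col[j] / n
                g += 2.0 * row[j] * math.log(row[j] / expected)
    return g

def permutation_p_value(cells, labels, n_perm, seed=0):
    """Estimate the p-value by shuffling case/control labels n_perm times."""
    rng = random.Random(seed)
    observed = g_statistic(cells, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if g_statistic(cells, shuffled) >= observed:
            hits += 1
    # Assumed estimator: adding 1 to both counts keeps the estimate above 0.
    return (hits + 1) / (n_perm + 1)
```

With this estimator the smallest value that can be reported is 1 / (n_perm + 1), which is why resolving a p-value near 10⁻¹² requires on the order of 10¹² permutations.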

In this thesis, we first presented a fast permutation tool based on graphics processing units (GPUs) that gives highly reliable p-value estimates. We designed a memory layout dedicated to concurrent permutation and exploited the properties of the different GPU memories to optimize the tool's efficiency. We also proposed an algorithm that tests multiple SNP pairs in each permutation iteration, which greatly improved the efficiency of testing the identified SNP pairs. Our tool completed 10⁷ permutations for a single SNP pair from the Wellcome Trust Case Control Consortium (WTCCC) [3] genome data within 1 minute on a single Nvidia Tesla M2090 device, whereas the same task took 60 minutes on a single Intel Xeon E5-2650 CPU. More importantly, when simultaneously testing 256 SNP pairs with 10⁷ permutations, our tool took only 5 minutes, whereas the CPU program took 10 hours.
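Testing several SNP pairs per permutation iteration can be illustrated on the CPU as follows: the case/control labels are shuffled once per iteration and that single shuffle is reused for every pair. This is a minimal sketch, not the GPU algorithm of the thesis; `permute_many_pairs` and the toy statistic `case_rate_gap` are hypothetical names introduced here, and any statistic where larger means stronger association could be passed in instead.

```python
import random

def permute_many_pairs(pair_cells, labels, statistic, n_perm, seed=0):
    """Estimate permutation p-values for several SNP pairs at once.

    pair_cells : one list of per-sample cell ids for each SNP pair
    labels     : 1 for a case, 0 for a control
    statistic  : callable(cells, labels) -> float, larger = stronger association
    """
    rng = random.Random(seed)
    observed = [statistic(cells, labels) for cells in pair_cells]
    hits = [0] * len(pair_cells)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)                    # one shuffle per iteration
        for k, cells in enumerate(pair_cells):   # reused for every pair
            if statistic(cells, shuffled) >= observed[k]:
                hits[k] += 1
    # Assumed estimator, chosen for this sketch.
    return [(h + 1) / (n_perm + 1) for h in hits]

def case_rate_gap(cells, labels):
    """Toy stand-in statistic: spread of the case fraction over occupied cells."""
    totals, cases = {}, {}
    for c, y in zip(cells, labels):
        totals[c] = totals.get(c, 0) + 1
        cases[c] = cases.get(c, 0) + y
    rates = [cases[c] / totals[c] for c in totals]
    return max(rates) - min(rates)
```

Sharing the shuffle spreads the cost of generating a permutation over all pairs. By the figures quoted above, the GPU tool is 60 times faster than the CPU for one pair (60 minutes versus 1 minute) and 120 times faster for 256 pairs (10 hours versus 5 minutes).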

Secondly, we used this tool to run permutation tests on simulated data sets to determine the eligibility condition for likelihood ratio tests. We found that the p-values from likelihood ratio tests have a relative error of more than 100% when more than 8 cells in the contingency table have an expected count of less than 5, or when any cell has a zero expected count.
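The eligibility condition stated above can be written as a small check on a contingency table. The sketch below assumes the table is given as rows of [controls, cases], one row per joint genotype, and computes expected counts under independence of genotype and disease status; the function names are hypothetical, and the thresholds are taken directly from the finding above.

```python
def expected_counts(table):
    """Expected counts under independence for a table of [controls, cases] rows."""
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col = [sum(row[j] for row in table) for j in range(n_cols)]
    return [[sum(row) * col[j] / n for j in range(n_cols)] for row in table]

def lrt_looks_reliable(table, min_expected=5, max_small_cells=8):
    """Return False when the condition reported above is violated.

    Violations: any cell with a zero expected count, or more than
    max_small_cells cells with an expected count below min_expected.
    """
    expected = [e for row in expected_counts(table) for e in row]
    if any(e == 0 for e in expected):
        return False
    small = sum(1 for e in expected if e < min_expected)
    return small <= max_small_cells
```

A table that fails this check is a candidate for a permutation test rather than the chi-square approximation.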

Finally, we ran permutation tests on the WTCCC data sets. Using a GPU cluster of 40 nodes, we completed 10¹² permutations for all 280 SNP pairs reported with p-values smaller than 10⁻¹² in the WTCCC data sets within 1 week. We found two pairs whose permutation test p-values were larger than the significance threshold. Both pairs have cells with a zero expected count in their contingency tables. We also verified that, at a resolution of 10⁻¹², there is no significant error in the likelihood ratio test p-values of the other SNP pairs. This result is consistent with the experiments on simulated data sets. When the eligibility condition of a likelihood ratio test is violated, a permutation test should be used instead.


Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Electronic and Computer Engineering
Authors: Yang, Guangyuan
Subjects: Single nucleotide polymorphisms Testing Data processing
Language: English
Call number: Thesis ECED 2014 YangG
DOI: 10.14711/thesis-b1432208
