THESIS
2017
xi, 51 pages : illustrations ; 30 cm
Abstract
In genome-wide association studies (GWAS), detecting interactions among single nucleotide
polymorphism (SNP) pairs and phenotypes is important to reveal the relationship
between genotypes and genetic diseases. The most commonly used measure
of interaction is the departure from a linear model, which describes the statistical relationship
between genotypes and phenotypes. Recently, a Boolean operation-based
screening and testing (BOOST) method was proposed to detect interactions with log-linear
models. As interaction detection can be parallelized, a GPU-based implementation
of the BOOST method, named GBOOST, was made available for acceleration. Neither
BOOST nor GBOOST takes covariates into consideration in its model, which
may lead to inaccurate or even wrong interaction results under some circumstances.
In this thesis, two covariate-adjusted interaction detection tools, BOOST 2.0 and
GBOOST 2.0, will be presented. BOOST 2.0 is a multi-threaded CPU version of the
advanced method, and GBOOST 2.0 is a GPU-based implementation. We will introduce
the log-linear models and the solutions to the maximum log-likelihood of the models
used in the method. Then the CPU multi-threaded and GPU implementations will
be illustrated. BOOST 2.0 and GBOOST 2.0 are both divided into four modules: data
loading, screening, testing and results mapping. In the data loading step, genetic data are
transformed into a Boolean representation so that we can take advantage of fast
bitwise operations. Two fast approximate models are used in the screening step to filter
out SNP pairs with low interaction values. The screening step is the most computationally
intensive part since it exhaustively calculates interaction values for all SNP pairs.
Then we apply an iterative algorithm to calculate interaction values for the small portion
of SNP pairs that pass the screening step. Finally, we map the significantly
interacting SNP pairs back to their positions on the corresponding chromosomes.
The performance of BOOST 2.0/GBOOST 2.0 will be compared with that of BOOST/GBOOST using simulated data. We will also demonstrate the discoveries made on real
data with BOOST 2.0 and GBOOST 2.0.
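
The Boolean representation mentioned for the data loading step can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (encode_snp, popcount, contingency_table) and the toy data are hypothetical, and it assumes genotypes coded 0, 1 or 2 and a binary phenotype (0 = control, 1 = case). Each SNP is stored as one bit mask per genotype and phenotype class, so the genotype contingency table of a SNP pair can be filled with a bitwise AND followed by a bit count. The screening models and the covariate adjustment are left out.

```python
from itertools import combinations


def encode_snp(genotypes, phenotypes):
    """Return masks[c][g]: an int whose bit i is set when sample i has
    phenotype c (0 = control, 1 = case) and genotype g (0, 1 or 2)."""
    masks = [[0, 0, 0], [0, 0, 0]]
    for i, (g, c) in enumerate(zip(genotypes, phenotypes)):
        masks[c][g] |= 1 << i
    return masks


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def contingency_table(masks_a, masks_b):
    """table[c][ga][gb] = number of samples in class c that carry
    genotype ga at the first SNP and genotype gb at the second."""
    return [
        [
            [popcount(masks_a[c][ga] & masks_b[c][gb]) for gb in range(3)]
            for ga in range(3)
        ]
        for c in range(2)
    ]


# Toy data (hypothetical): three SNPs genotyped in six samples.
genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [1, 1, 0, 2, 0, 2],
    "snp3": [0, 0, 1, 1, 2, 2],
}
phenotypes = [0, 1, 1, 0, 0, 1]

# Encode every SNP once, then visit all SNP pairs exhaustively.
encoded = {name: encode_snp(g, phenotypes) for name, g in genotypes.items()}
tables = {
    (a, b): contingency_table(encoded[a], encoded[b])
    for a, b in combinations(sorted(encoded), 2)
}
```

Encoding each SNP once and reusing the masks for every pair is what makes an exhaustive scan over all pairs cheap per pair; a compiled or GPU implementation would use fixed-width machine words and a hardware bit-count instruction instead of Python integers.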
Post a Comment