THESIS
2022
1 online resource (xvi, 134 pages) : illustrations (chiefly color)
Abstract
Large-scale genome-wide association studies (GWASs) have detected tens of
thousands of risk variants underlying complex phenotypes. However, outstanding
challenges still hamper the clinical translation and biological
interpretation of GWAS discoveries. In this thesis, we propose two statistical
methods to address challenges in these two directions.
In clinical applications, polygenic risk scores (PRSs) have proved useful for
stratifying the general population into different risk groups in European
populations. However, PRSs are less accurate in non-European populations
because of genetic differences across populations. These differences
in genetic architecture between populations arise from three sources. First,
variants with biologically important roles in non-European populations may
be neglected in GWAS if they are absent or have very low allele frequencies
in Europeans. Second, the same variant may have different effect sizes on the
same phenotype across populations, limiting the transferability of GWAS findings
and the predictive power of GWAS-derived PRSs in non-discovery populations.
Third, linkage disequilibrium (LD) patterns vary across populations, further biasing PRSs extrapolated for risk prediction.
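The role of per-variant effect sizes in the second point above can be made concrete with the standard additive PRS, in which an individual's score is the sum of their allele counts weighted by estimated effect sizes. The sketch below is a generic illustration under that additive assumption, not the XPA or XPASS method. The function name `polygenic_risk_score` and the toy genotype and weight values are hypothetical.

```python
import numpy as np


def polygenic_risk_score(genotypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Additive PRS: for each individual i, sum over variants j of x_ij * beta_j.

    genotypes: (n_individuals, n_variants) allele counts coded 0/1/2.
    weights:   (n_variants,) per-variant effect-size estimates.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if genotypes.shape[1] != weights.shape[0]:
        raise ValueError("number of variants in genotypes and weights must match")
    # Matrix-vector product computes every individual's weighted sum at once
    return genotypes @ weights


# Toy data (hypothetical): 3 individuals, 4 variants
X = np.array([[0, 1, 2, 0],
              [1, 1, 0, 2],
              [2, 0, 1, 1]])
beta = np.array([0.10, -0.05, 0.20, 0.03])

scores = polygenic_risk_score(X, beta)
print(scores)
```

Because the score is linear in the weights, effect-size estimates learned in one population carry directly into the scores computed for another. When true effects differ between populations, or when the weights were tuned to European LD structure, the resulting scores can be miscalibrated.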
To improve prediction accuracy in non-European populations, we propose a
cross-population analysis framework for PRS construction with both individual-level
(XPA) and summary-level (XPASS) GWAS data. By leveraging trans-ancestry
genetic correlation, our methods borrow information from biobank-scale
European population data to improve risk prediction in non-European
populations. Our framework can also incorporate population-specific effects to
further improve PRS construction. With innovations in data structure and algorithm
design, our methods provide substantial savings in computation time
and memory usage. Through comprehensive simulation studies, we show that
our framework provides accurate, efficient, and robust PRS construction across
a range of genetic architectures. In a Chinese cohort, our methods achieved
gains in predictive R² of 7.3%–198.0% for height and 19.5%–313.3% for
body mass index (BMI) relative to existing PRS approaches. We also show that
XPA and XPASS substantially improve height PRS construction in the African population,
suggesting the generality of our framework across global populations.
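The accuracy gains reported above are expressed in terms of predictive R². The following minimal sketch assumes that predictive R² is the squared Pearson correlation between PRS and phenotype in a held-out sample, and that the gain is measured relative to the baseline R². Both definitions, the helper names `predictive_r2` and `relative_gain`, and the simulated data are illustrative assumptions rather than the thesis's exact evaluation protocol.

```python
import numpy as np


def predictive_r2(prs: np.ndarray, phenotype: np.ndarray) -> float:
    """Squared Pearson correlation between PRS and phenotype (assumed definition)."""
    r = np.corrcoef(prs, phenotype)[0, 1]
    return float(r ** 2)


def relative_gain(r2_new: float, r2_baseline: float) -> float:
    """Relative accuracy gain of a new PRS over a baseline PRS, in percent."""
    if r2_baseline <= 0:
        raise ValueError("baseline R2 must be positive")
    return 100.0 * (r2_new - r2_baseline) / r2_baseline


# Simulated held-out sample (hypothetical): phenotype = genetic signal + noise
rng = np.random.default_rng(0)
n = 5000
signal = rng.normal(size=n)
phenotype = signal + rng.normal(scale=2.0, size=n)

# Two hypothetical PRSs that capture the signal with different amounts of error
prs_baseline = signal + rng.normal(scale=3.0, size=n)
prs_new = signal + rng.normal(scale=1.0, size=n)

r2_base = predictive_r2(prs_baseline, phenotype)
r2_new = predictive_r2(prs_new, phenotype)
print(f"baseline R2={r2_base:.3f}, new R2={r2_new:.3f}, "
      f"gain={relative_gain(r2_new, r2_base):.1f}%")
```

Many studies instead report incremental R² after adjusting for covariates such as age, sex, and principal components. The relative-gain formula applies unchanged under either definition. Large percentage gains are easy to obtain when the baseline R² is small.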
For better biological interpretation of GWAS discoveries, integrative analyses
of multi-omics data have been conducted to gain biological insights. By
leveraging existing GWAS and transcriptomic data, transcriptome-wide
association studies (TWAS) have successfully identified trait associations
of genetically regulated expression (GREX) levels. TWAS analysis
relies on the GREX variation shared between the GWAS and the reference transcriptomic
data, and this shared variation depends on the cellular conditions of the transcriptomic data.
Given the increasing availability of transcriptomic data from different conditions
and the often unknown trait-relevant cell or tissue types, we propose a
method and tool, IGREX, for precisely quantifying the proportion of phenotypic variation attributable to the GREX component. IGREX takes as input
a reference transcriptomic panel and individual-level or summary-level GWAS
data. Using transcriptomic data of 48 tissue types from the GTEx project as
a reference panel, we used IGREX to evaluate tissue-specific GREX impact on a wide
spectrum of phenotypes. We observed strong GREX effects on immune-related
protein biomarkers. By incorporating trans expression quantitative trait loci
(trans-eQTLs) and analyzing genetically regulated alternative splicing events, we
evaluated new potential directions for TWAS analysis.
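To make "the proportion of phenotypic variation attributable to GREX" concrete, the sketch below runs a naive two-stage analysis on simulated data. First, ridge-regression eQTL weights for a single hypothetical gene are fitted in a reference panel. Next, GREX is predicted in a separate GWAS sample. Finally, the variance explained by regressing the trait on predicted GREX is reported. This toy illustration rests on simplifying assumptions (one gene, independent variants, a trait driven only by GREX) and is not IGREX's estimator. All sample sizes, the ridge penalty, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulation settings
n_ref, n_gwas, p = 500, 2000, 50
freqs = rng.uniform(0.1, 0.5, size=p)  # allele frequencies shared by both samples


def simulate_genotypes(n: int) -> np.ndarray:
    """Column-standardized allele counts (0/1/2) for independent variants."""
    g = rng.binomial(2, freqs, size=(n, p)).astype(float)
    return (g - g.mean(axis=0)) / g.std(axis=0)


# True cis-eQTL effects of one hypothetical gene
w_true = rng.normal(scale=0.3, size=p)

# Stage 1: estimate expression weights in the reference panel by ridge regression
X_ref = simulate_genotypes(n_ref)
expr = X_ref @ w_true + rng.normal(size=n_ref)
lam = 10.0  # ridge penalty (arbitrary choice)
w_hat = np.linalg.solve(X_ref.T @ X_ref + lam * np.eye(p), X_ref.T @ expr)

# Stage 2: predict GREX in an independent GWAS sample
X_gwas = simulate_genotypes(n_gwas)
grex_hat = X_gwas @ w_hat
alpha = 0.5  # effect of true GREX on the trait (hypothetical)
y = alpha * (X_gwas @ w_true) + rng.normal(size=n_gwas)

# Proportion of trait variance explained by predicted GREX (OLS with intercept)
design = np.column_stack([np.ones(n_gwas), grex_hat])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
pve_grex = 1.0 - resid.var() / y.var()

# Variance fraction implied by the simulation truth, for comparison
pve_true = np.var(alpha * (X_gwas @ w_true)) / np.var(y)
print(f"estimated PVE (GREX): {pve_grex:.3f}; simulated truth: {pve_true:.3f}")
```

The stage-one weights are estimated with error, so this plug-in estimate tends to be attenuated toward zero. A dedicated method needs to account for that uncertainty, as well as for LD among variants and for multiple genes and tissues.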