Over the past two decades, genome-wide association studies (GWAS) have been
successful in identifying robust associations between single nucleotide polymorphisms
(SNPs) and phenotypes of interest. The availability of large-scale
GWAS data for complex traits/diseases has laid the foundation for drug development
[1, 2] and precision medicine [3, 4]. Despite these achievements,
non-European populations remain severely under-represented in GWASs. According
to the GWAS Diversity Monitor (https://gwasdiversitymonitor.com), about
88.9% of GWAS participants to date have been of European ancestry (EUR).
This lack of ancestral diversity biases our understanding of the biology of
complex traits [5, 6] and hinders the translation of GWAS findings into clinical
applications in under-represented populations (e.g., East Asian and African
populations). In this thesis, we propose two statistical methods to address the
challenges in these two directions.
Closing the gap between genetic studies of non-European and European populations
requires intensified data-collection efforts. In parallel, recent work has sought
to improve the statistical power of GWASs in non-Europeans by applying trans-ancestry
association mapping (TRAM) to integrate GWAS datasets across multiple populations.
However, TRAM faces two major challenges. First, the genetic architecture of a
phenotype is heterogeneous across ancestries: some trait-associated SNPs have vastly
different allele frequencies between European and non-European ancestries, and SNP
effect sizes and linkage disequilibrium (LD) patterns can also vary across ancestries.
Second, publicly released GWAS summary statistics still suffer from
confounding biases. Although principal component analysis (PCA) and linear
mixed models (LMMs) are commonly used for association mapping in GWASs,
these standard approaches may not fully account for population stratification in
biobank-scale data, such as stratification arising from socioeconomic status or
geographic structure. Without correcting the confounding biases hidden in GWAS
summary statistics, TRAM will produce many false-positive findings.
As GWAS sample sizes increase, polygenic risk scores (PRSs), which aggregate the
contributions of SNPs across the genome using GWAS results, have shown
great potential for personal and clinical utility in a number of heritable diseases,
including aiding diagnosis by stratifying patients into risk groups, enabling early
and cost-effective interventions, and improving therapeutic strategies. However,
because of the aforementioned heterogeneity in genetic architecture (e.g., allele
frequencies, effect sizes, and LD) across populations, PRSs constructed
from European samples become less accurate when applied to
individuals from non-European populations. Improving the accuracy of PRSs in
under-represented populations is therefore an urgent task.
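In its simplest form, a PRS for one individual is a weighted sum of effect-allele dosages (0, 1, or 2 copies per SNP), with per-SNP weights derived from GWAS. The sketch below illustrates only that generic construction and a top-fraction risk stratification; it is not LOG-TRAM or XPXP, and the function names `polygenic_risk_score` and `high_risk_flags` and the `top_fraction` parameter are hypothetical.

```python
from typing import List, Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Generic PRS: sum over SNPs of (effect-allele dosage x per-SNP weight).

    Assumption: `weights` are per-SNP effect estimates (e.g. GWAS betas);
    real methods adjust these weights for LD and ancestry.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(x * w for x, w in zip(dosages, weights))


def high_risk_flags(scores: Sequence[float], top_fraction: float = 0.1) -> List[bool]:
    """Flag individuals whose PRS ranks in the top `top_fraction` of the sample."""
    n_top = max(1, round(len(scores) * top_fraction))
    # Indices sorted from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n_top])
    return [i in top for i in range(len(scores))]


if __name__ == "__main__":
    # Hypothetical weights for three SNPs and genotypes for three individuals.
    weights = [0.2, -0.1, 0.05]
    genotypes = [[2, 0, 1], [0, 2, 2], [1, 1, 0]]
    scores = [polygenic_risk_score(g, weights) for g in genotypes]
    print(scores)
    print(high_risk_flags(scores, top_fraction=1 / 3))
```

Because the weights are estimated in one population, differences in allele frequencies, effect sizes, and LD between populations change how well this sum tracks the phenotype, which is why a European-trained PRS can lose accuracy elsewhere.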
In this thesis, we propose a statistical method, LOG-TRAM, that leverages local
genetic architecture for TRAM. Using biobank-scale datasets, we show that
LOG-TRAM can greatly improve the statistical power to identify risk variants
in under-represented populations while producing well-calibrated p-values.
We applied LOG-TRAM to GWAS summary statistics of various complex
traits/diseases from Biobank Japan, UK Biobank, and African populations, obtaining
substantial gains in power and effectively correcting confounding
biases in TRAM. We also show that LOG-TRAM can be applied
to identify ancestry-specific loci and that its output can be further
used to construct more accurate polygenic risk scores in under-represented
populations. In addition, to improve the accuracy of PRSs in trans-ancestry analysis,
we propose a cross-population and cross-phenotype (XPXP) method
for constructing PRSs in under-represented populations. XPXP constructs
accurate PRSs by leveraging biobank-scale datasets from European populations
and multiple GWASs of genetically correlated phenotypes. XPXP also allows
population-specific and phenotype-specific effects to be incorporated, which further
improves PRS accuracy. Through comprehensive simulation studies and
real data analysis, we demonstrate that XPXP outperforms existing PRS
approaches. Height PRSs constructed by XPXP achieved 9% and 18% improvements
in predicted R² over the runner-up method in East Asian and African populations,
respectively. XPXP also substantially improved the ability to identify individuals
at high genetic risk of type 2 diabetes.
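Comparisons like the 9% and 18% figures are stated in terms of predicted R². As an illustration only, the sketch below assumes that predicted R² is the squared Pearson correlation between PRS and phenotype in a held-out sample and that "improvement" means relative improvement over the runner-up; the abstract does not confirm either definition. The function names are hypothetical, and the numbers in the usage lines are made up and are not the thesis results.

```python
from typing import Sequence


def squared_correlation(prs: Sequence[float], phenotype: Sequence[float]) -> float:
    """Assumed definition of predicted R^2: squared Pearson correlation."""
    n = len(prs)
    if n != len(phenotype) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(prs) / n
    mean_y = sum(phenotype) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prs, phenotype))
    sxx = sum((x - mean_x) ** 2 for x in prs)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    return sxy * sxy / (sxx * syy)


def relative_improvement(r2_new: float, r2_baseline: float) -> float:
    """Assumed definition: (R^2_new - R^2_baseline) / R^2_baseline."""
    return (r2_new - r2_baseline) / r2_baseline


if __name__ == "__main__":
    # Hypothetical values: a baseline R^2 of 0.100 and a new R^2 of 0.118
    # would correspond to an 18% relative improvement.
    print(relative_improvement(0.118, 0.100))
```

Under this reading, a relative gain is most informative alongside the baseline R², since the same percentage corresponds to a smaller absolute gain when the baseline accuracy is low.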