THESIS
2014
xiv, 138 pages : illustrations (some color) ; 30 cm
Abstract
To complement next-generation sequencing technologies, there is a pressing need
for efficient pre-sequencing capture methods with lower cost and smaller DNA
requirements. The Alu family of short interspersed nuclear elements is the most
abundant type of transposable element in the human genome, with over one million
Alu elements identified. We have used inter-Alu PCR with an enhanced range
of amplicons in conjunction with next-generation sequencing to generate an
Alu-anchored scan, or 'AluScan'. To illustrate the method, one pair of glioma DNA
samples was sequenced by means of AluScan. The more than 10 Mb of sequence
obtained, derived from more than 8,000 genes, showed a highly reproducible
capture of the genome. In addition, 341 somatic indels and 274 somatic SNVs were
identified. We therefore propose AluScan as a useful alternative capture method
for accelerating genomic studies.
In parallel, cancer genomics was explored using Affymetrix microarray data.
We examined the possible use of machine learning to
reveal associations between recurrent copy number variations (CNVs) and
predisposition to cancer. Recurrent focal constitutional CN-gains and CN-losses were
identified from the non-tumor and tumor blood cell samples of Caucasian and Korean
cohorts, respectively. In both instances, highly significant differences were revealed
with respect to the CNV signatures identified by (a) Correlation-based Feature
Selection (CFS), (b) Frequency-based Selection, and (c) Classifier-based Selection.
Cancer patients were discriminated from normal individuals with average
prediction accuracies of 93.6% and 86.5%, indicating a possible
predisposition to cancer based on recurrent CNVs.
Inspired by the above findings, we also attempted to call CNVs from AluScan data.
However, the special features of AluScan data render them inaccessible to
analysis by most algorithms designed for calling CNVs from
whole-genome sequencing and exome-capture data, which require a paired
control sample to proceed. Accordingly, in the present study an 'AluScanCNV'
method has been developed to call CNVs from AluScans, using a group of reference
samples to construct a reference template, a transformed distribution of the
read-depth ratio between sequence windows on the target sequence and the reference
template to call local CNVs, a Poisson binomial distribution to identify recurrent CNVs,
and sequential merging of windows to reveal large CNVs. Application of the method to
the AluScans for 21 non-cancerous and 39 cancer tissues led to the identification of
an average of 532 local CNVs with a length of 500 kb per AluScan, a total of 49 recurrent
CNVs with copy number gain and 65 recurrent CNVs with copy number loss in liver
cancer samples, and a total of 6 and 12 large CNVs including the well-known deletions
on chromosomes 1q and 19p in two glioma samples, and on chromosome 9 in one of
the gliomas. The AluScanCNV method was found to be very robust when applied to
CNV calling from AluScan data. In addition, it can be applied without loss of generality
to other next-generation sequencing data. Since the method does not require any
paired control, it can also be employed to identify germline CNVs in normal
samples. The utility of the method for calling recurrent CNVs greatly broadens
the scope of systematic analysis of recurrent CNV regions.
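As a rough illustration of how a Poisson binomial distribution can be used to
flag a recurrent CNV, the sketch below computes the probability that at least a
given number of samples carry a CNV in the same window purely by chance. The
function names and the per-sample probabilities are hypothetical and are not
taken from the thesis; it is assumed here that each sample has its own
background probability of a CNV call in any one window (for example, the
fraction of its windows called as gain or loss).

```python
from typing import List


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """Return pmf where pmf[k] = P(exactly k of the independent trials succeed).

    Each trial i succeeds with its own probability probs[i]. The distribution
    is built by adding one trial at a time (dynamic programming).
    """
    pmf = [1.0]  # with zero trials, zero successes is certain
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def recurrence_p_value(probs: List[float], observed: int) -> float:
    """Upper-tail probability P(X >= observed) under the Poisson binomial.

    probs[i] is the assumed background probability that sample i shows a CNV
    in the window; observed is the number of samples that actually do.
    """
    pmf = poisson_binomial_pmf(probs)
    return sum(pmf[observed:])


# Hypothetical example: three samples with different background CNV rates,
# all three showing a gain in the same window.
background = [0.10, 0.20, 0.05]
p_value = recurrence_p_value(background, observed=3)
```

A small tail probability suggests that the window is altered in more samples
than the background rates would explain. The threshold and any
multiple-testing correction are left open, since the abstract does not
specify them.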
The current work has established a standard analysis workflow for AluScan data as well
as other targeted sequencing data. In addition, a simple statistical model for
calculating recurrent patterns was applied for the first time to next-generation
sequencing data, which could enable AluScan to serve as a quick and robust method
for detecting germline and somatic CNVs in cancer patients. Based on the findings of
this study, these recurrent CNVs could be used as a complex signature to distinguish
cancer patients from normal individuals. The current study highlights the importance of
recurrent CNV patterns in blood samples from cancer patients, which might point to a new
biomarker for cancer prognosis and early detection.