THESIS
2005
xii, 43 leaves : ill. (some col.) ; 30 cm
Abstract
The genomes of several eukaryotic model organisms have now been finished, among them yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster), mustard weed (Arabidopsis thaliana), mouse (Mus musculus) and human (Homo sapiens). Although the sequencing error rate in finished genomes is estimated to be less than one in 10,000 nucleotides, this estimate does not account for errors that arise from misassembly or cloning of the sequence. We propose that these assembly errors may be roughly estimated by computing the amount of duplicated sequence found within genomes; a preliminary estimate puts this error at as much as 1% of the genome, i.e., two orders of magnitude larger than the sequencing error. We describe the computation of the duplicated sequence found in three of the finished model genomes (yeast, mustard weed and mouse) and present a selection of the many duplications found that we suspect to be assembly or cloning errors, some of them longer than 10,000 nucleotides. We verify experimentally that some very large duplications in the model worm genome are indeed assembly errors caused by the presence of chimeric clones, that is, contiguous pieces of DNA used for sequencing that in fact derive from two separate locations in the worm genome. We also verify experimentally that there is an error caused by the insertion of a piece of foreign DNA about 300 nucleotides long. The ultimate goal of our research is to algorithmically correct assembly and cloning errors in finished genomes to substantially improve their accuracy.
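The abstract does not say how the duplicated sequence is computed, so the following is only an illustrative sketch of one simple way to estimate the fraction of a sequence that is duplicated, not the method used in the thesis. It assumes that "duplicated" means covered by an exact k-mer occurring at two or more positions on the same strand; the function name `duplicated_fraction` and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def duplicated_fraction(seq: str, k: int = 8) -> float:
    """Return the fraction of bases covered by a k-mer that occurs more than once.

    Assumption for illustration only: exact matches on the forward strand;
    reverse complements and near-identical copies are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    if k <= 0 or n < k:
        return 0.0

    # Record every start position of each k-mer.
    positions = defaultdict(list)
    for i in range(n - k + 1):
        positions[seq[i:i + k]].append(i)

    # Mark all bases covered by any k-mer seen at two or more positions.
    covered = [False] * n
    for starts in positions.values():
        if len(starts) > 1:
            for s in starts:
                for j in range(s, s + k):
                    covered[j] = True

    return sum(covered) / n
```

For example, `duplicated_fraction("ACGTACGT", k=4)` returns 1.0, because the 4-mer ACGT occurs at positions 0 and 4 and together the two copies cover every base. A whole-genome analysis would need a much larger `k`, tolerance for mismatches, and memory-efficient indexing, none of which this sketch attempts.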