Microsatellite Evolution in The Yeast Genome - A Genomic Approach
Microsatellites are short (1-6 bp), highly polymorphic tandem repeats found in all genomes analyzed so far. They are popular genetic markers for many applications, including population genetics, pedigree analysis, genetic mapping and linkage analysis. Some microsatellites also cause a variety of human neurodegenerative diseases and may act as agents of adaptive evolution through the regulation of gene expression. As a consequence of these diverse uses and functions, the mutational and evolutionary dynamics of microsatellite sequences have gained much attention in recent years. Most studies of microsatellite evolution have aimed to develop more refined evolutionary models for estimating parameters such as genetic distance or linkage disequilibrium. However, there is also an incentive to use our understanding of the evolutionary processes that affect these sequences to examine the functional implications of microsatellite evolution. What has emerged from nearly two decades of study is a picture of highly complex mutational dynamics, with mutation rates varying across species, loci and alleles, and a multitude of potential influences on these rates, most of which are not yet fully understood.
The increasing availability of whole-genome sequences has immensely extended the scope for studying microsatellite evolution. For example, where once it was common to examine single loci, it is now possible to examine microsatellites using genome-wide approaches. In the first part of my dissertation I discuss approaches and issues associated with detecting microsatellites in genomic data. In Chapter 2 I undertook a meta-analysis of studies investigating the distribution of microsatellites in yeast and showed that comparisons between such studies are problematic because different investigators apply different definitions of a microsatellite. In particular, I found that variation in how investigators choose the repeat unit size of a microsatellite, handle imperfections in the array and, especially, set the minimum array length leads to a large divergence in results and can distort the conclusions drawn from such studies, particularly where inter-specific comparisons are being made. In a review of the currently available suite of bioinformatics tools (Chapter 3), I further showed that this bias extends beyond a solely theoretical controversy into a methodological issue, because most software tools not only incorporate different definitions for the key parameters used to define microsatellites, but also employ different strategies to search and filter for microsatellites in genomic data. In this chapter I provide an overview of the available tools and a practical guide to help other researchers choose the appropriate tool for their research purpose.
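The sensitivity to the minimum array length can be illustrated with a minimal sketch. It is not one of the tools reviewed in Chapter 3: the function name `find_perfect_microsatellites`, its parameters and the toy sequence `demo` are all hypothetical, and the sketch assumes perfect repeats only (no imperfections allowed) with the threshold expressed in base pairs.

```python
import re

def find_perfect_microsatellites(seq, max_unit=6, min_array_len=12):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Hypothetical helper: perfect arrays only, minimum length given in bp.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % unit_len)
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter motif
            # (e.g. "ACAC" is reported as "AC", not as a tetranucleotide).
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            array_len = m.end() - m.start()
            if array_len >= min_array_len:
                hits.append((m.start(), unit, array_len // unit_len))
    return sorted(hits)

# Toy sequence (an assumption, not real yeast data): four arrays of
# different lengths separated by short non-repetitive flanks.
demo = ("GTCG" + "AC" * 8 + "TGGT" + "AC" * 4 + "TCCA"
        + "GAT" * 5 + "CCTG" + "A" * 10 + "CGT")

for threshold in (8, 12, 16):
    found = find_perfect_microsatellites(demo, min_array_len=threshold)
    print(threshold, len(found), found)
```

On this toy sequence the number of reported arrays falls from four to one as the threshold rises from 8 bp to 16 bp, although the sequence itself is unchanged. Because the regular expression scans left to right without overlap, array boundaries can shift by a base or two when a flank happens to extend a repeat, which is one more way in which search strategy, and not only the definition, affects the result.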
In the second part of my dissertation, I use the analytical framework developed in the previous chapters to explore the biological significance of microsatellites, exploiting the well-annotated genome of the model organism Saccharomyces cerevisiae (baker’s yeast). Several studies in different organisms have indicated spatial associations between microsatellites and individual genomic features, such as transposable elements, recombinational hotspots, GC-content or local substitution rate. In Chapter 4, I summarized these studies and tested some of the underlying hypotheses on microsatellite distribution in the yeast genome using Generalized Linear Models (GLM) and wavelet transformation. I found that microsatellite type and distribution within the genome are strongly governed by local sequence composition and by negative selection in coding regions, and that microsatellite frequency is inversely correlated with SNP density, reflecting the stabilizing effect that point mutations have on microsatellites. Microsatellites may also be markers for recent genome modifications, since they are depleted in regions near LTR transposons, and they may be elements of potential structural importance, since I found associations with features such as meiotic double-strand breaks, regulatory sites and nucleosomes. Microsatellites are subject to local genomic influences, particularly on small (1-2 kb) scales. Although these local-scale influences might not be as dominant as other factors on a genome-wide scale, they are certainly of importance with respect to individual loci.
Analysis of locus conservation across 40 related yeast strains (Chapter 5) showed no bias in the type of microsatellites conserved, only a negative influence of coding sequences, which again supports the idea that microsatellites evolve neutrally. Polymorphism was rare and, despite a positive correlation with array length, showed no relationship with either genomic fraction or repeat size. However, the analysis also revealed a non-random distribution of microsatellites in genes of functionally distinct groups. For example, conserved microsatellites (like microsatellites in yeast generally) are mostly found in genes associated with the regulation of biological and cellular processes. Polymorphic loci further show an association with the organization and biogenesis of cellular components, morphogenesis, development of anatomical structures and pheromone response, which is absent for monomorphic loci. Whether this distribution is an indication of functionality or simply of neutral mutation (e.g. genetic hitch-hiking) is debatable, since most conserved microsatellites, particularly variable loci, are located within genes that show low selective constraints. Overall, microsatellites appear to be neutrally evolving sequences, but owing to the sheer number of loci within a single genome, individual loci may well acquire some functionality. More work is needed in this area, particularly experimental studies, such as reporter-gene expression assays, to confirm phenotypic effects.