Trees and Ps and things that sneeze: Markov process models of site substitution (1997)
Authors: Tuffley, Christopher
An increasingly important tool in phylogenetics, a field that lies between mathematics and biology and seeks to deduce the evolutionary relationships between present-day species, is the comparison of molecular sequences such as DNA and protein sequences. In making meaningful comparisons it is helpful to model the process by which the sequences came to differ. Many such models have at their heart certain Markov-style assumptions, since the "memoryless" feature of the Markov property seems appropriate to the site substitution process. This thesis looks at two problems related to the most basic Markov process model of site substitution, on which most more complicated (and hopefully more realistic) models are based. It also takes a first look at a recently suggested model of Fitch and Markowitz's 1970 "covarion" hypothesis, comparing the covarion model with models of the better-known rates-across-sites hypothesis.
We show that the LogDet transformation, which under mild conditions allows tree reconstruction under the basic model, is in a sense unique: φ = log det is, up to scalar multiples, the only continuous homomorphism from the n × n stochastic matrices with positive determinant into the real numbers under addition. This result limits the form of possible alternatives to the LogDet transformation that might weaken the conditions under which it is valid.
We introduce the reconstruction quotient and prove two structure theorems for it in the case of the very simple two-state fully symmetric model. The reconstruction quotient is obtained from a space of weighted trees by identifying trees that no reconstruction technique can distinguish. We show that under the two-state fully symmetric model, the reconstruction quotient corresponding to a fixed tree is always contractible, and that the quotient obtained from the set of all four-leaf binary trees is also contractible.
Finally, we take the first analytic look at a model of the covarion hypothesis, which competes with rates-across-sites as a way of accounting for differing selective constraints. We calculate some of the basic quantities required for tree reconstruction under this model and compare it with rates-across-sites, seeking conditions under which the two can and cannot be distinguished.
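The homomorphism property behind the LogDet transformation can be illustrated with a small numerical sketch. Because the determinant is multiplicative, log det turns a product of transition matrices along a path into a sum, which is what lets it act as an additive quantity on a tree. The example below is an illustration only, not code from the thesis: it is restricted to the two-state case, and the helper names (`matmul2`, `det2`, `log_det`, `symmetric_transition`) and the edge matrices are hypothetical choices made for this sketch.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def log_det(m):
    """log det of a 2x2 matrix; requires a positive determinant."""
    d = det2(m)
    if d <= 0:
        raise ValueError("log det needs a positive determinant")
    return math.log(d)


def symmetric_transition(p):
    """Two-state fully symmetric transition matrix with change probability p."""
    return [[1 - p, p], [p, 1 - p]]


# Hypothetical transition matrices for two consecutive edges a-b and b-c.
P_ab = symmetric_transition(0.1)      # symmetric edge, det = 1 - 2p
P_bc = [[0.9, 0.1], [0.3, 0.7]]       # a general (asymmetric) stochastic matrix

# The transition matrix along the path a-b-c is the matrix product.
P_ac = matmul2(P_ab, P_bc)

# Multiplicativity of det makes log det additive along the path.
lhs = log_det(P_ac)
rhs = log_det(P_ab) + log_det(P_bc)
print(math.isclose(lhs, rhs))
```

In practice the sign is usually flipped, so that -log det is non-negative for stochastic matrices and can play the role of a path length.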