## Stochastic speciation models for evolutionary trees.


##### Author

##### Date

2000

##### Permanent Link

http://hdl.handle.net/10092/5597

##### Thesis Discipline

Mathematics

##### Degree Grantor

University of Canterbury

##### Degree Level

Doctoral

##### Degree Name

Doctor of Philosophy

##### Abstract

Phylogenetic trees are widely used in biology to represent evolutionary relationships between species. As the details of the evolutionary process are mostly unknown, modelling work on the shapes of these trees has had to incorporate a random component. Two null models introduced for this purpose are the uniform model and the Yule model. A third model, the comb model, is useful for giving bounds on theoretical results. We investigate some mathematical properties of these three models.

Let the distance between two nodes be the number of edges separating them. We find exact formulae for the mean distance of a randomly chosen leaf from the root, and for the mean distance between two randomly chosen leaves of a rooted tree. In addition, for the Yule model we find the probability distribution of the distance of a randomly chosen leaf from the root.

A cherry is a pair of leaves that are adjacent to a common node. By realising the process of cherry formation as an extended Pólya urn model, we show that the number of cherries is asymptotically normal. This allows us to develop simple statistical tests of the Yule and uniform null hypotheses for the growth of rooted trees. A triplet is a cherry and a pendant edge that are adjacent to a common node. We also show that the number of triplets is asymptotically normal under the Yule model, and put forward a conjecture for its distribution under the uniform model.

The construction of an evolutionary tree is generally a two-stage process: an unrooted tree is constructed, and then it is rooted. We investigate a method for rooting a tree based on the shape of the tree and the Yule model for the growth of rooted trees. We show that even for trees with a large number of leaves, the approximate location of the root can be found with high probability.

Let S be a set of two rooted binary trees whose leaf sets L1, L2 form a partition of the set {1, 2, ..., n}. We derive a recursion for the number of trees on n leaves that are compatible with the set S. We extend this recursion to a set S of three trees, but show that the number of terms required in the recursion grows at least exponentially with the number of trees in S.

Let S be a set of rooted binary trees. A tree that is a sub-tree of each of the trees in the set is called an agreement sub-tree, and such a tree with the maximum possible number of leaves is called a maximum agreement sub-tree (MAST). We derive an upper bound for the probability that two randomly generated trees have a MAST with at least s leaves. We find the form that this upper bound takes when the trees are generated according to the uniform and Yule models.

The entropy of a probability distribution is equal to the mean information, where the information of an event E is -log P(E). We derive exact and asymptotic formulae for the entropy of the comb, uniform and Yule probability distributions.

We show that the comb, uniform and Yule models satisfy a property called group elimination, a special case of which is sampling consistency. We show that for any probability distribution on trees that satisfies sampling consistency, there is an upper bound on the probability of the fully symmetric tree shapes.

We introduce a modification of the Yule model in which the speciation rate is a function of the time since the last speciation event of a lineage. Using analytical methods, we investigate the probability (conditional and unconditional) of the symmetric tree on four leaves under this modified model. If the speciation rate is constant, then the probability of the symmetric tree is the same as in the Yule model. Making the speciation rate zero for a period after a speciation event, and constant afterwards, is found to make the symmetric tree more probable. If the speciation rate is constant for some period after a speciation event, and zero thereafter, the symmetric tree is found to be less probable.
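The cherry-based tests of the Yule and uniform null hypotheses can be illustrated with a small simulation. This is a minimal Python sketch, not the thesis's own procedure. It assumes the usual constructions: the Yule model splits a uniformly chosen leaf at each step, and the uniform model attaches each new leaf to a uniformly chosen edge (the root edge included). The test uses a normal approximation with cherry moments for rooted binary trees that are assumed here rather than taken from this record: Yule mean n/3 and variance 2n/45 (n >= 5), and uniform mean n(n-1)/(2(2n-3)) with the variance coded in `cherry_z_scores`. All function names are hypothetical.

```python
import math
import random


def yule_tree(n, rng):
    """Yule model: grow a rooted binary tree by splitting a uniformly chosen leaf."""
    children = {0: []}  # node id -> list of child ids ([] marks a leaf)
    leaves = [0]
    next_id = 1
    while len(leaves) < n:
        i = rng.randrange(len(leaves))
        leaf, a, b = leaves[i], next_id, next_id + 1
        next_id += 2
        children[leaf] = [a, b]
        children[a], children[b] = [], []
        leaves[i] = a  # the split leaf is replaced by its two daughters
        leaves.append(b)
    return children


def uniform_tree(n, rng):
    """Uniform model: attach each new leaf to a uniformly chosen edge, root edge included."""
    children, parent = {0: []}, {0: None}
    next_id = 1
    for _ in range(n - 1):
        # A tree with k leaves has 2k-1 nodes; "the edge above v" indexes its 2k-1 edges,
        # where v == root stands for the root edge.
        v = rng.choice(list(children))
        w, leaf = next_id, next_id + 1
        next_id += 2
        u = parent[v]
        children[w], children[leaf] = [v, leaf], []
        parent[w], parent[v], parent[leaf] = u, w, w
        if u is not None:
            kids = children[u]
            kids[kids.index(v)] = w  # subdivide the edge u -> v with the new node w
    return children


def count_cherries(children):
    """Number of internal nodes whose two children are both leaves."""
    return sum(1 for kids in children.values()
               if kids and all(not children[c] for c in kids))


def cherry_z_scores(c, n):
    """z-scores of an observed cherry count c under the Yule and uniform null models (n >= 5)."""
    yule_mean, yule_var = n / 3, 2 * n / 45
    unif_mean = n * (n - 1) / (2 * (2 * n - 3))
    unif_var = n * (n - 1) * (n - 2) * (n - 3) / (2 * (2 * n - 3) ** 2 * (2 * n - 5))
    return ((c - yule_mean) / math.sqrt(yule_var),
            (c - unif_mean) / math.sqrt(unif_var))


# Usage: simulate a Yule tree and test it against both null models.
rng = random.Random(1)
n = 200
observed = count_cherries(yule_tree(n, rng))
z_yule, z_uniform = cherry_z_scores(observed, n)
# Reject a null model at roughly the 5% level if |z| > 1.96.
```

Under this sketch a Yule-generated tree with n = 200 will typically give a small `z_yule`, while `z_uniform` tends to be large, because the Yule model produces noticeably more cherries (about n/3 against about n/4).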
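The entropy results can be checked numerically for small n by brute-force enumeration. This is a minimal Python sketch under stated assumptions: both distributions are taken over labelled rooted binary trees, the uniform model gives each of the (2n-3)!! trees equal probability, and the Yule probability of a labelled tree is assumed to be 2^(n-1)/n! times the product, over internal nodes v, of 1/(λ(v) - 1), where λ(v) is the number of leaves below v. Enumeration grows like (2n-3)!!, so this only serves to check exact formulae for small n. The function names are hypothetical.

```python
import math


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`, root edge included."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_trees(n):
    """All (2n-3)!! rooted binary trees on the labelled leaves 0..n-1, as nested pairs."""
    trees = [0]
    for leaf in range(1, n):
        trees = [t for old in trees for t in insertions(old, leaf)]
    return trees


def yule_probability(tree, n):
    """Assumed Yule formula: 2^(n-1)/n! * product over internal v of 1/(leaves below v - 1)."""
    def walk(t):
        if not isinstance(t, tuple):
            return 1, 1.0
        a, pa = walk(t[0])
        b, pb = walk(t[1])
        return a + b, pa * pb / (a + b - 1)
    _, product = walk(tree)
    return 2 ** (n - 1) / math.factorial(n) * product


def entropies(n):
    """Entropy (in nats) of the uniform and Yule distributions, plus the Yule total mass."""
    trees = all_trees(n)
    h_uniform = math.log(len(trees))
    probs = [yule_probability(t, n) for t in trees]
    h_yule = -sum(p * math.log(p) for p in probs)
    return h_uniform, h_yule, sum(probs)


# Usage: the Yule entropy sits below the uniform (maximum) entropy once n >= 4.
for n in range(3, 8):
    h_u, h_y, total = entropies(n)
    print(n, round(h_u, 4), round(h_y, 4), round(total, 12))
```

The Yule probabilities summing to 1 over all enumerated trees is a useful sanity check on the assumed product formula before comparing against any closed-form entropy expression.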
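The four-leaf results for the modified Yule model can be explored with a short Monte Carlo sketch. It rests on assumptions that are ours, not a specification of the thesis's model: the tree starts with a root split at time 0, each lineage's speciation hazard depends only on its age, and the rate is 1 whenever it is non-zero. Two hypothetical age profiles are coded: `latency(tau)` (rate 0 for time tau after birth, then 1) and `window(t_max)` (rate 1 for time t_max after birth, then 0). For the latency profile, memorylessness gives P(symmetric) = 1 - (2/3)e^(-tau), since the other root lineage waits an Exp(1) time while the first pair of daughters waits tau plus an Exp(2) time; at tau = 0 this is the Yule value 1/3.

```python
import math
import random


def symmetric_four_leaf(split_delay, rng, trials=100_000):
    """Estimate P(symmetric 4-leaf tree | the tree reaches 4 leaves).

    split_delay(rng) returns the time from a lineage's birth to its speciation,
    or math.inf if it never speciates (hypothetical interface).
    """
    symmetric = reached = 0
    for _ in range(trials):
        a, b = split_delay(rng), split_delay(rng)  # two lineages born at time 0
        first, other = min(a, b), max(a, b)
        if math.isinf(first):
            continue  # never reaches three leaves
        # The daughters of the first lineage to speciate are born at time `first`.
        d = first + min(split_delay(rng), split_delay(rng))
        if math.isinf(min(other, d)):
            continue  # stops at three leaves
        reached += 1
        symmetric += other < d  # other root lineage splits next -> balanced tree
    return symmetric / reached


def latency(tau):
    """Rate 0 for time tau after birth, then rate 1."""
    return lambda rng: tau + rng.expovariate(1.0)


def window(t_max):
    """Rate 1 for time t_max after birth, then 0 (the lineage may never speciate)."""
    def delay(rng):
        x = rng.expovariate(1.0)
        return x if x < t_max else math.inf
    return delay


# Usage: compare the latency estimate with its closed form, then try a finite window.
rng = random.Random(0)
for tau in (0.0, 0.5, 1.0):
    print(tau, symmetric_four_leaf(latency(tau), rng, 20_000), 1 - 2 / 3 * math.exp(-tau))
print(symmetric_four_leaf(window(0.5), rng, 20_000))
```

In this sketch the latency estimates rise above 1/3 and the finite-window estimate falls below it, which matches the direction of the effects stated in the abstract.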