A central limit theorem for parsimony length of trees
Abstract
In phylogenetic analysis it is useful to study the distribution of the parsimony length of a tree under the null model in which the leaves are independently assigned letters according to prescribed probabilities. Except in one special case, this distribution is difficult to describe exactly. Here we analyse this distribution by providing a recursive and readily computable description, by establishing large deviation bounds for the parsimony length of a fixed tree on a single site and for the minimum length (maximum parsimony) tree over several sites, and by showing that, under very general conditions, the former distribution converges asymptotically to the normal, thereby settling a recent conjecture. Furthermore, we show how the mean and variance of this distribution can be efficiently calculated. The proof of normality requires a number of new and recent results, as the parsimony length is not directly expressible as a sum of independent random variables, and so normality does not follow immediately from a standard central limit theorem.
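To make the null model concrete, the following is a minimal sketch that computes the exact distribution of parsimony length for a very small rooted binary tree. It uses Fitch's algorithm for the length of a single leaf colouring and brute-force enumeration over all colourings. This is not the recursive description referred to in the abstract, and it scales exponentially in the number of leaves. The tree encoding (nested pairs of leaf names), the function names, and the example probabilities are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def fitch_length(tree, states):
    """Fitch's algorithm on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right); `states` maps
    leaf names to letters. Returns (candidate state set, parsimony length).
    """
    if isinstance(tree, str):
        return frozenset([states[tree]]), 0
    left_set, left_len = fitch_length(tree[0], states)
    right_set, right_len = fitch_length(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed.
        return common, left_len + right_len
    # The children disagree, so one extra change is counted at this vertex.
    return left_set | right_set, left_len + right_len + 1


def leaves(tree):
    """List the leaf names of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def length_distribution(tree, probs):
    """Exact distribution of parsimony length under the null model,
    obtained by enumerating every assignment of letters to leaves
    (illustrative only; exponential in the number of leaves)."""
    names = leaves(tree)
    dist = {}
    for assignment in product(list(probs), repeat=len(names)):
        weight = Fraction(1)
        for letter in assignment:
            weight *= probs[letter]  # leaves are coloured independently
        _, length = fitch_length(tree, dict(zip(names, assignment)))
        dist[length] = dist.get(length, Fraction(0)) + weight
    return dist


# Hypothetical example: four leaves, two equiprobable letters.
tree = (("a", "b"), ("c", "d"))
probs = {"0": Fraction(1, 2), "1": Fraction(1, 2)}
dist = length_distribution(tree, probs)
mean = sum(k * p for k, p in dist.items())
variance = sum(k * k * p for k, p in dist.items()) - mean ** 2
```

For this four-leaf example the lengths 0, 1 and 2 occur with probabilities 1/8, 5/8 and 1/4, giving mean 9/8 and variance 23/64. Exact fractions are used so that the probabilities sum to exactly one.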