The number of nucleotide sites needed to accurately reconstruct large evolutionary trees
Degree Grantor: University of Canterbury
Degree Name: Research report
Biologists seek to reconstruct evolutionary trees for an increasing number of species, n, from aligned genetic sequences. How fast must the sequence length N grow, as a function of n, in order to accurately recover the underlying tree with probability 1 − ε, if the sequences evolve according to simple stochastic models of nucleotide substitution? We show that, for a certain model, a reconstruction method exists for which the sequence length N can grow surprisingly slowly with n (sublinearly for a wide range of parameters, and even as a power of log n in a narrow range, which roughly meets the lower bound from information theory). By contrast, a more traditional technique (maximum compatibility) provably requires N to grow faster than linearly in n. Our result is based on a new and computationally efficient approach for reconstructing phylogenetic trees from aligned DNA sequences.
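The dependence of reconstruction accuracy on the sequence length N can be illustrated on the smallest non-trivial case, a four-species tree. The sketch below is not the method of this report. It assumes a two-state symmetric substitution model with the same change probability p on every edge, and it uses a simple four-point distance test to pick a split. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def flip(state, p, rng):
    """Return the state after one edge: it changes with probability p."""
    return state ^ 1 if rng.random() < p else state


def simulate_quartet(n_sites, p, rng):
    """Simulate n_sites binary characters on the tree with split ab|cd.

    Leaves a, b hang off internal node u; leaves c, d hang off internal
    node v; u and v are joined by one internal edge. Every edge uses the
    same change probability p (an assumption of this sketch).
    """
    seqs = {"a": [], "b": [], "c": [], "d": []}
    for _ in range(n_sites):
        u = rng.randint(0, 1)
        v = flip(u, p, rng)
        seqs["a"].append(flip(u, p, rng))
        seqs["b"].append(flip(u, p, rng))
        seqs["c"].append(flip(v, p, rng))
        seqs["d"].append(flip(v, p, rng))
    return seqs


def corrected_distance(x, y):
    """Additive distance for the two-state symmetric model."""
    h = sum(s != t for s, t in zip(x, y)) / len(x)
    if h >= 0.5:
        return math.inf  # saturated: the correction is undefined
    return -0.5 * math.log(1.0 - 2.0 * h)


def four_point_split(seqs):
    """Return the split with the strictly smallest pairwise sum, else None."""
    d = corrected_distance
    scores = {
        "ab|cd": d(seqs["a"], seqs["b"]) + d(seqs["c"], seqs["d"]),
        "ac|bd": d(seqs["a"], seqs["c"]) + d(seqs["b"], seqs["d"]),
        "ad|bc": d(seqs["a"], seqs["d"]) + d(seqs["b"], seqs["c"]),
    }
    best = min(scores, key=scores.get)
    ties = [s for s in scores if scores[s] == scores[best]]
    return best if len(ties) == 1 else None  # ties count as failures


def success_rate(n_sites, p, trials, seed):
    """Fraction of simulated data sets for which ab|cd is recovered."""
    rng = random.Random(seed)
    hits = sum(
        four_point_split(simulate_quartet(n_sites, p, rng)) == "ab|cd"
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n_sites in (10, 100, 1000):
        print(n_sites, success_rate(n_sites, p=0.1, trials=200, seed=1))
```

Running the script prints the estimated recovery rate for each sequence length. The question posed in the abstract is how this required length must scale when the number of species n grows, which a single quartet cannot show.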