Performance of supertree methods for estimating species trees (2010)
Type of Content: Theses / Dissertations
Degree Name: Master of Science
Publisher: University of Canterbury. Mathematics and Statistics
Phylogenetics is the study of ancestor-descendant relationships among groups of organisms, such as species or populations of interest. The datasets involved are usually sequence alignments of various subsets of taxa for various genes. A major task of phylogenetics is to combine gene trees estimated from many sampled loci into an overall estimate of the species tree topology. Eventually, one can construct the tree of life, which depicts the ancestor-descendant relationships for all known species. If there is missing data or incomplete sampling in the datasets, then supertree methods can be used to assemble gene trees with different subsets of taxa into an estimated overall species tree topology.
In this study, we assume that gene tree discordance is solely due to incomplete lineage sorting under the multispecies coalescent model (Degnan and Rosenberg, 2009). We examine the performance of the most commonly used supertree method (Wilkinson et al., 2009), namely matrix representation with parsimony (MRP), to explore its statistical properties in this setting. In particular, we show that MRP is not statistically consistent. That is, MRP can become more likely to return a species tree topology other than the true one as the number of gene trees increases. In some situations, using longer branch lengths, randomly deleting taxa or even introducing mutation can improve the performance of MRP so that the matching species tree topology is recovered more often.
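As an illustration of the matrix representation step that MRP relies on, the following is a minimal sketch, not the code used in the thesis. It assumes rooted gene trees written as nested Python tuples of taxon names, and the helper names (`leaves`, `clades`, `mrp_matrix`) are hypothetical. Each clade below the root of each gene tree becomes one binary character: `1` for taxa inside the clade, `0` for taxa in that gene tree but outside the clade, and `?` for taxa missing from that gene tree. The parsimony search on the resulting matrix, and the all-zero outgroup row that is commonly added to root the supertree, are not shown.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Return the leaf sets of all internal nodes strictly below the root."""
    found = []

    def visit(node, is_root):
        if isinstance(node, str):
            return  # a single leaf carries no grouping information
        if not is_root:
            found.append(frozenset(leaves(node)))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def mrp_matrix(trees):
    """Encode rooted trees as one row of '0', '1', '?' characters per taxon."""
    taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon not sampled in this gene tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two hypothetical gene trees; the second one lacks taxon A.
gene_trees = [((("A", "B"), "C"), "D"), (("B", "C"), "D")]
for taxon, row in mrp_matrix(gene_trees).items():
    print(taxon, row)
```

The rows produced this way would then be passed to a parsimony program, and the most parsimonious tree (or a consensus of them) is taken as the supertree estimate.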
In conclusion, MRP is a supertree method that is able to handle large amounts of conflict in the input gene trees. However, MRP is not statistically consistent when gene trees arising from the multispecies coalescent model are used to estimate species trees.
Keywords: phylogenetics; matrix representation with parsimony (MRP); computational statistics; gene tree; species tree; supertree method; simulation study; incomplete lineage sorting; multispecies coalescent model; statistically consistent; expected parsimony score; pruning schemes; mutation model
Rights: Copyright Yuancheng Wang