Performance of supertree methods for estimating species trees
Degree GrantorUniversity of Canterbury
Degree NameMaster of Science
Phylogenetics is the research of ancestor-descendant relationships among different groups of organisms, for example, species or populations of interest. The datasets involved are usually sequence alignments of various subsets of taxa for various genes. A major task of phylogenetics is often to combine estimated gene trees from many loci sampled from the genes into an overall estimate species tree topology. Eventually, one can construct the tree of life that depicts the ancestor-descendant relationships for all known species around the world. If there is missing data or incomplete sampling in the datasets, then supertree methods can be used to assemble gene trees with different subsets of taxa into an estimated overall species tree topology.
In this study, we assume that gene tree discordance is solely due to incomplete lineage sorting under the multispecies coalescent model (Degnan and Rosenberg, 2009). If there is missing data or incomplete sampling in the datasets, then supertree methods can be used to assemble gene trees with different subsets of taxa into an estimated species tree topology. In addition, we examine the performance of the most commonly used supertree method (Wilkinson et al., 2009), namely matrix representation with parsimony (MRP), to explore its statistical properties in this setting. In particular, we show that MRP is not statistically consistent. That is, an estimated species tree topology other than the true species tree topology is more likely to be returned by MRP as the number of gene trees increases. For some situations, using longer branch lengths, randomly deleting taxa or even introducing mutation can improve the performance of MRP so that the matching species tree topology is recovered more often.
In conclusion, MRP is a supertree method that is able to handle large amounts of conflict in the input gene trees. However, MRP is not statistically consistent, when using gene trees arise from the multispecies coalescent model to estimate species trees.