Selecting taxa to save or sequence: desirable criteria and a greedy solution
We discuss three desirable properties for any method of selecting a subset of evolutionary units (EUs) for conservation or for genomic sequencing: spread, stability, and applicability. We are motivated by a practical case in which maximising phylogenetic diversity (PD), which has been suggested as a suitable method, appears to lead to counter-intuitive collections of EUs and does not meet these three criteria. We define a simple greedy algorithm (GREEDYMMD) as a close approximation to choosing the subset that maximises the minimum pairwise distance between EUs. This method of selection satisfies our three criteria and may be a useful alternative to PD in certain real-world situations. We also show that if the distances between EUs are ultrametric, then GREEDYMMD delivers an optimal subset of EUs, one that maximises both the minimum pairwise distance and the PD. Finally, since GREEDYMMD works directly with distances and does not require a tree, it is readily applicable to many datasets.
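The abstract does not spell out the steps of GREEDYMMD, so the following Python sketch is only an illustration of one common greedy scheme for the max-min-distance objective, not necessarily the report's exact procedure. It assumes the selection is seeded with the most distant pair of EUs and then repeatedly adds the EU whose distance to its nearest already-selected EU is largest. The function names `greedy_mmd` and `min_pairwise_distance`, the seeding rule, the tie-breaking, and the small example matrix are all assumptions introduced here.

```python
from itertools import combinations


def greedy_mmd(dist, k):
    """Greedily pick k EUs so that the minimum pairwise distance stays large.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Illustrative sketch only: the seeding and tie-breaking are assumptions.
    """
    n = len(dist)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of EUs")

    # Assumed seeding rule: start from the most distant pair of EUs.
    i, j = max(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    selected = [i, j]

    # nearest[u] = distance from EU u to its closest selected EU.
    nearest = [min(dist[u][i], dist[u][j]) for u in range(n)]

    while len(selected) < k:
        # Add the unselected EU that is farthest from the current selection.
        best = max(
            (u for u in range(n) if u not in selected),
            key=lambda u: nearest[u],
        )
        selected.append(best)
        for u in range(n):
            nearest[u] = min(nearest[u], dist[u][best])
    return selected


def min_pairwise_distance(dist, subset):
    """Objective value: the smallest distance between any two chosen EUs."""
    return min(dist[a][b] for a, b in combinations(subset, 2))


# Hypothetical ultrametric distances for four EUs in two clades:
# EUs 0 and 1 are 2 apart, EUs 2 and 3 are 4 apart, cross-clade pairs are 10 apart.
EXAMPLE_DIST = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]

if __name__ == "__main__":
    chosen = greedy_mmd(EXAMPLE_DIST, 3)
    print(chosen, min_pairwise_distance(EXAMPLE_DIST, chosen))
```

On this small ultrametric example, the sketch selects one EU from the first clade and both EUs from the second, giving a minimum pairwise distance of 4. No other three-EU subset does better here, which is consistent with the optimality claim for ultrametric distances. Note that the sketch needs only the distance matrix, not a tree.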
Subjects: Field of Research::01 - Mathematical Sciences::0102 - Applied Mathematics::010202 - Biological Mathematics
- Engineering: Reports