Coalescent experiments I: Unlabeled n-coalescent and the site frequency spectrum
We derive the transition structure of a Markovian lumping of Kingman’s n-coalescent [1, 2], where lumping a Markov chain is meant in the sense of [3, def. 6.3.1]. The lumped Markov process, referred to as the unlabeled n-coalescent, is a continuous-time Markov chain on the set of all integer partitions of the sample size n. We derive the backward-transition, forward-transition, state-specific, and sequence-specific probabilities of this chain. We show that the likelihood of any given site frequency spectrum (SFS), a commonly used statistic in genome scans, from a locus free of intra-locus recombination can be obtained directly by integrating over conditional realizations of the unlabeled n-coalescent. We develop a controlled Markov chain for importance sampling such integrals from an augmented unlabeled n-coalescent forward in time. We apply the methods to population-genetic data to conduct demographic inference at the empirical resolution of the SFS. We also extend a family of classical hypothesis tests of standard neutrality at a non-recombining locus, based on any statistic of the SFS, to a more powerful version that conditions on the topological information contained in the SFS. Finally, we formalize a graph of coalescent experiments to set a decision-theoretic stage for population-genetic inference across different empirical resolutions.
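As a rough illustration of the state space described above, the following Python sketch enumerates the backward (coalescing) jump probabilities of a chain on integer partitions of n. It assumes only the standard property of Kingman's n-coalescent that each pair of the k current blocks is equally likely to merge next. Encoding a state as a sorted tuple of block sizes, and the names `backward_transitions` and `state_space`, are assumptions made for this example; the report itself may use a different encoding and notation.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def backward_transitions(partition):
    """Jump probabilities out of a state of the lumped (unlabeled) chain.

    `partition` is a sorted tuple of block sizes, e.g. (1, 1, 2) for n = 4.
    Assumption: each of the C(k, 2) pairs of blocks is equally likely to merge.
    """
    k = len(partition)
    if k < 2:
        return {}  # a single block of size n is absorbing
    n_pairs = k * (k - 1) // 2
    counts = Counter()
    for i, j in combinations(range(k), 2):
        # Remove blocks i and j, then add their merged block.
        merged = [s for idx, s in enumerate(partition) if idx not in (i, j)]
        merged.append(partition[i] + partition[j])
        counts[tuple(sorted(merged))] += 1
    return {state: Fraction(c, n_pairs) for state, c in counts.items()}


def state_space(n):
    """All states reachable from n singleton blocks, found by graph search."""
    start = (1,) * n
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for nxt in backward_transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    for state in sorted(state_space(4), key=len, reverse=True):
        print(state, backward_transitions(state))
```

For n = 4, the state (1, 1, 2) moves to (1, 3) with probability 2/3 and to (2, 2) with probability 1/3, because two of the three block pairs merge a singleton with the pair. The search visits every integer partition of n, in line with the state space named in the abstract.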
Subjects: Statistical decision theory of population genetic experiments
- Engineering: Reports