Engineering: Reports
http://hdl.handle.net/10092/611
Wed, 25 Nov 2015 03:19:09 GMT
http://hdl.handle.net/10092/11406
A central limit theorem for parsimony length of trees
Steel, M. A.; Goldstein, Larry J.; Waterman, Michael S.
In phylogenetic analysis it is useful to study the distribution of parsimony length of a tree, under the null model by which the leaves are independently assigned
letters according to prescribed probabilities. Except in one special case, this distribution
is difficult to describe exactly. Here we analyse this distribution by providing a recursive
and readily computable description, establishing large deviation bounds for the parsimony
length of a fixed tree on a single site and for the minimum length (maximum parsimony)
tree over several sites, and by showing that, under very general conditions, the former
distribution converges asymptotically to the normal, thereby settling a recent conjecture.
Furthermore, we show how the mean and variance of this distribution can be efficiently
calculated. The proof of normality requires a number of new and recent results, as the
parsimony length is not directly expressible as a sum of independent random variables,
and so normality does not follow immediately from a standard central limit theorem.
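The quantity studied in this abstract can be illustrated with a small simulation. The sketch below is not taken from the report: it uses Fitch's standard algorithm to compute the parsimony length of a rooted binary tree for one site, then draws leaf letters independently under the null model to estimate the mean and variance empirically. The tree shape, the uniform letter probabilities, and the names `fitch_length`, `leaves` and `simulate_lengths` are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def fitch_length(tree, leaf_states):
    """Parsimony length of a rooted binary tree for a single site (Fitch).

    `tree` is a nested tuple: a leaf is a string label, an internal node is
    a pair (left, right). `leaf_states` maps each leaf label to a letter.
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a letter: no extra change needed.
            return common, left_cost + right_cost
        # The children disagree: one change is charged at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def leaves(tree):
    """List the leaf labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def simulate_lengths(tree, probs, trials, seed=0):
    """Sample parsimony lengths with leaves assigned letters independently."""
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[x] for x in letters]
    labels = leaves(tree)
    lengths = []
    for _ in range(trials):
        draw = rng.choices(letters, weights=weights, k=len(labels))
        lengths.append(fitch_length(tree, dict(zip(labels, draw))))
    return lengths


# Hypothetical example: a balanced tree on 8 leaves, uniform nucleotides.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
lengths = simulate_lengths(tree, probs, trials=2000)
print("empirical mean:", mean(lengths))
print("empirical variance:", pvariance(lengths))
```

This only estimates the mean and variance by sampling. The report describes an efficient exact calculation, which this sketch does not attempt to reproduce.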
Sat, 01 Jan 1994 00:00:00 GMT
http://hdl.handle.net/10092/11405
Classifying and counting linear phylogenetic invariants for the Jukes Cantor model
Steel, M. A.; Fu, Y. X.
Linear invariants are useful tools for testing phylogenetic hypotheses from aligned DNA/RNA sequences, particularly when the sites
evolve at different rates. Here we give a simple, graph theoretic
classification, for each phylogenetic tree T, of its associated vector
space I(T) of linear invariants under the Jukes-Cantor one parameter
model of nucleotide substitution. We also provide an easily described
basis for I(T), and show that if T is a binary (fully resolved)
phylogenetic tree with n sequences at its leaves, then
$\dim[I(T)] = 4^n - F_{2n-2}$,
where $F_n$ is the n-th Fibonacci number. Our method applies a
recently-developed Hadamard-matrix based technique to describe
elements of I(T) in terms of edge-disjoint packings of subtrees in
T, and thereby complements earlier more algebraic treatments.
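The dimension formula can be evaluated directly, as in the minimal sketch below. The abstract does not state which Fibonacci indexing it uses, so the code assumes F_1 = 1, F_2 = 2. That choice is an assumption, picked because for n = 2 the Jukes-Cantor pattern frequencies depend only on whether the two letters match, which gives a 2-dimensional span and hence 16 - 2 = 14 invariants. The function names `fibonacci` and `invariant_dimension` are hypothetical.

```python
def fibonacci(k):
    """k-th Fibonacci number under the ASSUMED convention F_1 = 1, F_2 = 2.

    The abstract does not fix the indexing; change the starting values here
    if the report uses a different convention.
    """
    a, b = 1, 1  # a plays the role of F_0 = 1, b of F_1 = 1
    for _ in range(k):
        a, b = b, a + b
    return a


def invariant_dimension(n):
    """Evaluate 4^n - F_{2n-2} for a binary tree with n >= 2 leaves."""
    if n < 2:
        raise ValueError("need at least two leaves")
    return 4 ** n - fibonacci(2 * n - 2)


for n in range(2, 7):
    print(n, invariant_dimension(n))
```

Under this convention the Fibonacci term grows far more slowly than 4^n, so nearly all of the 4^n pattern coordinates are constrained by linear invariants.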
Sat, 01 Jan 1994 00:00:00 GMT
http://hdl.handle.net/10092/11403
Comparing forensic hypotheses from PCR results in cases involving mixtures of body fluids
Steel, Mike; Taylor, Michael
Likelihood ratios provide a convenient and widely-used measure for assessing the relative support for forensic hypotheses given certain evidence. We extend earlier work
to provide techniques for computing these ratios when a crime sample provides a profile
of (multiple locus) genetic markers of mixed origin. Generic formulae are provided,
illustrated with an example, and some extensions are discussed briefly.
Sun, 01 Jan 1995 00:00:00 GMT
http://hdl.handle.net/10092/11402
The number of nucleotide sites needed to accurately reconstruct large evolutionary trees
Steel, M. A.; Székely, L. A.; Erdös, P. L.
Biologists seek to reconstruct evolutionary trees for an increasing number of species, n, from aligned genetic sequences. How fast must the sequence length N grow, as
a function of n, in order to accurately recover the underlying tree with probability
1 - ε, if the sequences evolve according to simple stochastic models of nucleotide
substitution? We show that for a certain model, a reconstruction method exists for
which the sequence length N can grow surprisingly slowly with n (sublinearly for
a wide range of parameters, and even as a power of log n in a narrow range, which
roughly meets the lower bound from information theory). By contrast, a more traditional
technique (maximum compatibility) provably requires N to grow faster
than linearly in n. Our approach is based on a new and computationally efficient
method for reconstructing phylogenetic trees from aligned DNA sequences.
Mon, 01 Jan 1996 00:00:00 GMT