Explore the minimum number of characters needed for accurate tree reconstruction using mathematical analysis and simulation studies. Discover how branch lengths influence Maximum Parsimony and other methods. Uncover the optimal values for character number and branch lengths.
Future Directions in Phylogenetic Methods and Models, 17–21 Dec 07
How many characters are needed to reconstruct the true tree?
Mareike Fischer and Mike Steel
Mareike Fischer
The Problem
• Given: Sequence of characters (e.g. DNA)
• Wanted: Reconstruction of the ‘true’ tree
• Solution: Maximum Parsimony, Maximum Likelihood, etc.
• But: Is the sequence long enough for a reliable reconstruction?
Previous Approaches
1. Churchill, von Haeseler, Navidi (1992)
• 4 taxa scenario
• Observations:
  • The probability of reconstructing the true tree increases with the length of the interior edge.
  • “Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”
[Figure: reconstruction probability (Rec. Prob.) against interior edge length (int. edge), annotated "more characters"]
Previous Approaches
2. Yang (1998)
• 4 taxa scenario, interior edge ‘fixed’ at 5% of tree length
• 5 different tree shapes were investigated
• Observations:
  • The optimal length for the interior edge ranges between 0.015 and 0.025.
  • ‘Farris Zone’: MP better
  • ‘Felsenstein Zone’: ML better
[Figure: reconstruction probability (Rec. Prob.) against tree length]
Our Approach
• Limitation: Most previous approaches are based on simulations.
• Our approach: Mathematical analysis of the influence of branch lengths on tree reconstruction.
• We investigate MP first and consider other methods afterwards.
Already known
Steel and Székely (2002): Here, the number k of characters needed to reconstruct the true tree grows at rate [expression missing].
[Figure: four-taxon tree with pending edges of length y and interior edge of length x]
But what happens if we fix the ratio (y := px), and then take the value of x that minimizes k?
Our Approach
• Setting: 4 taxa, pending edges of length px (with p > 1), short interior edge of length x, 2-state symmetric model.
[Figure: four-taxon tree with pending edges of length px and interior edge of length x]
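The setting above can be tried out in a small simulation. The following is a minimal sketch, not the authors' code: it assumes the 2-state symmetric model gives a state-change probability of (1 - e^(-2t))/2 on an edge of length t, that the true tree is 12|34, and that MP on four taxa reduces to picking the split supported by the most 2-2 site patterns. All function names are hypothetical.

```python
import math
import random


def flip_prob(t):
    """Probability that the two ends of an edge of length t differ
    (assumed 2-state symmetric model)."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))


def sample_character(p, x, rng):
    """Sample the leaf states (s1, s2, s3, s4) of one character on the tree 12|34."""
    q_pend, q_int = flip_prob(p * x), flip_prob(x)
    u = rng.randint(0, 1)                   # inner node next to leaves 1 and 2
    v = u ^ (rng.random() < q_int)          # inner node next to leaves 3 and 4
    return [node ^ (rng.random() < q_pend) for node in (u, u, v, v)]


def mp_picks_true_tree(k, p, x, rng):
    """Return True if 12|34 has strictly more supporting characters than
    13|24 and 14|23 among k sampled characters (ties count as failure)."""
    support = [0, 0, 0]                     # 12|34, 13|24, 14|23
    for _ in range(k):
        s1, s2, s3, s4 = sample_character(p, x, rng)
        if s1 == s2 and s3 == s4 and s1 != s3:
            support[0] += 1
        elif s1 == s3 and s2 == s4 and s1 != s2:
            support[1] += 1
        elif s1 == s4 and s2 == s3 and s1 != s2:
            support[2] += 1
    return support[0] > max(support[1], support[2])


def success_rate(k, p, x, reps=40, seed=0):
    """Fraction of replicates in which MP recovers the true tree."""
    rng = random.Random(seed)
    return sum(mp_picks_true_tree(k, p, x, rng) for _ in range(reps)) / reps
```

Calling `success_rate(k, p, x)` for increasing k at fixed p and x gives a rough feel for how many characters are needed before the true tree is recovered in most replicates.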
Main Result
For ‘reliable’ MP reconstruction: k grows at least at rate p^2.
For the optimal value of x, k grows at rate p^2.
Idea of Proof: 1. Applying the CLT
Note that the true tree T_1 will be favored over T_2 if and only if Z_k > 0.
Set X_i i.i.d., and [formula missing]. Then (by CLT) [formula missing]
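The displayed formulas of this step are not reproduced in the transcript. One standard way to write it out is sketched below, under assumptions that are not confirmed by the slides: X_i takes the value +1 if character i favors T_1 over T_2, -1 if it favors T_2 over T_1, and 0 otherwise; Z_k is the sum of the X_i; and μ, σ, Φ (the standard normal distribution function) are notation introduced here.

```latex
\[
  Z_k := \sum_{i=1}^{k} X_i, \qquad
  \mu := \mathbb{E}[X_1] = P(X_1 = 1) - P(X_1 = -1), \qquad
  \sigma^2 := \operatorname{Var}(X_1) = P(X_1 = 1) + P(X_1 = -1) - \mu^2 .
\]
\[
  \mu_k = k\,\mu, \quad \sigma_k = \sqrt{k}\,\sigma, \qquad
  P(Z_k > 0) \;\approx\; \Phi\!\left(\frac{\mu_k}{\sigma_k}\right)
  = \Phi\!\left(\sqrt{k}\,\frac{\mu}{\sigma}\right).
\]
```

Read this way, making P(Z_k > 0) close to 1 requires √k · μ/σ to be large, so the required k is of order (σ/μ)^2.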
Idea of Proof: 2. The Hadamard Representation
Since the X_i are i.i.d., μ_k and σ_k depend only on k and the probabilities P(X_1=1) and P(X_1=-1).
These probabilities can be calculated using the ‘Hadamard Representation’: [formulas missing]
(Here, θ = e^(-2x).)
Note that P(X_1=1) and P(X_1=-1) only depend on x and p.
Thus, for fixed p, the ratio [expression missing] can be used to find a value of x that minimizes k.
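The slide's own expressions for these probabilities are not available here, so the following is only a sketch under stated assumptions: the true tree is 12|34, X_1 = +1 for the site pattern 12|34 and X_1 = -1 for the pattern 13|24, and an edge of length t changes state with probability (1 - e^(-2t))/2. The closed forms in `pattern_probs_closed_form` were derived for this example with a Fourier-type (Hadamard-style) argument and are checked against brute-force enumeration; they are not copied from the talk.

```python
import itertools
import math


def pattern_probs_bruteforce(p, x):
    """P(X_1 = 1) and P(X_1 = -1) by summing over all 2^5 edge-flip outcomes."""
    q_pend = 0.5 * (1.0 - math.exp(-2.0 * p * x))   # pending edges, length p*x
    q_int = 0.5 * (1.0 - math.exp(-2.0 * x))        # interior edge, length x
    plus = minus = 0.0
    for f1, f2, f3, f4, f5 in itertools.product((0, 1), repeat=5):
        prob = q_int if f5 else 1.0 - q_int
        for f in (f1, f2, f3, f4):
            prob *= q_pend if f else 1.0 - q_pend
        # Leaf states relative to the inner node next to leaves 1 and 2.
        s1, s2, s3, s4 = f1, f2, f5 ^ f3, f5 ^ f4
        if s1 == s2 and s3 == s4 and s1 != s3:
            plus += prob                            # pattern 12|34
        elif s1 == s3 and s2 == s4 and s1 != s2:
            minus += prob                           # pattern 13|24
    return plus, minus


def pattern_probs_closed_form(p, x):
    """Closed forms in theta = exp(-2x), derived for this sketch (an assumption)."""
    theta = math.exp(-2.0 * x)
    a = theta ** (2.0 * p)                          # theta^(2p)
    plus = (1.0 + 2.0 * a - 4.0 * a * theta + a * a) / 8.0
    minus = (1.0 - a) ** 2 / 8.0
    return plus, minus


if __name__ == "__main__":
    for p, x in [(2.0, 0.1), (5.0, 0.02), (10.0, 0.01)]:
        print(p, x, pattern_probs_bruteforce(p, x), pattern_probs_closed_form(p, x))
```

Both functions depend only on x and p, in line with the note on the slide. Under these assumptions the difference P(X_1=1) - P(X_1=-1) simplifies to θ^(2p)(1 - θ)/2.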
Summary and Extension
• For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate p^2.
• Can other methods do better (e.g. rate p)?
• No! [Can be shown using the ‘Hellinger distance’.]
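The p^2 growth for MP can be illustrated numerically. This is a rough sketch, not the proof from the talk: it assumes a normal approximation in which the required k is proportional to (σ/μ)^2, where μ and σ^2 are the mean and variance of a per-character score X_1 (+1 for pattern 12|34, -1 for pattern 13|24), and it uses closed-form pattern probabilities derived for this example under the 2-state symmetric model. The grid over u = px and all function names are illustrative choices.

```python
import math


def variance_over_mean_squared(p, x):
    """(sigma/mu)^2 for one character; the required k is assumed proportional to it."""
    theta = math.exp(-2.0 * x)
    a = theta ** (2.0 * p)                               # theta^(2p)
    plus = (1.0 + 2.0 * a - 4.0 * a * theta + a * a) / 8.0   # P(X_1 = 1), assumed form
    minus = (1.0 - a) ** 2 / 8.0                             # P(X_1 = -1), assumed form
    mu = a * (-math.expm1(-2.0 * x)) / 2.0               # equals plus - minus
    var = plus + minus - mu * mu
    return var / (mu * mu)


def best_x(p, grid=2000):
    """Grid search over u = p*x in [1e-3, 1] for the x minimizing (sigma/mu)^2."""
    best_val, best_arg = float("inf"), None
    for i in range(grid):
        u = 10.0 ** (-3.0 + 3.0 * i / (grid - 1))
        x = u / p
        val = variance_over_mean_squared(p, x)
        if val < best_val:
            best_val, best_arg = val, x
    return best_val, best_arg


if __name__ == "__main__":
    for p in (10, 20, 40, 80):
        val, x = best_x(p)
        print(f"p={p:3d}  optimal x ~ {x:.5f}  min (sigma/mu)^2 / p^2 = {val / p**2:.3f}")
```

If the assumptions hold, the printed ratio stays bounded as p grows, which is what growth at rate p^2 for the optimal x would look like in this approximation.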
Outlook
• Questions for future work:
  • What happens when you approach the ‘Felsenstein Zone’?
  • What happens in general with different tree shapes or more taxa?
Thanks…
• … to my supervisor Mike Steel,
• … to the Newton Institute for organizing this great conference,
• … to the Allan Wilson Centre for financing my research,
• … to YOU for listening or at least waking up early enough to read this message.
The only true tree…
• … is a Christmas tree.
• Merry Christmas!
• (And it does not even require reconstruction!)