Distance between tree topologies
Splits
[Figure: unrooted tree with eight leaves, A to H]
Splits shown: {A}{BCDEFGH}, {B}{ACDEFGH}, {AB}{CDEFGH}, {C}{ABDEFGH}, {CD}{ABEFGH}, {ABCD}{EFGH}
Each split represents a branch, and there is a 1-1 correspondence between the tree topology and the list of all splits.
Splits
Splits that correspond to external branches are trivial: they are found in every tree topology on the same taxa. Examples: {A}{BCDEFGH}, {B}{ACDEFGH}, {C}{ABDEFGH}.
Splits that correspond to internal branches are the ones that determine the topology. Examples: {AB}{CDEFGH}, {CD}{ABEFGH}, {ABCD}{EFGH}.
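As a minimal sketch of how the splits of a tree can be listed and separated into trivial and nontrivial ones, the code below assumes a nested-tuple representation of the tree. The function names `leaves` and `splits` and the five-taxon example tree are hypothetical and are not taken from the slides.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return every split of an unrooted tree.

    Each split {R1}{R2} is stored as the side that does NOT contain the
    alphabetically first taxon, so {R1}{R2} and {R2}{R1} give the same key.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found = set()

    def visit(node):
        below = leaves(node)
        if below != all_taxa:  # the top-level node has no branch above it
            found.add(all_taxa - below if anchor in below else below)
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


# Hypothetical unrooted tree on five taxa: (A,B) and (D,E) are cherries.
tree = (("A", "B"), "C", ("D", "E"))
n = len(leaves(tree))
all_splits = splits(tree)
# A split is trivial when one of its sides holds a single taxon.
nontrivial = {s for s in all_splits if 1 < len(s) < n - 1}

print(len(all_splits))                                 # 7
print(sorted("".join(sorted(s)) for s in nontrivial))  # ['CDE', 'DE']
```

Here the nontrivial split printed as 'CDE' is {AB}{CDE} and 'DE' is {DE}{ABC}; the remaining five splits are the trivial ones, one per leaf.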
Splits
For a fully resolved unrooted tree with n leaves, there are 2n-3 branches: n external branches and n-3 internal branches. Hence there are n-3 nontrivial splits.
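As a quick worked check of these counts, take the eight-taxon tree used in the earlier slides (leaves A to H), assuming it is fully resolved:

```latex
\begin{align*}
n &= 8 \\
\text{branches} &= 2n - 3 = 13 \\
\text{external branches (trivial splits)} &= n = 8 \\
\text{internal branches (nontrivial splits)} &= n - 3 = 5
\end{align*}
```

The two counts add up as expected: 8 + 5 = 13.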
Shared internal branches
[Figure: two unrooted trees on taxa A to H, illustrating the internal branches they share]
Internal branches that exist in one tree but not in the other
[Figure: two unrooted trees on taxa A to H; Robinson-Foulds distance = 6]
Robinson-Foulds distance
• The distance was suggested in: Robinson DF and Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci. 53:131-147.
• For two fully resolved unrooted trees on n taxa, the minimum distance is 0 and the maximum is 2(n-3).
• The distance ignores branch lengths.
• Zero-length branches are not treated as multifurcations.
• Note that the splits {R1}{R2} and {R2}{R1} are identical.
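The counting behind the distance can be sketched as follows: collect the internal splits of each tree and count those found in exactly one of them. This is an illustrative sketch, not a reference implementation. The helper names `canonical` and `rf_distance` are hypothetical, and so are both topologies: the slides list only three internal splits of the first tree, so the resolution of E, F, G, H is an assumption.

```python
def canonical(side, taxa):
    """Store a split as the side that excludes the alphabetically first taxon,
    so that {R1}{R2} and {R2}{R1} give the same key."""
    side, taxa = frozenset(side), frozenset(taxa)
    return taxa - side if min(taxa) in side else side


def rf_distance(internal1, internal2, taxa):
    """Count the internal splits found in exactly one of the two trees."""
    s1 = {canonical(s, taxa) for s in internal1}
    s2 = {canonical(s, taxa) for s in internal2}
    return len(s1 ^ s2)  # size of the symmetric difference


taxa = "ABCDEFGH"
# Tree 1: {AB}, {CD}, {ABCD} come from the slides; {EF} and {GH} are assumed.
tree1 = ["AB", "CD", "ABCD", "EF", "GH"]
# Tree 2: hypothetical. Two splits are deliberately written from the other
# side ("CDEFGH" is {AB}{CDEFGH}, "EFGH" is {ABCD}{EFGH}).
tree2 = ["CDEFGH", "ABC", "EFGH", "EF", "EFG"]

print(rf_distance(tree1, tree2, taxa))  # 4
print(rf_distance(tree1, tree1, taxa))  # 0
```

Each tree has n-3 = 5 internal splits, three of which are shared, so two splits are unique to each tree and the distance is 4, below the maximum of 2(n-3) = 10.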
Kuhner-Felsenstein's "branch score" distance
• The distance was suggested in: Kuhner MK and Felsenstein J (1994) A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459-468.
• The motivation is to extend the RF distance so that it also accounts for differences in branch lengths.
• The distance was used to evaluate the performance of ML, NJ, and MP in simulations (distance between the inferred tree and the "true" tree).
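Branch-Score (Bs) distance
$$Bs = \sum_i (x_i - y_i)^2$$
where the sum runs over every split i found in at least one of the two trees, and x_i and y_i are the lengths of the corresponding branch in the first and second tree.
If a branch is found in both trees (shared split), its contribution to the distance is the square of the difference between the branch's lengths in the two trees.
If a branch is found in only one tree, a branch of length 0 is assumed to exist in the other tree.
[Figure: two four-taxon trees. The first groups A with B and has branch lengths xa, xb, xc, xd and internal branch xab. The second groups A with C and has branch lengths ya, yb, yc, yd and internal branch yac.]
For the two trees in the figure:
$$Bs = (x_a - y_a)^2 + (x_b - y_b)^2 + (x_c - y_c)^2 + (x_d - y_d)^2 + x_{ab}^2 + y_{ac}^2$$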
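The rule above can be sketched in a few lines, assuming each tree is stored as a mapping from split to branch length. The helper names `split` and `branch_score` and all numeric branch lengths are hypothetical; only the two four-taxon topologies follow the figure.

```python
def split(side1, side2):
    """Order-independent key for the split {side1}{side2}."""
    return frozenset({frozenset(side1), frozenset(side2)})


def branch_score(tree_x, tree_y):
    """Sum of squared branch-length differences over all splits.

    A split missing from a tree is treated as a branch of length 0.
    """
    every_split = set(tree_x) | set(tree_y)
    return sum((tree_x.get(s, 0.0) - tree_y.get(s, 0.0)) ** 2
               for s in every_split)


# First tree groups A with B; the lengths are made-up values.
tree_x = {
    split("A", "BCD"): 0.1, split("B", "ACD"): 0.2,
    split("C", "ABD"): 0.3, split("D", "ABC"): 0.4,
    split("AB", "CD"): 0.5,
}
# Second tree groups A with C.
tree_y = {
    split("A", "BCD"): 0.1, split("B", "ACD"): 0.3,
    split("C", "ABD"): 0.3, split("D", "ABC"): 0.2,
    split("AC", "BD"): 0.4,
}

print(round(branch_score(tree_x, tree_y), 6))  # 0.46
```

The result is 0 + 0.01 + 0 + 0.04 from the four shared external branches, plus 0.25 and 0.16 from the two internal branches that each appear in only one tree.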
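Branch-Score (Bs) distance
Bs extends RF: if all branch lengths are set to 1, Bs equals the RF distance. Every shared branch then contributes (1-1)^2 = 0, and every branch found in only one tree contributes (1-0)^2 = 1.
[Figure: the two four-taxon trees, one grouping A with B (branch lengths xa, xb, xc, xd, xab) and one grouping A with C (branch lengths ya, yb, yc, yd, yac)]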
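This relationship can be checked on the two four-taxon topologies from the figure. The sketch below assumes trees are stored as sets of splits; the helper names are hypothetical.

```python
def split(side1, side2):
    """Order-independent key for the split {side1}{side2}."""
    return frozenset({frozenset(side1), frozenset(side2)})


def branch_score(tree_x, tree_y):
    """Sum of squared length differences; a missing split has length 0."""
    every_split = set(tree_x) | set(tree_y)
    return sum((tree_x.get(s, 0.0) - tree_y.get(s, 0.0)) ** 2
               for s in every_split)


def with_unit_lengths(splits):
    """Give every branch of a topology the length 1."""
    return {s: 1.0 for s in splits}


external = {split("A", "BCD"), split("B", "ACD"),
            split("C", "ABD"), split("D", "ABC")}
x_splits = external | {split("AB", "CD")}  # tree grouping A with B
y_splits = external | {split("AC", "BD")}  # tree grouping A with C

bs = branch_score(with_unit_lengths(x_splits), with_unit_lengths(y_splits))
rf = len(x_splits ^ y_splits)  # splits found in exactly one tree

print(bs, rf)  # 2.0 2
```

Both values equal 2, which is also the maximum RF distance 2(n-3) for n = 4.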
Another look at the Bs distance
Consider an array of all possible splits for n taxa: (B1, B2, ..., BN). Each tree can be represented by such an array, in which Bi = 0 if the split is not found in the tree, and Bi is the length of the relevant branch if the split is found.
The Bs distance between (B1, B2, ..., BN) and (B1', B2', ..., BN') becomes
$$Bs = \sum_{i=1}^{N} (B_i - B_i')^2$$
Bs is therefore the squared Euclidean distance between the two arrays.
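A small sketch of this array view for n = 4 follows. It enumerates every possible split, of which there are N = 2^(n-1) - 1, turns each tree into a length-N array, and sums the squared differences. The function names and the numeric branch lengths are hypothetical; splits are keyed by the side that excludes the first taxon.

```python
from itertools import combinations


def all_splits(taxa):
    """List every possible split of the taxa, keyed by the side that
    excludes the alphabetically first taxon."""
    rest = sorted(taxa)[1:]
    return [frozenset(side)
            for size in range(1, len(rest) + 1)
            for side in combinations(rest, size)]


def to_array(tree, index):
    """Branch length for each possible split, 0 where the split is absent."""
    return [tree.get(s, 0.0) for s in index]


def bs_from_arrays(b, b_prime):
    """Squared Euclidean distance between two arrays."""
    return sum((p - q) ** 2 for p, q in zip(b, b_prime))


index = all_splits("ABCD")
# Tree grouping A with B: {A}{BCD}, {B}{ACD}, {C}{ABD}, {D}{ABC}, {AB}{CD}
tree_x = {frozenset("BCD"): 0.1, frozenset("B"): 0.2, frozenset("C"): 0.3,
          frozenset("D"): 0.4, frozenset("CD"): 0.5}
# Tree grouping A with C: same external splits, internal split {AC}{BD}
tree_y = {frozenset("BCD"): 0.1, frozenset("B"): 0.3, frozenset("C"): 0.3,
          frozenset("D"): 0.2, frozenset("BD"): 0.4}

print(len(index))  # 7
print(round(bs_from_arrays(to_array(tree_x, index),
                           to_array(tree_y, index)), 6))  # 0.46
```

Of the 7 possible splits, each tree uses 5 (four external, one internal), so each array has two zero entries.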
Are these distances true distances?
Formally, a distance must have 3 properties:
• D(a,a)=0 for all a.
• D(a,b)=D(b,a) for all a,b (symmetry).
• D(a,c)<=D(a,b)+D(b,c) for all a,b,c (the triangle inequality).
Bs is the squared Euclidean distance between the split arrays of the two trees. Its square root is the ordinary Euclidean distance, which satisfies all three properties. Bs itself satisfies the first two, but squaring can break the triangle inequality.
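The three properties can be checked numerically on a small case. The sketch below uses three hypothetical trees that share one topology and differ only in the length of a single branch, so each is represented by a short array of branch lengths. It illustrates that the triangle inequality is guaranteed for the square root of Bs (the Euclidean distance), while the squared value can violate it.

```python
import math


def bs(u, v):
    """Squared Euclidean distance between two branch-length arrays."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def root_bs(u, v):
    """Euclidean distance, the square root of Bs."""
    return math.sqrt(bs(u, v))


# Hypothetical trees: identical except for the first branch length.
a, b, c = [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]

print(bs(a, a) == 0.0)                            # True  (D(a,a) = 0)
print(bs(a, b) == bs(b, a))                       # True  (symmetry)
print(bs(a, c), bs(a, b) + bs(b, c))              # 4.0 2.0
print(root_bs(a, c) <= root_bs(a, b) + root_bs(b, c))  # True
```

In the third line 4.0 > 2.0, so the squared form fails the triangle inequality for these three trees, while its square root gives 2.0 <= 1.0 + 1.0.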