
A phylogenetic application of the combinatorial graph Laplacian

Eric A. Stone, Department of Statistics, Bioinformatics Research Center, North Carolina State University


Presentation Transcript


  1. A phylogenetic application of the combinatorial graph Laplacian • Eric A. Stone • Department of Statistics, Bioinformatics Research Center, North Carolina State University

  2. My motivation for this project • Trees in statistics or biology • Often a latent branching structure relating some observed data • Trees in mathematics • Always a connected graph with no cycles

  3. My motivation for this project • Trees in statistics or biology • PROBLEM: Recover properties of latent branching structure • Trees in mathematics • Always a connected graph with no cycles

  4. My motivation for this project • Trees in statistics or biology • PROBLEM: Recover properties of latent branching structure • Trees in mathematics • Characterization of observed structure by spectral graph theory


  6. Bridging the gap • Rectifying trees and trees • Can we use some powerful tools of spectral graph theory to recover latent structure? • Natural relationship between trees and complete graphs?!?

  7. Tree and distance matrices • The tree with vertex set {1,…,8} has distance matrix D • The “phylogenetic tree” can only be observed at the leaves {1,…,5} • We can only observe (estimate) the phylogenetic portion D*, the 5x5 block of D indexed by the leaves (Figure: the phylogenetic portion D*)
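As a minimal sketch of the objects on this slide, the code below builds D for an 8-vertex tree and extracts the leaf block D*. The topology is inferred from the splits quoted elsewhere in the talk (cherries {1,2} and {3,4}, leaf 5 attached at vertex 8); the branch lengths and the names `EDGES` and `tree_distances` are invented for illustration and are not taken from the slides.

```python
import numpy as np

# Hypothetical example tree: leaves 1..5, interior vertices 6..8.
# Topology follows the splits quoted in the talk; branch lengths are invented.
EDGES = [(1, 6, 1.0), (2, 6, 1.5), (6, 8, 2.0), (5, 8, 1.0),
         (7, 8, 0.5), (3, 7, 1.0), (4, 7, 1.2)]

def tree_distances(edges, n):
    """All-pairs path lengths on a tree with vertices 1..n (Floyd-Warshall)."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for u, v, length in edges:
        D[u - 1, v - 1] = D[v - 1, u - 1] = length
    for k in range(n):
        # Relax every pair through intermediate vertex k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

D = tree_distances(EDGES, 8)          # full 8x8 distance matrix
leaves = [0, 1, 2, 3, 4]              # zero-based indices of leaves 1..5
D_star = D[np.ix_(leaves, leaves)]    # observable 5x5 leaf block
print(np.round(D_star, 2))
```

Only `D_star` would be available in practice; the full matrix `D` is built here solely to show where the leaf block comes from.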

  8. More motivation for this project • Trees in statistics or biology • PROBLEM: Recover properties of latent branching structure • Given D* only, recover latent branching structure • This is the problem of phylogenetic reconstruction (w/o error!) (Figure: the phylogenetic portion D*)

  9. NJ finds (2,n-2) splits from D* • A split is a bipartition of the leaf set (e.g. {1,2,3,4,5}) that can be induced by cutting a branch on the tree • e.g. {{1,2},{3,4,5}} or {{1,2,5},{3,4}} • Neighbor-joining criterion identifies (2,n-2) splits through the Q matrix (Figure: the splits {{1,2},{3,4,5}} and {{1,2,5},{3,4}})

  10. A recipe for tree reconstruction from D* • Find a split • NJ relies on theorem that guarantees (2,n-2) split from Q matrix • Use knowledge of split to reduce dimension • NJ prunes the cherry (neighboring taxa) to reduce leaves by one • Iterate until tree has been fully reconstructed • Tree topology specified by its split set
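The "find a split" step of neighbor-joining can be sketched as follows. The code uses the standard Q criterion, Q(i,j) = (n-2)·d(i,j) - Σₖ d(i,k) - Σₖ d(j,k), whose minimizing pair is a cherry when the distances are exactly additive. The distance matrix is an invented example (a tree with cherries (1,2) and (3,4)), and `nj_q_matrix` is a hypothetical helper name.

```python
import numpy as np

# Illustrative additive distances among 5 leaves (invented branch lengths);
# the underlying tree has cherries (1,2) and (3,4).
D_star = np.array([
    [0.0, 2.5, 4.5, 4.7, 4.0],
    [2.5, 0.0, 5.0, 5.2, 4.5],
    [4.5, 5.0, 0.0, 2.2, 2.5],
    [4.7, 5.2, 2.2, 0.0, 2.7],
    [4.0, 4.5, 2.5, 2.7, 0.0],
])

def nj_q_matrix(D):
    """Neighbor-joining Q matrix: (n-2)*d(i,j) - r_i - r_j, r = row sums."""
    n = D.shape[0]
    r = D.sum(axis=1)
    Q = (n - 2) * D - r[:, None] - r[None, :]
    np.fill_diagonal(Q, np.inf)   # ignore i == j when searching for the minimum
    return Q

Q = nj_q_matrix(D_star)
i, j = np.unravel_index(np.argmin(Q), Q.shape)
cherry = tuple(sorted((int(i) + 1, int(j) + 1)))   # one-based leaf labels
print(cherry)
```

The pair returned defines a (2,n-2) split; NJ would then merge the two leaves and repeat on the reduced matrix.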

  11. Our narrow goal • Find a split • NJ relies on theorem that guarantees (2,n-2) split from Q matrix • Hypothesize criterion that identifies deeper splits … and prove that it actually works

  12. Our solution (Figure: the phylogenetic portion D*)

  13. Our solution • Let H be the centering matrix: H = I - (1/n)11ᵀ • Find eigenvector Y of HD*H with the smallest eigenvalue • The signs of the entries of Y identify a split of the tree (Figure: the phylogenetic portion D*)

  14. About the matrix HD*H • Entries of HD*H are D_ij - D_i. - D_.j + D_.. (entry minus row mean minus column mean plus grand mean) • HD*H is negative semidefinite • Zero is a simple eigenvalue with constant unit eigenvector • Remaining eigenvectors have both + and - entries • HD*H appears prominently in: • Multidimensional scaling • Principal coordinate analysis
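For reference, the double-centering identities behind this slide can be written out as below. This assumes the usual definition of the centering matrix, H = I - (1/n)11ᵀ, with D_i., D_.j and D_.. denoting row, column and grand means of D*.

```latex
\begin{align*}
H &= I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}, \\
(HD^{*}H)_{ij} &= D_{ij} - D_{i\cdot} - D_{\cdot j} + D_{\cdot\cdot}, \\
D_{i\cdot} &= \tfrac{1}{n}\sum_{k} D_{ik}, \qquad
D_{\cdot j} = \tfrac{1}{n}\sum_{k} D_{kj}, \qquad
D_{\cdot\cdot} = \tfrac{1}{n^{2}}\sum_{k,l} D_{kl}, \\
H\mathbf{1} &= \mathbf{0}
\;\Longrightarrow\;
(HD^{*}H)\,\tfrac{1}{\sqrt{n}}\mathbf{1} = \mathbf{0}.
\end{align*}
```

The last line is why zero is an eigenvalue with a constant unit eigenvector; every other eigenvector is orthogonal to 1 and therefore must contain entries of both signs.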

  15. Example of our solution • Find eigenvector Y of HD*H with the smallest eigenvalue • Signs of Y identify the split {{1,2},{3,4,5}} (Figure: tree leaves labeled with the entries of Y: -0.0564, +0.5793, -0.5011, -0.4636, +0.4418)
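A minimal sketch of this procedure, assuming the usual centering matrix H = I - (1/n)11ᵀ. The distance matrix is an invented additive example (not the one behind the slide's figure), and `sign_split` is a hypothetical helper name.

```python
import numpy as np

# Illustrative leaf-to-leaf tree distances (invented branch lengths).
# The underlying tree has the non-trivial splits {1,2}|{3,4,5} and {1,2,5}|{3,4}.
D_star = np.array([
    [0.0, 2.5, 4.5, 4.7, 4.0],
    [2.5, 0.0, 5.0, 5.2, 4.5],
    [4.5, 5.0, 0.0, 2.2, 2.5],
    [4.7, 5.2, 2.2, 0.0, 2.7],
    [4.0, 4.5, 2.5, 2.7, 0.0],
])

def sign_split(D):
    """Split leaves by the signs of the eigenvector of HDH with smallest eigenvalue."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = H @ D @ H                            # double-centered distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    y = eigvecs[:, 0]                        # eigenvector of the smallest eigenvalue
    pos = frozenset(int(i) + 1 for i in np.flatnonzero(y > 0))
    neg = frozenset(range(1, n + 1)) - pos
    return {pos, neg}, y, eigvals

split, y, eigvals = sign_split(D_star)
print(split)
```

The overall sign of an eigenvector is arbitrary, so the split is returned as an unordered pair of leaf sets rather than as a "positive" and a "negative" side.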

  16. A real example (data from ToL) • Two iterations

  17. Our solution • Find a split • NJ relies on theorem that guarantees (2,n-2) split from Q matrix • Hypothesize criterion that identifies deep splits … and prove that it actually works

  18. Affinity and distance • In phylogenetics, common to consider pairwise distances • In graph theory, common to consider pairwise affinities (Figure panels: affinity-based, distance-based)

  19. Distance matrix ↔ Laplacian matrix

  20. The genius of Miroslav Fiedler • G connected ⇒ smallest eigenvalue of L, zero, is simple • Smallest positive eigenvalue, λ, called algebraic connectivity of G • Fiedler vectors Y satisfy LY = λY • Fiedler cut is the sign-induced bipartition (Figure: tree vertices labeled with Fiedler vector entries -0.4277, -0.0223, +0.4840, -0.0158, -0.3653, +0.3449, +0.4038, -0.4047)

  21. The genius of Miroslav Fiedler • G connected ⇒ smallest eigenvalue of L, zero, is simple • Smallest positive eigenvalue, λ, called algebraic connectivity of G • Fiedler vectors Y satisfy LY = λY • Fiedler cut is the sign-induced bipartition • Fiedler cut here is {{1,2,6},{3,4,5,7,8}} • Note that the cut implies a leaf split: {{1,2},{3,4,5}} (Figure: tree vertices labeled with Fiedler vector entries -0.4277, -0.0223, +0.4840, -0.0158, -0.3653, +0.3449, +0.4038, -0.4047)
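The Fiedler cut of the full tree can be sketched as below. The tree topology is inferred from the cuts and splits quoted in the talk; the branch lengths are invented, and edge weights (affinities) are taken to be reciprocal branch lengths, which is an assumption of this sketch.

```python
import numpy as np

# Hypothetical example tree: leaves 1..5, interior vertices 6..8.
# Each entry is (u, v, branch length); lengths are invented.
EDGES = [(1, 6, 1.0), (2, 6, 1.5), (6, 8, 2.0), (5, 8, 1.0),
         (7, 8, 0.5), (3, 7, 1.0), (4, 7, 1.2)]

def laplacian(edges, n):
    """Weighted graph Laplacian with edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        w = 1.0 / length
        i, j = u - 1, v - 1
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

L = laplacian(EDGES, 8)
eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues
lam = eigvals[1]                         # algebraic connectivity
y = eigvecs[:, 1]                        # a Fiedler vector: L y = lam * y

# Fiedler cut: bipartition of the vertices by the sign of their entry in y.
side = frozenset(int(i) + 1 for i in np.flatnonzero(y > 0))
cut = {side, frozenset(range(1, 9)) - side}
leaf_split = {frozenset(s & {1, 2, 3, 4, 5}) for s in cut}
print(cut, leaf_split)
```

With these particular lengths the cut falls on the long branch between vertices 6 and 8, so restricting it to the leaves gives the split {1,2} versus {3,4,5}.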

  22. Is this relevant here? • We do not observe an 8x8 Laplacian matrix L • All we get is a 5x5 matrix of between-leaf pairwise distances D* • Where is the connection to graph theory? (Figure: the phylogenetic portion D*)

  23. Recall: Our solution • Let H be the centering matrix: H = I - (1/n)11ᵀ • Find eigenvector Y of HD*H with the smallest eigenvalue • The signs of the entries of Y identify a split of the tree (Figure: the phylogenetic portion D*)

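  24. An extremely useful relationship • Recall the centering matrix H = I - (1/n)11ᵀ • With edge weights equal to reciprocal branch lengths, HDH = -2L⁺, so the (Moore-Penrose) pseudoinverse of HDH is in fact -(1/2)L • We have shown, in the context of this formula, that principal submatrices of D relate to Schur complements of L • In particular, (HD*H)⁺ = -(1/2)L* = -(1/2)(L/Z) = -(1/2)(W - XZ⁻¹Y), where L = [W X; Y Z] in block form, with W indexed by the leaves and Z by the interior vertices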
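The relationship can be checked numerically on a small example. The sketch below assumes edge weights equal to reciprocal branch lengths, under which the identity reads HDH = -2L⁺ (equivalently, the pseudoinverse of HDH is -L/2), and likewise for the leaf block D* with the Schur complement L* = W - XZ⁻¹Y. The tree, its branch lengths, and all helper names are invented for illustration.

```python
import numpy as np

# Hypothetical tree: leaves 1..5, interior 6..8; (u, v, branch length), lengths invented.
EDGES = [(1, 6, 1.0), (2, 6, 1.5), (6, 8, 2.0), (5, 8, 1.0),
         (7, 8, 0.5), (3, 7, 1.0), (4, 7, 1.2)]

def laplacian(edges, n):
    """Weighted Laplacian with edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        w, i, j = 1.0 / length, u - 1, v - 1
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def tree_distances(edges, n):
    """All-pairs path lengths (Floyd-Warshall)."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for u, v, length in edges:
        D[u - 1, v - 1] = D[v - 1, u - 1] = length
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def center(M):
    """Double centering H M H with H = I - 11^T / m."""
    m = M.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ M @ H

def sym_pinv(M, tol=1e-9):
    """Moore-Penrose pseudoinverse of a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    inv = np.array([1.0 / v if abs(v) > tol else 0.0 for v in vals])
    return (vecs * inv) @ vecs.T

leaves, interior = [0, 1, 2, 3, 4], [5, 6, 7]
L = laplacian(EDGES, 8)
D = tree_distances(EDGES, 8)

# Full tree: pseudoinverse of HDH equals -L/2.
full_ok = np.allclose(sym_pinv(center(D)), -0.5 * L)

# Leaves only: Schur complement of the interior block Z in L.
W, X = L[np.ix_(leaves, leaves)], L[np.ix_(leaves, interior)]
Y, Z = L[np.ix_(interior, leaves)], L[np.ix_(interior, interior)]
L_star = W - X @ np.linalg.solve(Z, Y)
D_star = D[np.ix_(leaves, leaves)]
leaf_ok = np.allclose(sym_pinv(center(D_star)), -0.5 * L_star)
print(full_ok, leaf_ok)
```

The pseudoinverse is computed from an eigendecomposition with an explicit tolerance so that the numerically tiny eigenvalue belonging to the constant vector is treated as exactly zero.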

  25. Recall: Our solution • Find eigenvector Y of HD*H with the smallest eigenvalue • The signs of the entries of Y identify a split of the tree • The smallest eigenvalue of HD*H (negative semidefinite) corresponds to the smallest positive eigenvalue of L*, with the same eigenvector • In fact, L* can be seen as a graph Laplacian • And our solution, Y, is the Fiedler vector of that graph! • But what does this graph look like?

  26. Schur complementation of a vertex • The vertices adjacent to 8 become adjacent to each other
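A small sketch of what Schur complementation at a single vertex does to the Laplacian. The tree is a hypothetical example in which vertex 8 is adjacent to vertices 5, 6 and 7; branch lengths are invented and edge weights are assumed to be reciprocal lengths.

```python
import numpy as np

# Hypothetical tree: leaves 1..5, interior 6..8; (u, v, branch length), lengths invented.
EDGES = [(1, 6, 1.0), (2, 6, 1.5), (6, 8, 2.0), (5, 8, 1.0),
         (7, 8, 0.5), (3, 7, 1.0), (4, 7, 1.2)]

def laplacian(edges, n):
    """Weighted Laplacian with edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        w, i, j = 1.0 / length, u - 1, v - 1
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

L = laplacian(EDGES, 8)
v = 7                                   # zero-based index of vertex 8
keep = [k for k in range(8) if k != v]  # vertices 1..7

# Schur complement of the 1x1 block L[v, v]:
# L_{8} = L[keep, keep] - L[keep, v] L[v, keep] / L[v, v]
L_8 = L[np.ix_(keep, keep)] - np.outer(L[keep, v], L[v, keep]) / L[v, v]

# Vertices 5, 6, 7 (indices 4, 5, 6) were pairwise non-adjacent in the tree ...
before = [L[4, 5], L[4, 6], L[5, 6]]
# ... and are pairwise adjacent (negative off-diagonal entries) afterwards.
after = [L_8[4, 5], L_8[4, 6], L_8[5, 6]]
print(before, after)
```

Each new edge between two former neighbors i and j of vertex 8 gets weight w(i,8)·w(j,8)/deg(8), while edges not touching vertex 8 are left unchanged and the row sums stay zero.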

  27. Schur complementation of the interior • The graph described by L* is fully connected • All cuts yield connected subgraphs ⇒ No help from Fiedler

  28. Recap thus far • Given matrix D* of pairwise distances between leaves • Find eigenvector Y of HD*H with the smallest eigenvalue • Claim: The signs of the entries of Y identify a split of the tree • Y shown to be a Fiedler vector of the Laplacian L* • But graph of L* is fully connected, has no apparent structure • Thus Fiedler says nothing about signs of entries of Y • But claim requires signs to be consistent with structure of the tree

  29. Recap thus far • Thus Fiedler says nothing about signs of entries of Y • But claim requires signs to be consistent with structure of the tree • How does L* inherit the structure of the tree? (Figure labels: NO, NO, YES)

  30. The quotient rule inspires a “Schur tower”

  31. The quotient rule inspires a “Schur tower” • How does this help?

  32. Cutpoints and connected components • A point of articulation (or cutpoint) is a point r ∈ G whose deletion yields a subgraph with at least 2 connected components • Cutpoints: 6, 7, 8 • Shown: {1}, {2}, {3,4,5,7,8} are connected components at 6 • The cutpoints of a tree are its internal nodes

  33. The key observation (i.e. theorem) • Let L be the Laplacian of a graph G with some cutpoint v • Let L{v} be the Laplacian of G{v}, obtained by taking the Schur complement at v • Then the Fiedler cut of G{v} identifies a split of G • Here the Fiedler cut of G{6} is {{1,2,5,8},{3,4,7}} • Including 6 in {1,2,5,8} defines two connected components in G (Figure: G and G{6}; vertex values shown for G{6}: +0.0570, -0.4129, +0.5828, +0.0380, -0.3439, +0.4660, -0.3870)

  34. The quotient rule inspires a “Schur tower” • How does this help? • ⇒ Look at Schur paths to graph with Laplacian L* (Figure: tower of graphs from L down to L*)

  35. The punch line • The graph with Laplacian L* can be obtained in three ways • The Fiedler cut of G{6,7,8} must split G{6,7} and G{6,8} and G{7,8}
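The "three ways" can be illustrated numerically: by the quotient property of Schur complements, eliminating two interior vertices and then the third gives the same matrix as eliminating all three at once. The sketch below uses a hypothetical tree with invented branch lengths (edge weight = reciprocal length), and `schur_out` is an illustrative helper name.

```python
import numpy as np

# Hypothetical tree: leaves 1..5, interior 6..8; (u, v, branch length), lengths invented.
EDGES = [(1, 6, 1.0), (2, 6, 1.5), (6, 8, 2.0), (5, 8, 1.0),
         (7, 8, 0.5), (3, 7, 1.0), (4, 7, 1.2)]

def laplacian(edges, n):
    """Weighted Laplacian with edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        w, i, j = 1.0 / length, u - 1, v - 1
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def schur_out(M, labels, remove):
    """Schur-complement away the vertices in `remove`; return matrix and kept labels."""
    keep = [k for k, lab in enumerate(labels) if lab not in remove]
    gone = [k for k, lab in enumerate(labels) if lab in remove]
    W, X = M[np.ix_(keep, keep)], M[np.ix_(keep, gone)]
    Y, Z = M[np.ix_(gone, keep)], M[np.ix_(gone, gone)]
    return W - X @ np.linalg.solve(Z, Y), [labels[k] for k in keep]

L = laplacian(EDGES, 8)
labels = list(range(1, 9))
interior = {6, 7, 8}

# Direct route: eliminate 6, 7 and 8 in one step.
L_star, _ = schur_out(L, labels, interior)

# Three two-step routes: via G{6,7}, G{6,8} and G{7,8}.
paths_agree = True
for pair in ({6, 7}, {6, 8}, {7, 8}):
    M, labs = schur_out(L, labels, pair)          # Laplacian of G{pair}
    M2, _ = schur_out(M, labs, interior - pair)   # then remove the last cutpoint
    paths_agree = paths_agree and np.allclose(M2, L_star)

# Every pair of leaves is adjacent in the graph of L*.
off_diag = L_star[~np.eye(5, dtype=bool)]
fully_connected = bool((off_diag < 0).all())
print(paths_agree, fully_connected)
```

Because every route ends at the same L*, a statement proved for a single Schur complementation at a cutpoint can be applied along each route in turn.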


  37. Recall: Example • Find eigenvector Y of HD*H with the smallest eigenvalue • Signs of Y identify the split {{1,2},{3,4,5}} (Figure: tree leaves labeled with the entries of Y: -0.0564, +0.5793, -0.5011, -0.4636, +0.4418)

  38. The punch line • The graph with Laplacian L* can be obtained in three ways • The Fiedler cut of G{6,7,8} must split G{6,7} and G{6,8} and G{7,8} • This implies that the cut splits the progenitor graph G! {{1,2,6},{3,4,5,7,8}}

  39. Our solution actually works • Let H be the centering matrix: H = I - (1/n)11ᵀ • Find eigenvector Y of HD*H with the smallest eigenvalue • The signs of the entries of Y identify a split of the tree (Figure: the phylogenetic portion D*)

  40. A recipe for tree reconstruction • Find a split • NJ relies on theorem that guarantees (2,n-2) split from Q matrix • We have a theorem that guarantees splits from HD*H matrix • Use knowledge of split to reduce dimension • NJ prunes the cherry (neighboring taxa) to reduce leaves by one • We use a divisive method that reduces to pairs of subtrees • Iterate until tree has been fully reconstructed • Tree topology specified by its split set

  41. Reconstruction from the inside out

  42. Connections with Classical MDS and PCoA • Classical solution to multidimensional scaling • a.k.a. Principal coordinate analysis • Recipe for dimension reduction given distance matrix D: • Construct matrix A from D entrywise: x ↦ -x²/2 • Double centering: B = HAH • Find k largest eigenvalues λi of B with corresponding eigenvectors Xi • Coordinates of point Pr given by row r of eigenvector entries ⇒ k = 1 with sqrt of tree distance equivalent to our approach
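The equivalence stated at the end of this slide can be sketched as follows: running the classical MDS recipe with k = 1 on the square root of the tree distances gives A = -D*/2 and B = -HD*H/2, so the top eigenvector of B is the bottom eigenvector of HD*H. The distance matrix is an invented additive example and `classical_mds` is a hypothetical helper name; coordinates are left as raw eigenvector entries, as on the slide (scaling by the square root of the eigenvalue would not change any signs).

```python
import numpy as np

# Illustrative leaf-to-leaf tree distances (invented branch lengths).
D_star = np.array([
    [0.0, 2.5, 4.5, 4.7, 4.0],
    [2.5, 0.0, 5.0, 5.2, 4.5],
    [4.5, 5.0, 0.0, 2.2, 2.5],
    [4.7, 5.2, 2.2, 0.0, 2.7],
    [4.0, 4.5, 2.5, 2.7, 0.0],
])

def classical_mds(dist, k):
    """Classical MDS / PCoA: eigenvectors of B = H A H for the k largest eigenvalues."""
    n = dist.shape[0]
    A = -(dist ** 2) / 2.0                     # entrywise x -> -x^2 / 2
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = H @ A @ H                              # double centering
    vals, vecs = np.linalg.eigh(B)             # ascending order
    order = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    return vecs[:, order], vals[order]

# PCoA with k = 1 on the square root of the tree distance.
coords, top_vals = classical_mds(np.sqrt(D_star), k=1)
c1 = coords[:, 0]

# Direct route: eigenvector of H D* H with the smallest eigenvalue.
n = D_star.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
y = np.linalg.eigh(H @ D_star @ H)[1][:, 0]

# Eigenvectors are defined up to sign, so compare both orientations.
same_axis = min(np.linalg.norm(c1 - y), np.linalg.norm(c1 + y)) < 1e-8
pos = frozenset(int(i) + 1 for i in np.flatnonzero(c1 > 0))
split = {pos, frozenset(range(1, n + 1)) - pos}
print(same_axis, split)
```

Splitting the first principal coordinate at zero therefore reproduces the sign-based split of the tree for this example.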

  43. Phylogenetic ordination • PCoA on sequence data with k = 3: • For appropriate distance, C1 (x-axis) guaranteed to split taxa at 0 • Our results support popular use of PCoA • Provided that the right distance is considered…

  44. Conclusion I • Natural connection between matrix of pairwise distances and the Laplacian of a complete graph

  45. Conclusion II • Structure of tree embedded in complete graph and recoverable via spectral theory • Notion of “Fiedler cut” extends concept to “Fiedler split” • Inheritance propagated through Schur tower (Figure labels: NO, NO, YES)

  46. Conclusion III • Results inspire fast divisive tree reconstruction method

  47. Conclusion IV • Provides guidance and justification for ordination approach

  48. Acknowledgements • Alex Griffing (NCSU Bioinformatics) • Carl Meyer (NCSU Math) • Amy Langville (CoC Math)
