Common Intervals in Sequences, Trees, and Graphs
Explore the conservation of gene order in bacterial genomes, study gene clusters, and analyze common intervals in permutations and trees with formalization and algorithms.
Common Intervals in Sequences, Trees, and Graphs Steffen Heber and Jiangtian Li
Genome Comparison of Bacteria [Kim et al., Nat. Biotechnol., 2004]
Gene Order & Function in Bacteria • Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996] • Some genes cluster together even in unrelated species. • Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]
Formalization of Gene Clusters • Genomes: permutations π1, π2, …, πk • Genes: numbers 1, …, n
π1 = 1 2 3 4 5 6 7 8
π2 = 8 7 6 4 5 2 1 3
π3 = 3 1 2 5 8 7 6 4
π4 = 6 7 4 2 1 3 8 5
Intervals • For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n. • Any permutation of [n] has n(n-1)/2 intervals. Example: π = (1 3 5 4 2 6 7)
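The count n(n-1)/2 can be checked by direct enumeration. The following is a minimal sketch; the function name `intervals` is hypothetical and a permutation is assumed to be given as a Python list.

```python
def intervals(perm):
    """Return every interval {perm[i], ..., perm[j]} with i < j as a frozenset."""
    n = len(perm)
    return [frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)]


perm = [1, 3, 5, 4, 2, 6, 7]          # example permutation from the slide
n = len(perm)
print(len(intervals(perm)), n * (n - 1) // 2)   # both counts agree
```

Each pair of positions i < j contributes exactly one interval, which is where the n(n-1)/2 comes from.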
Common Intervals • For a family F = (π0, π1, …, πk-1) of permutations, a common interval of F (= conserved gene cluster) is a subset S ⊆ [n] that is an interval in every πi. • We write S ∈ C_F. Example: π0 = (1 3 5 4 2 6 7), π1 = (2 4 5 1 3 7 6)
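As an illustration of the definition, here is a naive quadratic enumeration, not the optimal algorithm. It assumes the two example rows are π0 = (1 3 5 4 2 6 7) and π1 = (2 4 5 1 3 7 6); the name `common_intervals` is hypothetical. It relies on the fact that a set S is an interval of a permutation exactly when its largest and smallest positions differ by |S| - 1.

```python
def common_intervals(perms):
    """Naive O(k n^2) enumeration of the common intervals of a family of permutations."""
    base, others = perms[0], perms[1:]
    pos = [{gene: i for i, gene in enumerate(p)} for p in others]
    n = len(base)
    result = []
    for i in range(n):
        lo = [None] * len(others)   # smallest position seen so far, per permutation
        hi = [None] * len(others)   # largest position seen so far, per permutation
        for j in range(i, n):
            gene = base[j]
            for t, p in enumerate(pos):
                q = p[gene]
                lo[t] = q if lo[t] is None else min(lo[t], q)
                hi[t] = q if hi[t] is None else max(hi[t], q)
            # base[i..j] is contiguous in every other permutation iff hi - lo == j - i
            if j > i and all(h - l == j - i for l, h in zip(lo, hi)):
                result.append(frozenset(base[i:j + 1]))
    return result


pi0 = [1, 3, 5, 4, 2, 6, 7]
pi1 = [2, 4, 5, 1, 3, 7, 6]
for s in common_intervals([pi0, pi1]):
    print(sorted(s))
```

For this pair the sketch reports nine common intervals, among them {1, 3}, {4, 5}, {2, 4}, {6, 7} and the full set.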
Lemma Let F = (π0, π1, …, πk-1) and c, d ∈ C_F. • If c ∩ d ≠ ∅ then c ∪ d ∈ C_F. • We call c ∪ d reducible. [Figure: π0 = (1 3 5 4 2 6 7) and π1 = (2 4 5 1 3 7 6) with an irreducible and a reducible interval marked]
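The lemma suggests a simple way to separate reducible from irreducible intervals on small examples. The sketch below rests on one assumption: a common interval counts as reducible when it is the union of two overlapping common intervals, neither of which contains the other. All function names are hypothetical, and the brute-force enumeration is for illustration only.

```python
from itertools import combinations


def is_interval(s, perm):
    """True if the elements of s occupy consecutive positions in perm."""
    positions = [i for i, gene in enumerate(perm) if gene in s]
    return positions[-1] - positions[0] == len(s) - 1


def common_intervals(perms):
    """Brute force: test every interval of the first permutation in all others."""
    base = perms[0]
    n = len(base)
    cands = (frozenset(base[i:j + 1]) for i in range(n) for j in range(i + 1, n))
    return [s for s in cands if all(is_interval(s, p) for p in perms[1:])]


def irreducible(cis):
    """Drop every interval that is a union of two overlapping, non-nested intervals."""
    reducible = set()
    for a, b in combinations(set(cis), 2):
        union = a | b
        if a & b and union != a and union != b:   # overlapping and non-nested
            reducible.add(union)
    return [c for c in cis if c not in reducible]


cis = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])
for c in irreducible(cis):
    print(sorted(c))
```

On the example pair, {2, 4, 5} is reducible because it equals {4, 5} ∪ {2, 4}, while {6, 7} and the full set stay irreducible.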
Analysis • We have K ≤ n(n-1)/2 common intervals, and I < n irreducible intervals. • Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time & O(n) space
Common Intervals of Trees Let T, T1, …, Tk be trees with vertex set [n]. Definition: • S ⊆ [n] is an interval of T iff T[S] is connected and |S| > 1. • S ⊆ [n] is a common interval of T1, …, Tk iff S is an interval in all trees. • Tree intervals generalize intervals of permutations.
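The tree definition can be tried out with a brute-force sketch over all vertex subsets, which is feasible only for small n. Trees are assumed to be given as edge lists on vertices 1..n; the function names are hypothetical. A permutation corresponds to a path visiting the vertices in permutation order, which is how tree intervals generalize permutation intervals.

```python
from itertools import combinations


def is_tree_interval(subset, edges):
    """S is an interval of T iff |S| > 1 and the induced subgraph T[S] is connected."""
    s = set(subset)
    if len(s) < 2:
        return False
    adj = {v: set() for v in s}
    for u, v in edges:
        if u in s and v in s:          # keep only edges inside S
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:                        # depth-first search inside T[S]
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == s


def common_tree_intervals(n, trees):
    """Exponential brute force over all subsets of [n] with at least two vertices."""
    subsets = (c for r in range(2, n + 1) for c in combinations(range(1, n + 1), r))
    return [set(c) for c in subsets if all(is_tree_interval(c, t) for t in trees)]


path = [(1, 2), (2, 3), (3, 4)]        # path 1-2-3-4
star = [(1, 2), (1, 3), (1, 4)]        # star with centre 1
print(common_tree_intervals(4, [path, star]))
```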
Miscellaneous Example: common intervals of T1, T2: { [2], [3], [4], [5] } • (Common) intervals in trees are induced subtrees. [Figure: trees T1 and T2 on vertex set {1, …, 5}]
Structure of Tree Intervals • Tree intervals have the Helly property, i.e. for any family of tree intervals (Ti), i ∈ I, the assumption Tp ∩ Tq ≠ ∅ for every p, q ∈ I implies ∩(i ∈ I) Ti ≠ ∅.
Extreme Cases n-vertex stars Sn-1: # non-trivial induced subtrees: 2^(n-1) − 1
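The star count can be sanity-checked by brute force for small n. This sketch assumes the star has centre 1 and leaves 2..n; the helper names are hypothetical. A subset with at least two vertices is connected exactly when it contains the centre, so there is one induced subtree per non-empty set of leaves.

```python
from itertools import combinations


def connected(subset, edges):
    """True if the subgraph induced by subset is connected."""
    s = set(subset)
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):      # treat edges as undirected
                if x == u and y in s and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == s


def count_star_intervals(n):
    """Count subsets of size > 1 that induce a connected subgraph of the n-vertex star."""
    edges = [(1, leaf) for leaf in range(2, n + 1)]
    vertices = range(1, n + 1)
    return sum(connected(c, edges)
               for r in range(2, n + 1)
               for c in combinations(vertices, r))


for n in range(2, 8):
    print(n, count_star_intervals(n), 2 ** (n - 1) - 1)
```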
The Common Interval Graph • Given T = (T1, …, Tk) and the corresponding common intervals C_T, the common interval graph G_T = (V, E) is the graph with V = C_T and E = {(c, d) | c, d ∈ C_T, c ∩ d ≠ ∅, c ≠ d}.
Example • V = [n], T = (Pn, Sn-1) • We have C_T = { [2], [3], …, [n] }, G_T = K(C_T). [Figure: path Pn, star Sn-1, and the complete graph G_T on the vertices [2], [3], [4], …, [n]]
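The construction of the common interval graph can be sketched in a few lines. The function name `common_interval_graph` is hypothetical and the common intervals are assumed to be already known; here they are the prefixes [2], …, [n] from the path/star example.

```python
from itertools import combinations


def common_interval_graph(intervals):
    """Vertices are the common intervals; two distinct intervals are adjacent iff they intersect."""
    vertices = [frozenset(c) for c in intervals]
    edges = [(c, d) for c, d in combinations(vertices, 2) if c & d]
    return vertices, edges


n = 5
prefixes = [range(1, m + 1) for m in range(2, n + 1)]   # [2], [3], ..., [n]
V, E = common_interval_graph(prefixes)
print(len(V), len(E))
```

All prefixes share vertex 1, so every pair is adjacent and the graph is complete, matching G_T = K(C_T).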
Common Interval Graphs cont’d A graph is called chordal if it does not contain an induced cycle Cn on n > 3 vertices. Proposition: Common interval graphs of trees are chordal graphs.
Irreducible Common Intervals For a common interval c ∈ C_T and a subset V ⊆ C_T we say that V generates c iff • for each d ∈ V, d ⊂ c • c = ∪ d (union over all d ∈ V) • G_T[V] is connected. If there is no such V then c is irreducible. The irreducible intervals generate all common intervals. [Figure: example on vertices 1 to 7]
Finding Irreducible Intervals • We have K < 2^(n-1) common intervals, and I < n irreducible intervals. • Find all irreducible common intervals of k trees on n vertices: O(kn²) time & O(kn) space
Finding Irreducible Intervals • Irreducible intervals are minimal common intervals containing an adjacent vertex pair. [Figure: example trees on vertices x, y, z, l, m]
Graph Intervals G = (V, E), undirected, connected graph, V = [n]. S ⊆ V is an interval (convex) iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S. [Figure: two vertex sets in a graph on vertices 1 to 4, one convex, one not]
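A convexity test follows directly from the definition. In this sketch, a vertex w lies on a shortest path between u and v exactly when d(u, w) + d(w, v) = d(u, v), so S is convex when no vertex outside S satisfies that equation for a pair in S. The graph is assumed to be a connected adjacency dict, the 4-cycle example is an assumption (not the slide's figure), and the function names are hypothetical.

```python
from collections import deque


def bfs_dist(adj, src):
    """Breadth-first distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_convex(adj, subset):
    """True if subset contains every vertex on every shortest path between its members."""
    s = set(subset)
    dist = {v: bfs_dist(adj, v) for v in s}
    for u in s:
        for v in s:
            for w in adj:
                # w outside S but on a shortest u-v path violates convexity
                if w not in s and dist[u][w] + dist[v][w] == dist[u][v]:
                    return False
    return True


cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # 4-cycle 1-2-3-4-1
print(is_convex(cycle, {1, 2}), is_convex(cycle, {1, 2, 3}))
```

In the 4-cycle, {1, 2, 3} is connected but not convex, because 1-4-3 is also a shortest path between 1 and 3.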
Common Intervals of Graphs Let G = (G1, …, Gk) be a family of connected undirected graphs with vertex set [n]. Definition: S ⊆ [n] is a common interval of G iff S is an interval in all graphs. • Graph intervals generalize tree intervals. [Figure: example graphs G0 and G1 on vertices 1 to 4]
Some Differences • The union of convex sets is NOT always convex.
Some Differences • The common convex hull of an adjacent vertex pair is NOT always irreducible. [Figure: graphs G1 and G2 on vertices 1, 2, 3]
Finding Irreducible Graph Intervals Sketch: Given G = (G0, G1, …, Gk-1)
For each edge (i,j) ∈ Ei* do
    S(i,j) := {i,j}
    For each (k,l) ∈ S(i,j)
        Add vertices ‘between’ k and l to S(i,j)
Remove reducible intervals
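The first part of the sketch, growing each edge into its common convex hull, might look as follows. Several points are assumptions rather than the authors' method: edges are taken from the union of all input graphs, "between" is read as "on a shortest path in some graph", and the final removal of reducible intervals is left out because the slide does not spell it out. All names are hypothetical.

```python
from collections import deque


def all_pairs_dist(adj):
    """BFS distances between all vertex pairs of a connected unweighted graph."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[src] = d
    return dist


def common_hull(dists, seed):
    """Smallest superset of seed that is convex in every graph (fixed-point iteration)."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for dist in dists:
            for u in list(s):
                for v in list(s):
                    for w in dist[u]:
                        # add every vertex lying on a shortest u-v path in this graph
                        if w not in s and dist[u][w] + dist[w][v] == dist[u][v]:
                            s.add(w)
                            changed = True
    return frozenset(s)


def candidate_intervals(adjs):
    """One candidate per edge of any input graph (assumed reading of the edge set)."""
    dists = [all_pairs_dist(a) for a in adjs]
    edges = {frozenset((u, w)) for a in adjs for u in a for w in a[u]}
    return {common_hull(dists, e) for e in edges}


g0 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}          # path 1-2-3-4
g1 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}    # 4-cycle
for c in candidate_intervals([g0, g1]):
    print(sorted(c))
```

In this small example the edge (1, 4) of the cycle expands to the whole vertex set, because the path graph forces vertices 2 and 3 between 1 and 4.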
Extreme Cases • Permutations (identical permutations): C ≤ n(n-1)/2, I < n • Trees (identical star-trees): C < 2^(n-1), I < n • Graphs (complete graphs): C < 2^n, I ≤ n(n-1)/2
Example: InterDom Database of protein domain interactions. • Gene fusions • Protein-protein interactions (DIP & BIND) • Protein complexes (PDB)
Comparing Three Networks [Figure legend: G: Gene fusion, P: PDB, B: BIND, D: DIP]
Irreducible Intervals [Figure: plot with axis label "size of irreducible interval"]
Biologically Meaningful? • regulator of chromosome condensation • protein kinase • PH domain • RAS family domain • ankyrin repeat