
Sorting by reversals

Bogdan Pasaniuc, Dept. of Computer Science & Engineering.


Presentation Transcript


  1. Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering

  2. Overview • Biological background • Definitions • Unsigned Permutations • Approximation Algorithm • Sorting Signed Permutations • Simplified Algorithm

  3. Mouse (X chrom.), unknown ancestor, Human (X chrom.) • What is the evolutionary path? • What is the ancestor chromosome? • Chromosomes → lists of genes → permutations

  4. Mutation at chromosome level • Inversion: (1 2 3 4 5 6 7) → (1 4 3 2 5 6 7) • Transposition: (1 2 3 4 5 6 7) → (1 5 6 2 3 4 7) • Translocation: (1 2 3 4 5 6 7) → (1 2 3 4 5 2 3 4 6 7) • Inversions: known as reversals; the most common; most often reflect the differences between and within species • What is the minimum number of reversals required to transform one permutation into another? • Reversal distance → good approximation of evolutionary distance
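
The inversion on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `reverse_segment` and the 0-based, inclusive index convention are assumptions, not part of the slides.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


# Inversion example from the slide: reverse the block 2 3 4.
print(reverse_segment([1, 2, 3, 4, 5, 6, 7], 1, 3))  # [1, 4, 3, 2, 5, 6, 7]
```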

  5. Reversals • Genes (blocks): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

  6. Reversals • After reversing the block 4 ... 8: 1, 2, 3, 8, 7, 6, 5, 4, 9, 10

  7. Reversals • 1, 2, 3, 8, 7, 6, 5, 4, 9, 10 • Breakpoints: between 3 and 8, and between 4 and 9

  8. Breakpoint: a pair of adjacent positions (i, i+1) s.t. |π_i - π_{i+1}| ≠ 1 • The values π_i and π_{i+1} are not consecutive • If |π_i - π_{i+1}| = 1 then the values π_i and π_{i+1} are adjacent • Introduce π_0 = 0, π_{n+1} = n+1 • → (0,1) is a breakpoint if π_1 ≠ 1 • → (n,n+1) is a breakpoint if π_n ≠ n • A reversal affects the breakpoints only at its endpoints • Any reversal can remove or induce at most 2 breakpoints.
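
As a small sketch of this definition, the function below frames the permutation with 0 and n+1 and counts the adjacent pairs whose values are not consecutive. The name `count_breakpoints` is hypothetical, chosen only for this illustration.

```python
def count_breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is framed with 0 on the left and n+1 on the right,
    then every adjacent pair whose values differ by more than 1 is counted.
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


print(count_breakpoints([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # 2
```

The two breakpoints reported for this example sit at the endpoints of the reversed block, between 3 and 8 and between 4 and 9.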

  9. Strip: a maximal run of increasing (decreasing) elements. • The identity permutation has no breakpoints and any other permutation has at least one breakpoint • Greedy: at each step remove the maximum number of breakpoints. • Ф(π) = number of breakpoints in π • While (Ф(π) > 0): choose a reversal that removes the maximum number of breakpoints (if there is a tie, favor a reversal that leaves a decreasing strip) • Greedy ends in at most Ф(π) steps.
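
The greedy loop can be sketched as a brute-force search over all reversals. This is a minimal illustration, not the implementation behind the slides: the function names are hypothetical, and it assumes the usual convention (not stated on the slide) that a single-element strip counts as decreasing, except for the framing elements 0 and n+1.

```python
def breakpoints(ext):
    """Number of adjacent pairs in the framed permutation that are not consecutive."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(ext):
    """True if some inner element lies in a decreasing strip.

    Assumed convention: an inner element with a breakpoint on both sides
    (a strip of length one) counts as decreasing.
    """
    for i in range(1, len(ext) - 1):
        left_bp = abs(ext[i] - ext[i - 1]) != 1
        right_bp = abs(ext[i + 1] - ext[i]) != 1
        if ext[i - 1] - ext[i] == 1 or (left_bp and right_bp):
            return True
    return False


def greedy_sort(perm):
    """Sort by reversals, greedily removing as many breakpoints as possible.

    Ties are broken in favor of a reversal that leaves a decreasing strip.
    Returns the sorted permutation and the list of reversals (i, j),
    given as 1-based inclusive positions.
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    steps = []
    while breakpoints(ext) > 0:
        current = breakpoints(ext)
        best_key, best_move, best_ext = None, None, None
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                cand = ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                key = (current - breakpoints(cand), has_decreasing_strip(cand))
                if best_key is None or key > best_key:
                    best_key, best_move, best_ext = key, (i, j), cand
        steps.append(best_move)
        ext = best_ext
    return ext[1:-1], steps


result, steps = greedy_sort([3, 4, 1, 2])
print(result, steps)  # [1, 2, 3, 4] [(1, 3), (2, 4)]
```

Recomputing the breakpoint count for every candidate keeps the sketch short but makes each step cost O(n^3); it is meant for small examples only.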

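  10. Quality of approximation • Lemma 1: Every permutation with a decreasing strip has a reversal that removes one breakpoint. • Proof: consider the decreasing strip containing the smallest element π_i among all decreasing strips → π_i - 1 must be in an increasing strip that lies to the left or to the right; in either case the reversal that brings π_i next to π_i - 1 removes a breakpoint. (Figure: the breakpoint that will be removed.)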

  11. Lemma 2: Let π have a decreasing strip. If every reversal that removes one breakpoint leaves a permutation with no decreasing strips → π has a reversal that removes two breakpoints. Proof: • consider the decreasing strip with π_i being the smallest → the increasing strip containing π_i - 1 must be to the left; call the corresponding reversal ρ_i • consider the decreasing strip with π_j being the largest → the increasing strip containing π_j + 1 must be to the right; call the corresponding reversal ρ_j

  12. Fact 1: ρ_i and ρ_j must overlap • π_j must lie in ρ_i → if it doesn't, then π∘ρ_i has the decreasing strip that contains π_j • π_i must lie in ρ_j → if it doesn't, then π∘ρ_j has the decreasing strip that contains π_i

  13. Fact 2: ρ_i = ρ_j • If ρ_i - ρ_j ≠ ∅ then: if ρ_i - ρ_j contains an increasing strip → π∘ρ_i has a decreasing strip (the reversal flips it); if ρ_i - ρ_j contains a decreasing strip → π∘ρ_j has a decreasing strip (the reversal leaves it untouched) • Then ρ = ρ_i = ρ_j removes 2 breakpoints.

  14. Lemma 3: Greedy sorts a permutation with a decreasing strip in at most Ф(π) - 1 reversals • Obs: if the permutation at step i has no decreasing strip → the reversal at step i-1 removed 2 breakpoints • → we can use one reversal to create a decreasing strip → there exists a reversal that removes at least one breakpoint • Theorem 1: Greedy sorts every permutation in at most Ф(π) reversals. • If π has a decreasing strip → at most Ф(π) - 1 reversals • If π has no decreasing strip → every reversal induces a decreasing strip → after one step we can apply Lemma 3 → at most Ф(π) reversals

  15. Corollary: Greedy is a 2-approximation algorithm • Every reversal removes at most 2 breakpoints • OPT(π) ≥ Ф(π)/2 ≥ Greedy(π)/2 • → Greedy(π) ≤ 2·OPT(π) • Runtime: # of steps → O(n). At each step we need to analyze O(n^2) reversals. Total runtime = O(n^3). → Analyze only the reversals that remove breakpoints → O(n^2).

  16. Signed permutations: reversals change the sign: (1,2,3,4,5,6,7,8,9,10) → (1,2,3,-8,-7,-6,-5,-4,9,10) • Problem: given a signed permutation, find a minimum-length series of reversals that transforms it into the identity permutation. → polynomial algorithm (Hannenhalli & Pevzner '95) → relies on several intermediary constructions → these constructions have been simplified → first completely elementary treatment of the problem (Bergeron '05)

  17. Oriented pair: a pair of consecutive integers with different signs. (0,3,1,6,5,-2,4,7) → oriented pairs (3,-2) and (1,-2). • Oriented pairs → reversals that create consecutive integers (adjacent, in increasing order): (3,-2): (0,3,1,6,5,-2,4,7) → (0,-5,-6,-1,-3,-2,4,7), creating -3,-2; (1,-2): (0,3,1,6,5,-2,4,7) → (0,3,1,2,-5,-6,4,7), creating 1,2 • Oriented reversal: a reversal that creates consecutive integers • Score of a reversal: # of oriented pairs in the resulting permutation.
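
The definitions on this slide can be sketched in Python as follows. The rule used to pick the reversed segment (reverse π_i … π_{j-1} when π_i + π_j = +1, and π_{i+1} … π_j when π_i + π_j = -1) follows Bergeron's presentation and is not spelled out on the slide; all function names are hypothetical.

```python
def reverse_signed(perm, i, j):
    """Reverse positions i..j (0-based, inclusive) and flip every sign in the segment."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_pairs(perm):
    """Index pairs (i, j), i < j, holding consecutive integers of opposite sign.

    The framing element 0 is treated as positive.
    """
    return [(i, j)
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if (perm[i] < 0) != (perm[j] < 0)
            and abs(abs(perm[i]) - abs(perm[j])) == 1]


def oriented_reversal(perm, i, j):
    """Apply the reversal induced by the oriented pair at positions i < j."""
    if perm[i] + perm[j] == 1:
        return reverse_signed(perm, i, j - 1)
    return reverse_signed(perm, i + 1, j)


def score(perm):
    """Score of a reversal = number of oriented pairs in the permutation it produces."""
    return len(oriented_pairs(perm))


pi = [0, 3, 1, 6, 5, -2, 4, 7]
for i, j in oriented_pairs(pi):
    after = oriented_reversal(pi, i, j)
    print((pi[i], pi[j]), after, score(after))
# (3, -2) [0, -5, -6, -1, -3, -2, 4, 7] 4
# (1, -2) [0, 3, 1, 2, -5, -6, 4, 7] 2
```

Under this rule each induced reversal places the two integers next to each other in increasing order (here -3,-2 and 1,2).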

  18. Algorithm 1: As long as π has an oriented pair, choose the oriented reversal that has the maximal score. → The output will be a permutation with positive elements: 0 and n+1 are positive → if there is a negative element, there exists an oriented pair. • Claim 1: If Algorithm 1 applies k reversals to π, yielding π', then d(π) = d(π') + k.
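
A compact sketch of Algorithm 1 is given below, under the same assumptions as in Bergeron's presentation: the permutation is framed by 0 and n+1, the segment to reverse is chosen from the sign of π_i + π_j, and ties in score are broken by taking the first candidate found. The function names are hypothetical.

```python
def reverse_signed(perm, i, j):
    """Reverse positions i..j (0-based, inclusive) and flip every sign in the segment."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_pairs(perm):
    """Index pairs (i, j), i < j, holding consecutive integers of opposite sign."""
    return [(i, j)
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if (perm[i] < 0) != (perm[j] < 0)
            and abs(abs(perm[i]) - abs(perm[j])) == 1]


def algorithm1(perm):
    """Apply oriented reversals of maximal score until no oriented pair is left.

    Returns the list of permutations visited, starting with the input.
    """
    perm = list(perm)
    history = [perm]
    while True:
        candidates = []
        for i, j in oriented_pairs(perm):
            if perm[i] + perm[j] == 1:
                nxt = reverse_signed(perm, i, j - 1)
            else:
                nxt = reverse_signed(perm, i + 1, j)
            # Score = number of oriented pairs in the resulting permutation.
            candidates.append((len(oriented_pairs(nxt)), nxt))
        if not candidates:
            return history
        perm = max(candidates, key=lambda c: c[0])[1]
        history.append(perm)


for p in algorithm1([0, 3, 1, 6, 5, -2, 4, 7]):
    print(p)
```

On this example the loop stops after 5 reversals at the identity (0 1 2 3 4 5 6 7). In general the output is only guaranteed to be a positive permutation, which may still require the hurdle operations of the following slides.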

  19. Sorting positive permutations: • π: signed permutation with positive elements; circular order: 0 is the successor of n+1. • π is reduced if it does not contain consecutive elements. • Framed interval in π: i π_{j+1} π_{j+2} … π_{j+k-1} i+k s.t. i < π_{j+1}, π_{j+2}, …, π_{j+k-1} < i+k, e.g. in (0 2 5 4 3 6 1 7) • Hurdle: a framed interval that contains no shorter framed interval.

  20. Idea: create oriented pairs and then apply Algorithm 1. Operations on hurdles: • Hurdle cutting: i π_{j+1} π_{j+2} … i+1 … π_{j+k-1} i+k, e.g. (0 1 4 3 2 5) → (0 -3 -4 -1 2 5) • Hurdle merging: i … i+k … i' … i'+k', e.g. (0 2 5 4 3 6 1 7) • Simple hurdle: cutting it decreases the # of hurdles • Super hurdle: cutting it increases the # of hurdles (0 2 5 4 3 -6 1 7)

  21. Algorithm 2: • π has 2k hurdles → merge any two non-consecutive hurdles • π has 2k+1 hurdles → cut one simple hurdle (if it has none, merge any two non-consecutive hurdles) • Claim 2: Algorithm 1 + Algorithm 2 optimally sort any signed permutation.

  22. Proof of claims: • → breakpoint graph • 1. each positive element x → 2x-1, 2x and each negative element -x → 2x, 2x-1: (0 -1 3 5 4 6 -2 7) → (0 2 1 5 6 9 10 7 8 11 12 4 3 13) • (figure: arcs)
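
The mapping from a signed permutation to its unsigned image can be written directly from the rule on this slide. This is a sketch with a hypothetical function name; it assumes the input is already framed by 0 and n+1, which map to 0 and 2n+1.

```python
def unsigned_image(signed):
    """Map a framed signed permutation (0 ... n+1) to its unsigned image.

    Positive x becomes the pair 2x-1, 2x; negative -x becomes 2x, 2x-1.
    """
    n = len(signed) - 2
    image = [0]
    for x in signed[1:-1]:
        if x > 0:
            image += [2 * x - 1, 2 * x]
        else:
            image += [-2 * x, -2 * x - 1]
    image.append(2 * n + 1)
    return image


print(unsigned_image([0, -1, 3, 5, 4, 6, -2, 7]))
# [0, 2, 1, 5, 6, 9, 10, 7, 8, 11, 12, 4, 3, 13]
```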

  23. Arcs → oriented if they span an odd # of elements • Arc overlap graph: • Vertices → arcs from the breakpoint graph • Edges → arcs overlap
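
Arc orientation and overlap can be checked on the unsigned image (0 2 1 5 6 9 10 7 8 11 12 4 3 13) of the example permutation (0 -1 3 5 4 6 -2 7). The sketch assumes, as in Bergeron's construction, that arc i joins the values 2i and 2i+1 and that the span of an arc counts both of its endpoints; the slides only label the arcs in a figure. Function names are hypothetical.

```python
def arcs(image):
    """Arc i joins the positions of the values 2i and 2i+1 (assumed construction)."""
    pos = {value: p for p, value in enumerate(image)}
    result = []
    for i in range(len(image) // 2):
        a, b = sorted((pos[2 * i], pos[2 * i + 1]))
        result.append((a, b))
    return result


def is_oriented(arc):
    """An arc is oriented if it spans an odd number of elements, endpoints included."""
    a, b = arc
    return (b - a + 1) % 2 == 1


def overlap(arc1, arc2):
    """Two arcs overlap if their intervals intersect without one containing the other."""
    (a1, b1), (a2, b2) = sorted((arc1, arc2))
    return a1 < a2 < b1 < b2


image = [0, 2, 1, 5, 6, 9, 10, 7, 8, 11, 12, 4, 3, 13]
all_arcs = arcs(image)
print([i for i, arc in enumerate(all_arcs) if is_oriented(arc)])  # [0, 2]
print(overlap(all_arcs[0], all_arcs[1]))  # True
```

The two oriented arcs, 0 and 2, match the oriented pairs (0,-1) and (-2,3) of the signed permutation.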

  24. Every oriented vertex corresponds to an oriented pair. • Fact 2: The score of an oriented reversal (oriented vertex v) is T + U - O - 1, since v itself becomes unoriented. • T = # of oriented vertices • U = # of unoriented vertices adjacent to v • O = # of oriented vertices adjacent to v • Oriented component: a component that contains an oriented vertex • Safe reversal: a reversal that does not create new unoriented components.

  25. Theorem (Hannenhalli & Pevzner): Any sequence of oriented safe reversals is optimal. • Theorem: An oriented reversal of maximal score is safe. • → Claim 1 holds. • Claim 2 is proven in a similar manner.

  26. J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995. • A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner theory. 2005. • A. Caprara. Sorting by reversals is difficult. 1997. • S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999.
