
Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems

Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems. Algorithmica (2003). Jens Gramm, Rolf Niedermeier, Peter Rossmanith. Outline: Introduction; Preliminaries; Linear-time solution for constant d; Related problems; Linear-time solution for fixed k; Conclusion.


Presentation Transcript


  1. Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems • Algorithmica (2003) • Jens Gramm, Rolf Niedermeier, Peter Rossmanith

  2. Outline • Introduction • Preliminaries • Linear-Time solution for constant d • Related Problems • Linear-Time solution for fixed k • Conclusion

  3. Intro: Problem Definition • Input: Strings s_1, s_2, …, s_k over alphabet Σ, each of length L, and a nonnegative integer d. • Question: Is there a string s of length L such that d_H(s, s_i) ≤ d for all i = 1, …, k? • d_H is the Hamming distance: d_H(s_1, s_2) = |{ i : s_1[i] ≠ s_2[i] }| for |s_1| = |s_2|
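
The definition above can be checked directly on tiny inputs. The following is a minimal sketch, not the paper's algorithm: the function names `hamming`, `is_center` and `closest_string_brute_force` are made up for illustration, and the exhaustive search over all |Σ|^L candidates is only meant to make the problem statement concrete.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """d_H(a, b): number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def is_center(s: str, strings: list[str], d: int) -> bool:
    """True iff d_H(s, s_i) <= d for every input string s_i."""
    return all(hamming(s, t) <= d for t in strings)

def closest_string_brute_force(strings: list[str], d: int):
    """Try every candidate over the symbols that occur in the input.

    Exponential in L; illustrative only (hypothetical helper).
    """
    alphabet = sorted(set("".join(strings)))
    length = len(strings[0])
    for cand in product(alphabet, repeat=length):
        s = "".join(cand)
        if is_center(s, strings, d):
            return s
    return None
```

For example, `closest_string_brute_force(["aa", "bb"], 1)` finds a string at distance 1 from both inputs, while the same call with d = 0 returns `None`.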

  4. NP-completeness • CLOSEST STRING is NP-complete • d is usually small in biological applications • This paper gives an O(kL + kd·d^d) algorithm • A PTAS is known (Li et al.)

  5. Extended problems • d-MISMATCH • DISTINGUISHING STRING SELECTION • DISTINGUISHING SUBSTRING SELECTION

  6. Preliminaries • Given a set of strings S = {s_1, …, s_k}, each of length L • s is an optimal center string iff there is no s' such that max_{i=1,…,k} d_H(s', s_i) < max_{i=1,…,k} d_H(s, s_i) • s is an optimal median string iff there is no s' such that Σ_{i=1,…,k} d_H(s', s_i) < Σ_{i=1,…,k} d_H(s, s_i)

  7. Given a set of k strings of length L, think of the set as a k × L matrix • Optimal median string: a c c a
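
Unlike the center string, an optimal median string is easy to compute from the matrix view: the sum of distances decomposes column by column, so taking a most frequent symbol in each column minimizes it. A small sketch under that observation; `optimal_median` is a hypothetical name and the input strings are invented, since the slide's example matrix is not part of the transcript.

```python
from collections import Counter

def optimal_median(strings: list[str]) -> str:
    """Pick a most frequent symbol in every column of the k x L matrix."""
    columns = zip(*strings)  # iterate over the L columns
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)

def total_distance(s: str, strings: list[str]) -> int:
    """Sum of Hamming distances from s to all input strings."""
    return sum(sum(x != y for x, y in zip(s, t)) for t in strings)
```

On the invented input `["acca", "acga", "tcca"]` this picks `a`, `c`, `c`, `a` column by column.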

  8. Main idea • Search! • Fixed-parameter tractability • Reduction to problem kernel

  9. LEMMA 1. Given a set of strings S = {s_1, …, s_k}, each of length L, and a permutation σ: {1, …, L} → {1, …, L}. Then s is an optimal center string for {s_1, …, s_k} iff σ(s) is an optimal center string for {σ(s_1), σ(s_2), …, σ(s_k)}

  10. LEMMA 2. To compute an optimal center string, it is sufficient to solve a normalized and reordered instance. From this, the solution of the original instance can be derived in linear time

  11. LEMMA 3. A CLOSEST STRING instance with arbitrary alphabet Σ, |Σ| > k, is isomorphic to a CLOSEST STRING instance with alphabet Σ', |Σ'| = k. • By normalization
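
The idea behind Lemma 3 is that a column of k entries can hold at most k distinct symbols, so symbols can be renamed column by column. The sketch below uses one simple renaming scheme (numbering symbols by first occurrence within the column); this scheme is an assumption for illustration, and the paper's own normalization may order the symbols differently. The name `normalize` is hypothetical.

```python
def normalize(strings: list[str]):
    """Rename symbols per column to 0, 1, 2, ... by first occurrence.

    Returns (rows, maps): rows[i] is the renamed i-th string as a tuple of
    ints, and maps[p] translates column p's new symbols back to the originals.
    """
    k = len(strings)
    cols, maps = [], []
    for col in zip(*strings):
        rename = {}
        for ch in col:
            rename.setdefault(ch, len(rename))  # next unused index
        cols.append([rename[ch] for ch in col])
        maps.append({v: ch for ch, v in rename.items()})
    rows = [tuple(col[i] for col in cols) for i in range(k)]
    return rows, maps
```

Because renaming is a bijection inside each column, Hamming distances between rows are unchanged, and a solution of the renamed instance can be mapped back through `maps`.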

  12. LEMMA 4. Given a CLOSEST STRING instance s_1, …, s_k of length L and d. If the resulting k × L matrix has more than kd dirty columns, then there is no string s with max_{i=1,…,k} d_H(s, s_i) ≤ d • A column is dirty iff it contains at least two different symbols from alphabet Σ • By the pigeonhole principle
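
Lemma 4 gives the reduction to a problem kernel: clean columns can be set aside, and an instance with more than kd dirty columns can be rejected immediately. A minimal sketch of that step, with hypothetical function names `dirty_columns` and `reduce_to_kernel`:

```python
def dirty_columns(strings: list[str]) -> list[int]:
    """Positions whose column contains at least two different symbols."""
    return [p for p, col in enumerate(zip(*strings)) if len(set(col)) > 1]

def reduce_to_kernel(strings: list[str], d: int):
    """Apply Lemma 4 in O(kL) time.

    Returns None when there are more than k*d dirty columns (no solution),
    otherwise the instance restricted to its dirty columns plus their
    original positions.
    """
    k = len(strings)
    dirty = dirty_columns(strings)
    if len(dirty) > k * d:
        return None
    kernel = ["".join(s[p] for p in dirty) for s in strings]
    return kernel, dirty
```

In a clean column every string agrees, so the solution simply copies that symbol; only the at most kd dirty columns need to be searched.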

  13. A Linear-Time solution for constant d • Bounded search tree algorithm • LEMMA 5. Given a set of strings S = {s_1, …, s_k} and a positive integer d. If there are i, j ∈ {1, …, k} with d_H(s_i, s_j) > 2d, then there is no string s with max_{i=1,…,k} d_H(s, s_i) ≤ d

  14. Theorem 1. Given a set of strings S = {s_1, …, s_k} and d, Algorithm D determines in O(kL + kd·d^d) time whether there is a string s with max_{i=1,…,k} d_H(s, s_i) ≤ d. • By Lemma 4, the input instance is reduced to size O(kd) in O(kL) time • The search tree has depth d; steps D0 to D3 take O(kd) time per node, by building a table containing the distances of the candidate s_1 to all other given strings

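  15. Correctness • Show only the correctness of the first step • Suppose s_1 is not a solution but there exists a center string s • Pick s_i with d_H(s_1, s_i) > d and let P be a set of d+1 positions p with s_1[p] ≠ s_i[p] • P_{s_1≠s=s_i} := { p | s_1[p] ≠ s[p] = s_i[p] }: moving s_1 toward s at such a position is the goal • P ⊆ P_{s_1≠s=s_i} ∪ P_{s≠s_i} (disjoint union), with |P_{s≠s_i}| ≤ d, so at least one position of P lies in P_{s_1≠s=s_i} • So d+1 subcases are sufficient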
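
The bounded search tree described on the last three slides can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's Algorithm D verbatim: the names `closest_string` and `search` are hypothetical, the step labels in the comments are only approximate, and distances are recomputed at every node instead of being kept in the distance table that yields the O(kd) bound per node.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def closest_string(strings: list[str], d: int):
    """Return a string within distance d of all inputs, or None."""

    def search(cand: str, budget: int):
        # Roughly D0: no modifications left to spend.
        if budget < 0:
            return None
        dists = [hamming(cand, t) for t in strings]
        # Roughly D1: some string is too far to be reached with the budget.
        if any(x > d + budget for x in dists):
            return None
        # Roughly D2: the candidate already is a center string.
        if all(x <= d for x in dists):
            return cand
        # Roughly D3: pick a violated string and branch on d+1 of the
        # positions where the candidate differs from it.
        i = next(idx for idx, x in enumerate(dists) if x > d)
        diff = [p for p in range(len(cand)) if cand[p] != strings[i][p]]
        for p in diff[: d + 1]:
            moved = cand[:p] + strings[i][p] + cand[p + 1:]
            found = search(moved, budget - 1)
            if found is not None:
                return found
        return None

    # Start from s_1 with d allowed modifications.
    return search(strings[0], d)
```

Each node has at most d+1 children and the recursion depth is at most d, which gives the (d+1)^d bound on the tree size used in Theorem 1.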

  16. Related Problems • d-MISMATCH problem • s_{i,p,L} denotes the length-L substring of a given string s_i starting at position p (the given strings have length n) • Question: is there a string s of length L and a position p with 1 ≤ p ≤ n−L+1 such that d_H(s, s_{i,p,L}) ≤ d for all i? • Stojanovic et al. give a linear-time algorithm for 1-MISMATCH • Theorem 2. d-MISMATCH is solvable in O(kL + (n−L)·kd·d^d) time, which is O(nk) for fixed d • Naively: O(n·(kL + kd·d^d)) • Maintain the queue of dirty columns • Considering only the first L columns, we can build a FIFO queue in O(kL) • Update at each position in O(k) time
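
The queue of dirty columns can be sketched as a sliding window over the k × n matrix. The code below only illustrates the bookkeeping, under the assumption that a window is worth searching only if it passes the Lemma 4 bound of at most kd dirty columns; it does not run the search itself. The function name `windows_passing_dirty_bound` is hypothetical and positions are 0-based.

```python
from collections import deque

def windows_passing_dirty_bound(strings: list[str], L: int, d: int) -> list[int]:
    """Start positions p (0-based) whose length-L window has <= k*d dirty columns."""
    k = len(strings)
    n = len(strings[0])
    queue = deque()   # dirty column positions inside the current window (FIFO)
    passing = []
    for p in range(n):
        # O(k) work per new column: is column p dirty?
        if len({s[p] for s in strings}) > 1:
            queue.append(p)
        start = p - L + 1
        if start < 0:
            continue  # the first window is not complete yet
        # Drop dirty columns that slid out of the window on the left.
        while queue and queue[0] < start:
            queue.popleft()
        if len(queue) <= k * d:
            passing.append(start)
    return passing
```

Building the first window costs O(kL) and every later shift costs O(k), matching the update costs listed on the slide.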

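  17. DSS problem • DISTINGUISHING STRING SELECTION • Given S = {s_1, …, s_k1} and S' = {s'_1, …, s'_k2}, all of the same length L, and d_1, d_2 ≥ 0, is there a string s such that max_{i=1,…,k1} d_H(s, s_i) ≤ d_1 and min_{j=1,…,k2} d_H(s, s'_j) ≥ L − d_2? • LEMMA 6. Given two sets of strings S = {s_1, …, s_k1} and S' = {s'_1, …, s'_k2} and positive d_1, d_2. If there are i ∈ {1, …, k1} and j ∈ {1, …, k2} with d_H(s_i, s'_j) < L − (d_1 + d_2), then there is no string s satisfying both max_{i=1,…,k1} d_H(s, s_i) ≤ d_1 and min_{j=1,…,k2} d_H(s, s'_j) ≥ L − d_2 • Proof: by the triangle inequality, d_H(s, s'_j) ≤ d_H(s, s_i) + d_H(s_i, s'_j) < d_1 + L − (d_1 + d_2) = L − d_2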
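
Lemma 6 translates into a quick pre-check that can reject a DSS instance before any search. A minimal sketch, with the hypothetical names `dss_passes_lemma6`, `close_set` (for S) and `far_set` (for S'):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def dss_passes_lemma6(close_set: list[str], far_set: list[str],
                      d1: int, d2: int) -> bool:
    """False iff some pair (s_i, s'_j) has d_H(s_i, s'_j) < L - (d1 + d2).

    A False result means no solution exists; True only means the instance
    was not rejected by this necessary condition.
    """
    length = len(close_set[0])
    return all(hamming(s, t) >= length - (d1 + d2)
               for s in close_set for t in far_set)
```

The check compares all k1·k2 pairs, so it costs O(k1·k2·L) time as written.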

  18. A Linear-Time Solution for Fixed k • Is CLOSEST STRING fixed-parameter tractable with respect to k? • Use integer linear programming (ILP) • Lenstra: an ILP with a fixed number of variables can be solved in linear time (exponential space)

  19. CLOSEST STRING as an ILP • Column types for k strings • For k = 3: (a,a,a)^T, (a,a,b)^T, (a,b,a)^T, (b,a,a)^T, (a,b,c)^T • |column types| = B(k) ≤ k!, where B(k) is the k-th Bell number • Variables x_{t,φ}, with t a column type and φ ∈ Σ: the number of columns of type t whose corresponding character in the desired solution string is set to φ • B(k)·k variables needed • Minimize max_{i=1,…,k} Σ_t Σ_{φ ∈ Σ∖{φ_{t,i}}} x_{t,φ} • φ_{t,i} denotes the alphabet symbol at the i-th entry of column type t
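
The column types are what keep the number of ILP variables independent of L. The sketch below groups the columns of an instance by type and computes B(k) with the Bell triangle; it does not build or solve the ILP. The names `column_type`, `count_column_types` and `bell` are hypothetical, and types are written with integers 0, 1, 2, … in place of a, b, c, ….

```python
from collections import Counter

def column_type(col) -> tuple:
    """Canonical form of a column: symbols renamed by first occurrence."""
    rename = {}
    return tuple(rename.setdefault(ch, len(rename)) for ch in col)

def count_column_types(strings: list[str]) -> Counter:
    """How many columns of the k x L matrix fall into each column type."""
    return Counter(column_type(col) for col in zip(*strings))

def bell(k: int) -> int:
    """k-th Bell number via the Bell triangle (number of possible types)."""
    row = [1]
    for _ in range(k - 1):
        nxt = [row[-1]]          # each row starts with the previous row's last entry
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]
```

For k = 3, `bell(3)` gives 5, matching the five column types listed on the slide; the per-type counts are the quantities that bound the sums of the x_{t,φ} in the ILP.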

  20. Conclusion • Fixed-parameter tractability of CLOSEST STRING with respect to d and with respect to k • Improves on previous work for d-MISMATCH • DSS • CLOSEST SUBSTRING?
