Topics in Algorithms

Presentation Transcript

1. Topics in Algorithms: Introduction to Computational Complexity Theory

2. Quiz • A set S of strings is given as below. • Find the shortest string s (called a superstring of S) that contains every element of S as a substring. • This quiz mimics DNA sequencing. • Example) S = {ate, half, lethal, alpha, alfalfa}: s’ = atehalflethalphalfalfa (length 22) is a superstring, and s = lethalphalfalfate (length 17) is a shorter one. • [Quiz] S = {TCTCTA, CAGTCT, CTCCAAA, GGCAA, TAAGCTCC, TTCTCTC, TCCAAATTCTA, CTTTCT, AACACCTT, CTCCGACC, TTCTATC, TCTATCTC, CTCTGTAACA, CAACAG} • This example is from [Blum 94].
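The superstring condition in the quiz can be checked mechanically. The following is a minimal sketch; the helper name `is_superstring` is an assumption introduced here for illustration and does not come from the slides.

```python
def is_superstring(candidate, strings):
    """Return True if every string in `strings` occurs in `candidate` as a substring."""
    return all(s in candidate for s in strings)


S = ["ate", "half", "lethal", "alpha", "alfalfa"]
s_prime = "atehalflethalphalfalfa"   # the longer superstring from the slide
s = "lethalphalfalfate"              # the shorter superstring from the slide

print(is_superstring(s_prime, S), len(s_prime))  # True 22
print(is_superstring(s, S), len(s))              # True 17
```

Checking a given candidate is cheap; the hard part of the quiz is finding a short candidate in the first place.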

3. Issues in computational complexity theory • Showing upper/lower bounds on the computational resources required for solving a problem L. • Upper/lower bounds are described as functions of the length of an input. • Such bounds are given for • time, • (memory) space, • … • Structural complexity • relations among classes of problems • Example) P ⊆ NP ⊆ EXP, E ⊆ EXP, P ≠ E

4. This talk’s main issues (1/2) • How to deal with hard (time-consuming) problems • What to do when we find a problem that looks hard. • Sometimes, we cannot find any efficient (polynomial-time) algorithm to solve the problem. • (1) If the problem is not hard, someone else may find an efficient algorithm. • (2) If the problem is really hard, other smart people cannot find one either.

5. This talk’s main issues (2/2) • The previous quiz looks intractable. • The number of possible solutions (orderings of the 14 strings) is 14! = 14 × 13 × ⋯ × 1 = 87,178,291,200. • However, it is not easy to say that the problem is hard. • It is hard to find a needle in a haystack. • needle = efficient algorithm • It seems harder to say that there is no needle in a haystack. • You just might miss a needle in the haystack. • No needle? Computational complexity theory provides an answer.
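To make the 14! count concrete, here is a hedged sketch of the exhaustive approach: try every ordering of the strings and merge neighbours with the largest possible overlap. The function names are assumptions made for illustration. The sketch also assumes that no input string is a substring of another, in which case trying all orderings this way yields a shortest superstring.

```python
from itertools import permutations
from math import factorial


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(order):
    """Concatenate the strings in the given order, reusing overlapping characters."""
    result = order[0]
    for nxt in order[1:]:
        result += nxt[overlap(result, nxt):]
    return result


def shortest_superstring_brute_force(strings):
    """Try all n! orderings; only feasible for very small n."""
    return min((merge_in_order(p) for p in permutations(strings)), key=len)


print(factorial(14))  # 87178291200 orderings for the 14-string quiz
best = shortest_superstring_brute_force(["ate", "half", "lethal", "alpha", "alfalfa"])
print(best, len(best))
```

For five strings this enumerates only 120 orderings; for the 14-string quiz the same loop would need tens of billions of merges, which is the point of the slide.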

6. Key idea • We have two problems A and B. • Given input x, we would like to know if x ∈ A (x ∈ B). • Suppose A is efficiently transformed with f into B • such that a ∈ A iff f(a) ∈ B. • a: input of A, f: transformation (reduction), f(a): input of B. • This shows that B is harder than (or as hard as) A. • A is solvable if there is a way to solve B. [Figure: an algorithm for B answers ‘yes’ for x1 ∈ B and ‘no’ for x2 ∉ B; through f it also answers ‘yes’ for x3 ∈ A, since f(x3) ∈ B, and ‘no’ for x4 ∉ A, since f(x4) ∉ B.]
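The key idea can be sketched with a toy pair of problems. Everything below is a hypothetical illustration, not taken from the slides: A is "strings of even length", B is "even integers", and the transformation f maps a string to its length, so that a ∈ A iff f(a) ∈ B.

```python
def decide_B(y):
    """Hypothetical algorithm for problem B: is the integer y even?"""
    return y % 2 == 0


def f(a):
    """Hypothetical reduction from A to B: map a string to its length."""
    return len(a)


def decide_A(a):
    """Decide A (does the string a have even length?) using only f and decide_B."""
    return decide_B(f(a))


print(decide_A("abcd"))  # True
print(decide_A("abc"))   # False
```

Any algorithm for B becomes an algorithm for A at the extra cost of computing f, which is why B is at least as hard as A.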

7. Overview • Intuitive explanation of hard (time-consuming) problems • Decision problems/Optimization problems • Polynomial time • Class P, Class NP • Reductions • NP-complete and NP-hard • Examples • Superstring problem • Reduction from Traveling salesman problem

8. Types of problems (1/2) • Computational problems roughly fall into two categories: • Decision problem (output: yes/no), • Optimization problem (output: solution with max./min. cost). • Decision problem L • input: • string x • output: • ‘yes’ if x ∈ L, • ‘no’ otherwise. • Example) L: positive odd numbers. • L = {1, 3, 5, …} • x = 3 → ‘yes’ since x ∈ L, • x = 4 → ‘no’ since x ∉ L.

9. Types of problems (2/2) • Computational problems roughly fall into two categories: • Decision problem (output: yes/no), • Optimization problem (output: solution with max./min. cost). • Optimization problem M • input: • string x • cost function f • output: • y such that f(y) is the maximum (or the minimum) • Example) maximize f(x, y) = 2x²y − xy² + 3. • x = 1, y = 1: f(1, 1) = 4.

10. Examples of problems (1/6) • Euler cycle problem (ECP) • Decision problem • Input (instance): • An undirected graph G = (V, E). • Output: • ‘yes’ if there is a cycle which uses each edge in G exactly once, • ‘no’ otherwise. [Figure: one ‘yes’ instance and one ‘no’ instance.]
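ECP can be decided efficiently. The sketch below relies on Euler's classical characterization, which the slide does not state: a connected undirected graph has an Euler cycle iff every vertex has even degree. The function name and the input encoding (a vertex list and a list of edge pairs) are assumptions for illustration.

```python
from collections import defaultdict


def has_euler_cycle(vertices, edges):
    """Decide ECP for an undirected graph given as a vertex list and edge pairs."""
    adj = defaultdict(list)
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        degree[u] += 1
        degree[v] += 1

    # Condition 1: every vertex has even degree.
    if any(d % 2 for d in degree.values()):
        return False

    # Condition 2: all vertices that touch an edge lie in one connected component.
    touched = [v for v in vertices if degree[v] > 0]
    if not touched:
        return True
    seen = {touched[0]}
    stack = [touched[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return all(v in seen for v in touched)


print(has_euler_cycle("abc", [("a", "b"), ("b", "c"), ("c", "a")]))  # True
print(has_euler_cycle("abc", [("a", "b"), ("b", "c")]))              # False
```

Both checks take time linear in the size of the graph, in contrast to the exhaustive searches needed for the problems that follow.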

11. Examples of problems (2/6) • Shortest superstring problem (SSP) • Decision problem • Input (instance): • A set of sequences S = {s1, …, sn} and an integer (threshold) l. • Output: • ‘yes’ if there is a string s such that, for all i, si is a substring of s and the length of s is at most l. • ‘no’ otherwise. • Example) s1 = TACGA, s2 = ACCC, s3 = CTAAAG, s4 = GAGC. • l = 18: ‘yes’, since TACGACCCTAAAGAGC contains every sequence and its length (16) is less than 18. • l = 10: ‘no’.

12. Examples of problems (3/6) • Shortest superstring problem (Min-SSP) • Optimization problem • Input (instance): • A set of sequences S = {s1, …, sn}. • Output: • The shortest string s such that, for all i, si is a substring of s. • Example) s1 = TACGA, s2 = ACCC, s3 = CTAAAG, s4 = GAGC: output TACGACCCTAAAGAGC.

13. Examples of problems (4/6) • Traveling salesman problem (TSP) • Decision problem • Input (instance): • n cities (nodes) with the cost of travel between each pair of them, and an integer (threshold) t. • Output: • ‘yes’ if there is a tour visiting all the cities and returning to the starting point with cost at most t, • ‘no’ otherwise. [Figure: a four-city instance on a, b, c, d. With maximum cost 14 the answer is ‘yes’, since the cost of the tour shown is less than 14; with maximum cost 10 the answer is ‘no’.]
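A brute-force decision procedure for TSP can be sketched as follows. The edge costs below are illustrative assumptions (the exact assignment of costs to edges in the slide's figure is not recoverable from the transcript); they are chosen so that the possible tours cost 12, 14, and 16. The function names are also assumptions.

```python
from itertools import permutations


def tour_cost(tour, cost):
    """Cost of visiting the cities in `tour` in order and returning to the first one."""
    n = len(tour)
    return sum(cost[frozenset((tour[i], tour[(i + 1) % n]))] for i in range(n))


def tsp_decision(cities, cost, threshold):
    """Answer the TSP decision problem by trying all (n-1)! tours from cities[0]."""
    start, rest = cities[0], cities[1:]
    return any(
        tour_cost((start,) + p, cost) <= threshold for p in permutations(rest)
    )


# Hypothetical symmetric costs for four cities (an assumption, not the slide's figure).
cost = {
    frozenset("ab"): 4, frozenset("bd"): 2, frozenset("cd"): 3,
    frozenset("ac"): 3, frozenset("ad"): 5, frozenset("bc"): 4,
}
cities = ["a", "b", "c", "d"]

print(tsp_decision(cities, cost, 14))  # True
print(tsp_decision(cities, cost, 10))  # False
```

The loop examines (n-1)! tours, so it is only usable for a handful of cities.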

14. Examples of problems (5/6) • Traveling salesman problem (Min-TSP) • Optimization problem • Input (instance): • n cities (nodes) with the cost of travel between each pair of them. • Output: • A tour visiting all the cities and returning to the starting point with the smallest cost. [Figure: a four-city instance on a, b, c, d and its cheapest tour.]

15. Examples of problems (6/6) • Satisfiability problem (SAT) • Decision problem • Input (instance): • A Boolean function f over variables x1, …, xn. • Each variable takes either true (1) or false (0). • Output: • ‘yes’ if there is a truth assignment of x1, …, xn that satisfies f, • ‘no’ otherwise. [Figure: a formula f over x1, …, x4 made of five clauses; the answer is ‘yes’ since f = T (1) under x1 = F (0), x2 = T (1), x3 = F (0), x4 = F (0).]
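SAT can be decided by trying all 2^n truth assignments. The sketch below assumes the formula is given in conjunctive normal form as a list of clauses, where the integer k stands for xk and -k for its negation. The example formula is hypothetical (the slide's own formula is not recoverable from the transcript); it is merely chosen so that the slide's assignment satisfies it.

```python
from itertools import product


def evaluate(cnf, assignment):
    """True if every clause contains at least one literal made true by `assignment`."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf
    )


def brute_force_sat(cnf, n):
    """Return a satisfying assignment {index: bool} or None, trying all 2^n cases."""
    for values in product([False, True], repeat=n):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if evaluate(cnf, assignment):
            return assignment
    return None


# Hypothetical formula over x1..x4 (an assumption made for illustration).
cnf = [[-1], [1, 2, -3], [-1, -2, 3, -4], [2, 3, 4], [-1, -3]]
slide_assignment = {1: False, 2: True, 3: False, 4: False}

print(evaluate(cnf, slide_assignment))   # True
print(brute_force_sat([[1], [-1]], 1))   # None
```

Evaluating one assignment is fast; the exponential cost comes from the number of assignments.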

16. Polynomial time • To simplify the notion of ‘hardness’, we use polynomial time as the cut-off for efficiency. • Polynomial p(n) • A function of the following form for some k ≥ 1 and constants a_k, …, a_0: • p(n) = a_k n^k + a_{k−1} n^{k−1} + ⋯ + a_0. • Key property of polynomials • Let p(n) and q(n) be polynomials. • The sum p(n) + q(n) is also a polynomial. • The composite function q(p(n)) is also a polynomial in n.
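The closure properties above can be checked on small examples. In this sketch a polynomial is represented by its coefficient list [a_0, a_1, …, a_k]; that representation and the function names are assumptions made for illustration.

```python
def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def poly_add(p, q):
    """Coefficients of p(n) + q(n)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def poly_mul(p, q):
    """Coefficients of p(n) * q(n)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_compose(q, p):
    """Coefficients of q(p(n)), computed with Horner's rule."""
    result = [0]
    for c in reversed(q):
        result = poly_add(poly_mul(result, p), [c])
    return result


p = [0, 0, 1]      # p(n) = n^2
q = [1, 0, 0, 1]   # q(n) = n^3 + 1
print(poly_add(p, q))      # [1, 0, 1, 1]           i.e. n^3 + n^2 + 1
print(poly_compose(q, p))  # [1, 0, 0, 0, 0, 0, 1]  i.e. n^6 + 1
```

The composite has degree 2 · 3 = 6, still a fixed power of n; this is what lets a polynomial-time transformation followed by a polynomial-time algorithm stay polynomial.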

17. Turing machine • An abstract model of computers. • At each step, • based on • its current state and • the symbol under the head, • the Turing machine changes • its internal state, • the symbol under the head, and • the position of the head. [Figure: one step of a Turing machine. The tape B B 1 0 0 1 1 B B in state s1 becomes B B 1 1 0 1 1 B B in state s2.]
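One step of a deterministic Turing machine can be sketched as a table lookup. The tape encoding (a dictionary from positions to symbols), the transition table `delta`, and the starting head position are assumptions made for illustration; only the tape contents and the state names s1 and s2 come from the slide.

```python
def step(tape, head, state, delta, blank="B"):
    """Apply one transition: rewrite the scanned cell, move the head, change state."""
    symbol = tape.get(head, blank)
    new_state, new_symbol, move = delta[(state, symbol)]
    new_tape = dict(tape)
    new_tape[head] = new_symbol
    new_head = head + (1 if move == "R" else -1)
    return new_tape, new_head, new_state


# Tape contents 1 0 0 1 1 with blanks (B) elsewhere; the head is assumed to be on cell 1.
tape = dict(enumerate("10011"))
# Hypothetical rule: in state s1 reading 0, write 1, move right, enter state s2.
delta = {("s1", "0"): ("s2", "1", "R")}

tape2, head2, state2 = step(tape, 1, "s1", delta)
print("".join(tape2[i] for i in sorted(tape2)), head2, state2)  # 11011 2 s2
```

A full run repeats `step` until no rule applies; the running time of the machine is the number of such steps.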

18. Hierarchy in the Computational Theory [Figure: problems ordered by difficulty, with n the input size. Undecidable: halting problem of Turing machines. Decidable and intractable (exponential time): EXP, e.g. 2^n. NP: traveling salesman, graph isomorphism. Tractable (polynomial time): P, e.g. sorting in n log n and median in n.] Based on a figure in http://www-imai.is.s.u-tokyo.ac.jp/~imai/lecture/quantum_complexity.pdf

19. Well-known classes of decision problems • P: the set of decision problems solvable by a deterministic Turing machine in polynomial time. • ECP ∈ P. • NP: the set of decision problems solvable by a non-deterministic Turing machine in polynomial time. • ECP, TSP, SSP, SAT ∈ NP. [Figure: P drawn inside NP.]

20. Example of class NP • TSP ∈ NP since • TSP is solvable in polynomial time by a non-deterministic Turing machine. • At each branch, one node is chosen non-deterministically. • We suppose that it is possible to select the best choice at each branch with the non-deterministic Turing machine. [Figure: computation tree over time for a four-city instance with threshold 14. Starting from a, each level chooses the next city among b, c, d, and every branch returns to a; the leaves show tour costs 16, 16, 12, 14, 12, 16.]

21. Alternate definition of class NP • TSP ∈ NP since • TSP is a decision problem defined with a verifier A(x, y) over strings such that • the string y has length smaller than |x|^c, where c is a constant, • A(x, y) is computable by a deterministic Turing machine in time polynomial in |x| + |y|. • Since |y| < |x|^c, A(x, y) is also computable by a deterministic Turing machine in time polynomial in |x|. • Such y is usually called a certificate for x. [Figure: the verifier A(x, y), running in polynomial time, receives the four-city instance x with threshold 14 and the certificate y = a b d c a, and outputs ‘yes’.]
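A verifier A(x, y) for TSP can be sketched as follows, with x given as the cities, costs, and threshold, and y as a proposed tour. The edge costs are illustrative assumptions (the figure's exact costs per edge are not recoverable from the transcript), and the function name is hypothetical.

```python
def verify_tsp(cities, cost, threshold, certificate):
    """Check that `certificate` is a closed tour over all cities within the threshold."""
    # The tour must list every city once and then return to its starting city.
    if len(certificate) != len(cities) + 1 or certificate[0] != certificate[-1]:
        return False
    if sorted(certificate[:-1]) != sorted(cities):
        return False
    total = sum(cost[frozenset(pair)] for pair in zip(certificate, certificate[1:]))
    return total <= threshold


# Hypothetical symmetric costs for four cities (an assumption, not the slide's figure).
cost = {
    frozenset("ab"): 4, frozenset("bd"): 2, frozenset("cd"): 3,
    frozenset("ac"): 3, frozenset("ad"): 5, frozenset("bc"): 4,
}
cities = ["a", "b", "c", "d"]

print(verify_tsp(cities, cost, 14, ["a", "b", "d", "c", "a"]))  # True
print(verify_tsp(cities, cost, 14, ["a", "b", "c", "d", "a"]))  # False
```

The check reads the certificate once, so it runs in time polynomial in the input size, even though finding a good certificate may require searching many tours.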

22. Features of problems in NP (1/2) • The number of possible solutions grows exponentially with the size of inputs. • Example) SSP • Threshold: 12, S = {half, alpha, alfalfa} • The 3! = 6 orderings of S give the candidates halfalphalfalfa, alphalfalfa, alfalfahalfalpha, halfalfalpha, alphalfalfahalf, alfalfalphalf.

23. Features of problems in NP (2/2) • We can verify any instance in polynomial time when we are given its certificate (a superstring). • Example) SSP • Threshold: 12, S = {half, alpha, alfalfa} • Certificate: alphalfalfa (length 11), which contains half, alpha, and alfalfa.
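Verification for SSP amounts to a length check and a substring check per input string. The following minimal sketch applies it to the six candidates of the {half, alpha, alfalfa} example; the function name `verify_ssp` is an assumption.

```python
def verify_ssp(strings, threshold, certificate):
    """Accept iff the certificate is short enough and contains every input string."""
    return len(certificate) <= threshold and all(s in certificate for s in strings)


S = ["half", "alpha", "alfalfa"]
candidates = [
    "halfalphalfalfa", "alphalfalfa", "alfalfahalfalpha",
    "halfalfalpha", "alphalfalfahalf", "alfalfalphalf",
]

for c in candidates:
    print(c, len(c), verify_ssp(S, 12, c))
```

Each call does a bounded number of substring searches, so the verifier is polynomial in the input size even though the number of candidate orderings grows as n!.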

24. Harder problems (1/3) • Suppose that • problems L1 and L2 are in NP. • C(x) denotes a certificate for x. [Figure: verifier A1 answers ‘yes’ on (x1, C(x1)) with x1 ∈ L1 and ‘no’ on (x2, y) with x2 ∉ L1; verifier A2 answers ‘yes’ on (x3, C(x3)) with x3 ∈ L2 and ‘no’ on (x4, y) with x4 ∉ L2.]

25. Harder problems (2/3) • Suppose that • problems L1 and L2 are in NP, • C(x) denotes a certificate for x, • we construct a transformation f, called a reduction, that runs in polynomial time. [Figure: verifier A1 answers ‘yes’ on (x1, C(x1)) with x1 ∈ L1 and ‘no’ on (x2, y) with x2 ∉ L1. The reduction maps x3 ∈ L2 to f(x3) ∈ L1 and x4 ∉ L2 to f(x4) ∉ L1, so A1 answers ‘yes’ on (f(x3), C(f(x3))) and ‘no’ on (f(x4), y).]

26. Harder problems (3/3) • Under these assumptions, verifier A1 for L1 is able to say ‘yes’ or ‘no’ correctly for any instance of L2. • We say L2 is (polynomial-time) reducible to L1. • We denote this by L2 ≤p L1. • L1 then has to be harder than or as hard as L2 if we can construct this reduction. • When a polynomial-time algorithm for L1 is available, the algorithm also provides a solution in polynomial time for any instance of L2. [Figure: verifier A1 takes the place of verifier A2: it answers ‘yes’ on (f(x3), C(f(x3))) for x3 ∈ L2 and ‘no’ on (f(x4), y) for x4 ∉ L2.]

27. Cook-Levin Theorem • [Theorem] Any decision problem Q in NP is reducible to SAT. • SAT is one of the hardest problems in NP. • Such a problem is called an NP-complete problem. [Figure: instances of Q1 and Q2 are transformed by f and f’ into SAT instances; a single verifier A for SAT answers ‘yes’ for f(x1), f’(x3) ∈ SAT, where x1 ∈ Q1 and x3 ∈ Q2, and ‘no’ for f(x2), f’(x4) ∉ SAT, where x2 ∉ Q1 and x4 ∉ Q2.]

28. Good property on reductions • A reduction can consist of multiple transformations: a chain of polynomial-time transformations is again a polynomial-time transformation. [Figure: the instances x3 ∈ L2 and x4 ∉ L2 of verifier A2 are passed through successive transformations to verifier A1 and then to verifier A3, which still answer ‘yes’ and ‘no’ correctly.]

29. NP-complete • A problem L in NP is NP-complete • if Q is reducible to L for any problem Q in NP, • if SAT is reducible to L, • since Q ≤p SAT ≤p L for any Q in NP, • or if an NP-complete problem L’ is reducible to L, • since Q ≤p L’ ≤p L for any Q in NP. • SAT is reducible to other problems in NP: • 3-SAT, • Clique, • 3-Color, • Hamilton path problem, • Traveling salesman problem, … • These problems are also the most intractable problems in NP. [Figure: chain of reductions starting from SAT and 3-SAT and reaching Clique, Independent Set, Vertex Cover, 3-Color, HamPath, and TSP.]

30. How to show that a problem L is NP-complete • It consists of two steps: • Show that the decision problem L is in NP. • Show that there is a polynomial-time reduction from an NP-complete problem Q to L. • L is (as hard as or) harder than Q. • From the definition of NP-complete, any problem Q’ in NP is reducible to Q, and hence there is a reduction from Q’ to L. • For an optimization problem Max(Min)-L, we can say Max(Min)-L is NP-hard • if there is a reduction from an NP-complete problem Q to L.

31. Example of reductions (1/9) • We will see that TSP is reducible to SSP. • SSP is as hard as or harder than TSP. • SSP is NP-complete since TSP is NP-complete and TSP ≤p SSP. • Let x be an instance of TSP, where threshold = n. • Let f(x) be a transformed instance of SSP, where threshold = 3n + 2m + 1. [Figure: x (TSP) has n vertices and m edges with cost 1, threshold n, and optimal cost n + k; f(x) (SSP) has n + m strings (a#A, b#B, c#C, d#D, …), threshold 3n + 2m + 1, and optimal cost 3n + 2m + k + 1.]

32. Example of reductions (2/9) • Reduction from TSP to SSP • Input x of TSP • Graph with costs between two nodes (with arc: cost 1, without arc: cost 2). • Nodes: a, b, c, d, e. • Arcs with cost 1: ab, ac, ae, ba, bc, cd, ce, db, de, eb, ec. • Input f(x) of SSP • Created from the input x of TSP. • Strings for nodes: a#A, b#B, c#C, d#D, e#E. • Strings for arcs: AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.
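The string construction on this slide can be sketched in code. The rule used below is inferred from the listed strings and is therefore an assumption: each node v yields v#V, and the arcs leaving v, taken cyclically, yield one string VwVw' per arc. The function name is hypothetical.

```python
def build_ssp_strings(nodes, arcs):
    """Build the n + m strings of f(x) from the nodes and cost-1 arcs of x."""
    strings = [f"{v}#{v.upper()}" for v in nodes]
    for v in nodes:
        upper = v.upper()
        targets = [w for (u, w) in arcs if u == v]
        # Pair each outgoing arc with the next one, wrapping around at the end.
        for i, w in enumerate(targets):
            nxt = targets[(i + 1) % len(targets)]
            strings.append(f"{upper}{w}{upper}{nxt}")
    return strings


nodes = ["a", "b", "c", "d", "e"]
arcs = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "a"), ("b", "c"), ("c", "d"),
        ("c", "e"), ("d", "b"), ("d", "e"), ("e", "b"), ("e", "c")]

strings = build_ssp_strings(nodes, arcs)
n, m = len(nodes), len(arcs)
print(len(strings), 3 * n + 2 * m + 1)  # 16 38

# The superstring shown for the tour aecdba contains every generated string.
superstring = "a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa"
print(all(s in superstring for s in strings), len(superstring))  # True 38
```

The construction touches each node and arc once, so f runs in polynomial time, as a reduction must.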

33. Example of reductions (3/9) • x ∈ TSP ⇒ f(x) ∈ SSP • TSP • the optimal cost is 5 with the tour (aecdba). • n = 5, m = 11, k = 0. • SSP • the shortest superstring is 38 long: a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa. • 3n + 2m + k + 1 = 3·5 + 2·11 + 0 + 1 = 38. [Figure: the graph on a, b, c, d, e and the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]

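34. Example of reductions (4/9) • x ∈ TSP ⇒ f(x) ∈ SSP • Distance graph • A weight on an arc is the number of characters of a prefix before a match. • Example) from BcBa to BaBc the weight is 2 (merged string BcBaBc); from CeCd to d#D the weight is 3 (merged string CeCd#D). • thin line = cost 2, thick line = cost 3, no line = more than 3. [Figure: distance graph on the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]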

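35. Example of reductions (5/9) • x ∈ TSP ⇒ f(x) ∈ SSP • Distance graph with cost-2 arcs • Example) b#B and BaBc merge into b#BaBc; b#B and BcBa merge into b#BcBa. [Figure: the cost-2 arcs among the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]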

36. Example of reductions (6/9) • x ∈ TSP ⇒ f(x) ∈ SSP • Distance graph with cost-2 arcs • The sum of costs of arcs: 2m. • Example) b#B, BaBc, and BcBa merge into b#BaBcBa. [Figure: the cost-2 arcs among the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]

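37. Example of reductions (7/9) • x ∈ TSP ⇒ f(x) ∈ SSP • Distance graph with cost-2 arcs • 3n + 2m + k + 1 = 3·5 + 2·11 + 0 + 1 = 38. • Tour aecdba [Figure: the path in the distance graph that corresponds to the tour aecdba, over the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]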

38. Example of reductions (8/9) • x ∉ TSP ⇒ f(x) ∉ SSP • TSP • the optimal cost is 6 with the tour (aecdba). • n = 5, m = 10, k = 1. • SSP • the shortest superstring is 37 long, where the threshold is 36. • 3n + 2m + k + 1 = 3·5 + 2·10 + 1 + 1 = 37. • Nodes: a, b, c, d, e. • Arcs with cost 1: ab, ac, ba, bc, cd, ce, db, de, eb, ec. • Strings: a#A, b#B, c#C, d#D, e#E, AbAc, AcAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.

39. Example of reductions (9/9) • x ∉ TSP ⇒ f(x) ∉ SSP • Distance graph • Tour a–ecdba (there is no cost-1 arc from a to e). • Additional cost from the edge from “AbAc” to “e#E”. [Figure: distance graph on the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]

40. Results on approximation • Min-SSP is MAX SNP-hard [Blum 94], • that is, if P ≠ NP, there is no polynomial-time algorithm for Min-SSP that finds an approximate solution within an arbitrarily small error ratio [Arora 98]. • It is hard to efficiently find an arbitrarily good approximate solution for a given instance of Min-SSP. • On the other hand, several constant-factor (4-, 3-, or 2.5-) approximation algorithms have been developed.

41. Summary • NP-complete problems are the most intractable decision problems in NP. • No one knows any polynomial-time algorithm that solves an NP-complete problem. • A decision problem L is NP-complete if • L is in NP and • there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem. • An optimization problem Max-(Min-)L is NP-hard if • there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.

42. Reference (1/2) • Issues on the computational complexity theory • Textbooks • M. R. Garey and D. S. Johnson (1979): Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman. • O. Watanabe (1992): Introduction to Computability and Complexity Theory, Kindai-Kagaku-sha (in Japanese). • M. Sipser (1996): Introduction to the Theory of Computation, PWS Publishing Company. • M. T. Goodrich and R. Tamassia (2002): Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley and Sons, Inc. • Slides of ‘NP-completeness’ (http://www.algorithmdesign.net/handouts/NPComplete.pdf) • Article • S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy (1998): “Proof verification and the hardness of approximation problems”, Journal of the ACM, 45(3), pp. 501–555.

43. Reference (2/2) • Shortest superstring problem • Textbook • D. Gusfield (1997): Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Chapter 16, Cambridge University Press. • Article • A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis (1994): “Linear approximation of shortest superstring”, Journal of the ACM, 41(4), pp. 630–647.