## Topics in Algorithms


**Topics in Algorithms**
Introduction to Computational Complexity Theory

**Quiz**
• A set S of strings is given as below.
• Find the shortest string s (called a superstring of S) that contains every element of S as a substring.
• This quiz mimics DNA sequencing.
• Example) S = {ate, half, lethal, alpha, alfalfa}
  • s′ = atehalflethalphalfalfa is a superstring of length 22.
  • s = lethalphalfalfate is the shortest superstring, of length 17.
• [Quiz] S = {TCTCTA, CAGTCT, CTCCAAA, GGCAA, TAAGCTCC, TTCTCTC, TCCAAATTCTA, CTTTCT, AACACCTT, CTCCGACC, TTCTATC, TCTATCTC, CTCTGTAACA, CAACAG}
• This example is from [Blum 94].

**Issues in Computational complexity theory**
• Showing upper/lower bounds on the computational resources required for solving a problem L.
  • Upper/lower bounds are described as functions of the length of an input.
  • Such bounds are given for
    • time,
    • (memory) space,
    • …
• Structural complexity
  • Relations among classes of problems.
  • Example) relations among the classes P, NP, E, and EXP, such as P ⊆ NP and P ≠ E.

**This talk’s main issues (1/2)**
• How to deal with hard (time-consuming) problems
  • What to do when we find a problem that looks hard.
• Sometimes we cannot find any efficient (polynomial-time) algorithm to solve the problem. Two cases are possible:
  • (1) If the problem is not hard, someone else can find an efficient algorithm.
  • (2) If the problem is really hard, other smart people cannot find one either.

**This talk’s main issues (2/2)**
• The previous quiz looks intractable.
  • The number of possible orderings of the 14 strings is 14! = 14·13·⋯·1 = 87,178,291,200.
• However, it is not easy to say that the problem is hard.
  • It is hard to find a needle in a haystack.
    • needle = efficient algorithm
  • It seems even harder to say that there is no needle in the haystack.
    • You might just have missed the needle.
• No needle? Computational complexity theory provides an answer.

**Key idea**
• We have two problems A and B.
  • Given an input x, we would like to know whether x ∈ A (x ∈ B).
• Suppose A is efficiently transformed with f into B
  • such that a ∈ A iff f(a) ∈ B.
  • a: input of A, f: transformation (reduction), f(a): input of B.
• This shows that B is harder than (or as hard as) A.
• A is solvable if there is a way to solve B.
[Figure: an algorithm for B answers ‘yes’ for x1 ∈ B and ‘no’ for x2 ∉ B; through f, an instance x3 ∈ A maps to f(x3) ∈ B (‘yes’) and an instance x4 ∉ A maps to f(x4) ∉ B (‘no’).]

**Overview**
• Intuitive explanation of hard (time-consuming) problems
• Decision problems/Optimization problems
• Polynomial time
• Class P, Class NP
• Reductions
• NP-complete and NP-hard
• Examples
  • Superstring problem
  • Reduction from Traveling salesman problem

**Types of problems (1/2)**
• Computational problems roughly fall into two categories:
  • Decision problem (output: yes/no),
  • Optimization problem (output: solution with max./min. cost).
• Decision problem L
  • input: string x
  • output: ‘yes’ if x ∈ L, ‘no’ otherwise.
• Example) L: positive odd numbers.
  • L = {1, 3, 5, …}
  • x = 3: ‘yes’ since x ∈ L,
  • x = 4: ‘no’ since x ∉ L.

**Types of problems (2/2)**
• Computational problems roughly fall into two categories:
  • Decision problem (output: yes/no),
  • Optimization problem (output: solution with max./min. cost).
• Optimization problem M
  • input: string x, cost function f
  • output: y such that f(y) is the maximum (or the minimum)
• Example) maximize $f(x,y) = 2x^2y - xy^2 + 3$.
  • For x = 1, y = 1: f(1,1) = 2 − 1 + 3 = 4.

**Examples of problems (1/6)**
• Euler cycle problem (ECP)
  • Decision problem
• Input (instance):
  • An undirected graph G = (V, E).
• Output:
  • ‘yes’ if there is a cycle that uses each edge of G exactly once,
  • ‘no’ otherwise.
[Figure: one graph with answer ‘yes’ and one with answer ‘no’.]

**Examples of problems (2/6)**
• Shortest superstring problem (SSP)
  • Decision problem
• Input (instance):
  • A set of sequences S = {s1, …, sn} and an integer (threshold) l.
• Output:
  • ‘yes’ if there is a string s such that, for all i, si is a substring of s and the length of s is at most l,
  • ‘no’ otherwise.
• Example) s1 = TACGA, s2 = ACCC, s3 = CTAAAG, s4 = GAGC
  • l = 18: ‘yes’, since TACGACCCTAAAGAGC contains every sequence and its length, 16, is at most 18.
  • l = 10: ‘no’.

**Examples of problems (3/6)**
• Shortest superstring problem (Min-SSP)
  • Optimization problem
• Input (instance):
  • A set of sequences S = {s1, …, sn}.
• Output:
  • The shortest string s such that, for all i, si is a substring of s.
• Example) s1 = TACGA, s2 = ACCC, s3 = CTAAAG, s4 = GAGC gives TACGACCCTAAAGAGC.

**Examples of problems (4/6)**
• Traveling salesman problem (TSP)
  • Decision problem
• Input (instance):
  • n cities (nodes) with the cost of travel between each pair of them, and an integer (threshold) t.
• Output:
  • ‘yes’ if there is a tour that visits all the cities and returns to the starting point with cost at most t,
  • ‘no’ otherwise.
[Figure: a four-city graph on a, b, c, d with edge costs. With max. cost 14 the answer is ‘yes’, since the cost of the tour shown is less than 14; with max. cost 10 the answer is ‘no’.]

**Examples of problems (5/6)**
• Traveling salesman problem (Min-TSP)
  • Optimization problem
• Input (instance):
  • n cities (nodes) with the cost of travel between each pair of them.
• Output:
  • A tour that visits all the cities and returns to the starting point with the smallest cost.
[Figure: the same four-city graph on a, b, c, d with a minimum-cost tour highlighted.]

**Examples of problems (6/6)**
• Satisfiability problem (SAT)
  • Decision problem
• Input (instance):
  • A Boolean function f over variables x1, …, xn.
  • Each variable takes either true (1) or false (0).
• Output:
  • ‘yes’ if there is a truth assignment of x1, …, xn that satisfies f,
  • ‘no’ otherwise.
• Example) f is a conjunction of five clauses over the variable sets {x1}, {x1, x2, x3}, {x1, x2, x3, x4}, {x2, x3, x4}, and {x1, x3}.
  • ‘yes’ since f = T (1) where x1 = F (0), x2 = T (1), x3 = F (0), x4 = F (0).

**Polynomial time**
• To simplify the notion of ‘hardness’, we use polynomial time as the cut-off for efficiency.
• Polynomial p(n)
  • A function of the following form, for some k ≥ 1 and constants $a_k, \ldots, a_0$:
  • $p(n) = a_k n^k + a_{k-1} n^{k-1} + \cdots + a_0$.
• Key property of polynomials
  • Let p(n) and q(n) be polynomials.
  • The sum p(n) + q(n) is also a polynomial.
  • The composite function q(p(n)) is also a polynomial in n.

**Turing machine**
• An abstract model of computers.
• At each step,
  • based on
    • its current state and
    • the symbol under the head,
  • the Turing machine changes
    • its internal state,
    • the symbol under the head, and
    • the position of the head.
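
A single step of such a machine can be sketched in a few lines. This is only an illustration: the transition rule, the head position (cell 3), and the move direction are assumptions chosen so that the tape changes from B B 1 0 0 1 1 B B in state s1 to B B 1 1 0 1 1 B B in state s2; the slides do not specify them.

```python
from typing import Dict, List, Tuple

# (state, symbol under head) -> (new state, symbol to write, head move)
Transition = Dict[Tuple[str, str], Tuple[str, str, int]]


def step(tape: List[str], head: int, state: str,
         delta: Transition) -> Tuple[List[str], int, str]:
    """Apply one transition and return the new (tape, head, state)."""
    new_state, write, move = delta[(state, tape[head])]
    new_tape = tape.copy()
    new_tape[head] = write          # rewrite the symbol under the head
    return new_tape, head + move, new_state


# Hypothetical rule (an assumption): in s1 reading 0, write 1, move right, enter s2.
delta: Transition = {("s1", "0"): ("s2", "1", +1)}

tape = list("BB10011BB")            # B marks a blank cell
new_tape, new_head, new_state = step(tape, 3, "s1", delta)
print("".join(new_tape), new_head, new_state)   # BB11011BB 4 s2
```

Each step reads one cell and one state, so the running time of a machine is simply the number of calls to `step`.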
[Figure: tape B B 1 0 0 1 1 B B with the machine in state s1; after one step the tape is B B 1 1 0 1 1 B B and the state is s2.]

**Hierarchy in the Computational Theory**
• Undecidable: halting problem of Turing machines.
• Decidable:
  • EXP: exponential time (e.g. 2^n), intractable.
  • NP: traveling salesman, graph isomorphism.
  • P: polynomial time, tractable. Examples: sorting (n log n), median (n).
• n: input size.
• Based on a figure in http://www-imai.is.s.u-tokyo.ac.jp/~imai/lecture/quantum_complexity.pdf

**Well-known classes of decision problems**
• P: the set of decision problems solvable by a deterministic Turing machine in polynomial time.
  • ECP ∈ P.
• NP: the set of decision problems solvable by a non-deterministic Turing machine in polynomial time.
  • ECP, TSP, SSP, SAT ∈ NP.
[Figure: P drawn inside NP.]

**Example of class NP**
• TSP ∈ NP since
  • TSP is solvable in polynomial time by a non-deterministic Turing machine.
  • At each branch, one node is chosen non-deterministically.
  • We suppose that the non-deterministic Turing machine can select the best choice at each branch.
[Figure: computation tree over time for the four-city instance on a, b, c, d with threshold 14. Each branch chooses the next city; the six complete tours that start and end at a have costs 16, 16, 12, 14, 12, 16.]

**Alternate definition of class NP**
• TSP ∈ NP since
  • TSP is a decision problem defined with a verifier A(x, y) over strings such that
    • x is a ‘yes’ instance iff A(x, y) answers ‘yes’ for some string y,
    • the length of y is smaller than |x|^c, where c is a constant,
    • A(x, y) is computable by a deterministic Turing machine in time polynomial in |x| + |y|,
    • and hence A(x, y) is also computable by a deterministic Turing machine in time polynomial in |x|.
  • Such a y is usually called a certificate for x.
[Figure: the verifier A(x, y), running in polynomial time, takes the four-city graph with threshold 14 as x and the tour a b d c a as the certificate y, and answers ‘yes’.]

**Features of problems in NP (1/2)**
• The number of possible solutions grows exponentially with the size of inputs.
• Example) SSP
  • Threshold: 12, S = {half, alpha, alfalfa}
  • Candidate superstrings from the 3! = 6 orderings: halfalphalfalfa, alphalfalfa, alfalfahalfalpha, halfalfalpha, alphalfalfahalf, alfalfalphalf.

**Features of problems in NP (2/2)**
• We can verify any instance in polynomial time when we are given its certificate (a superstring).
• Example) SSP
  • Threshold: 12, S = {half, alpha, alfalfa}
  • Certificate: alphalfalfa (length 11), which contains half, alpha, and alfalfa.

**Harder problems (1/3)**
• Suppose that
  • problems L1 and L2 are in NP,
  • C(x) denotes a certificate for x.
[Figure: verifier A1 answers ‘yes’ for x1 ∈ L1 with C(x1) and ‘no’ for x2 ∉ L1 with any y; verifier A2 answers ‘yes’ for x3 ∈ L2 with C(x3) and ‘no’ for x4 ∉ L2 with any y.]

**Harder problems (2/3)**
• Suppose that
  • problems L1 and L2 are in NP,
  • C(x) denotes a certificate for x,
  • we construct a transformation f, called a reduction.
[Figure: a reduction f running in polynomial time maps x3 ∈ L2 to f(x3) ∈ L1, for which verifier A1 answers ‘yes’ with C(f(x3)), and maps x4 ∉ L2 to f(x4) ∉ L1, for which A1 answers ‘no’ with any y.]

**Harder problems (3/3)**
• Under these assumptions, verifier A1 for L1 is able to say ‘yes’ or ‘no’ correctly for any instance of L2.
• We say L2 is (polynomial-time) reducible to L1.
  • We denote this by L2 ≤ L1.
• L1 then has to be harder than or as hard as L2 if we can construct this reduction.
  • When a polynomial-time algorithm for L1 is available, the algorithm also provides a solution in polynomial time for any instance of L2.
[Figure: verifier A2 is replaced by the reduction f followed by verifier A1; x3 ∈ L2 gives f(x3) ∈ L1 and ‘yes’, x4 ∉ L2 gives f(x4) ∉ L1 and ‘no’.]

**Cook-Levin Theorem**
• [Theorem] Any decision problem Q in NP is reducible to SAT.
  • SAT is one of the hardest problems in NP.
  • Such a problem is called an NP-complete problem.
[Figure: instances of Q1 are mapped by f and instances of Q2 by f′ to instances of SAT; a single verifier A for SAT answers ‘yes’ for f(x1) and f′(x3), where x1 ∈ Q1 and x3 ∈ Q2, and ‘no’ for f(x2) and f′(x4), where x2 ∉ Q1 and x4 ∉ Q2.]

**Good property on reductions**
• A reduction can be composed of multiple transformations.
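
Chaining two transformations can be illustrated with a toy sketch. The three languages used here (odd numbers, even numbers, multiples of 4) and the function names are assumptions picked only for illustration; all three are trivially in P. The point is that the composite of two polynomial-time maps is again polynomial time, because a composite q(p(n)) of polynomials is a polynomial.

```python
from typing import Callable


def compose(f: Callable[[int], int], g: Callable[[int], int]) -> Callable[[int], int]:
    """Return the reduction that applies f first and then g."""
    return lambda x: g(f(x))


def odd_to_even(n: int) -> int:
    # n is odd  iff  n + 1 is even
    return n + 1


def even_to_multiple_of_4(n: int) -> int:
    # n is even  iff  2 * n is a multiple of 4
    return 2 * n


def is_multiple_of_4(n: int) -> bool:
    # Decider for the last problem in the chain
    return n % 4 == 0


odd_to_multiple_of_4 = compose(odd_to_even, even_to_multiple_of_4)


def is_odd_via_reductions(n: int) -> bool:
    """Decide 'n is odd' using only the decider for multiples of 4."""
    return is_multiple_of_4(odd_to_multiple_of_4(n))


print(is_odd_via_reductions(3), is_odd_via_reductions(4))   # True False
```

The same pattern underlies chains such as Q ≤ SAT ≤ L: once each link preserves ‘yes’ and ‘no’ answers, so does the whole chain.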
[Figure: a chain of transformations connecting verifiers A3, A2, and A1; along the chain, x3 ∈ L2 keeps the answer ‘yes’ and x4 ∉ L2 keeps the answer ‘no’.]

**NP-complete**
• A problem L in NP is NP-complete
  • if Q is reducible to L for any problem Q in NP,
  • if SAT is reducible to L,
    • since Q ≤ SAT ≤ L for any Q in NP,
  • or if an NP-complete problem L′ is reducible to L,
    • since Q ≤ L′ ≤ L for any Q in NP.
• SAT is reducible to other problems in NP:
  • 3-SAT,
  • Clique,
  • 3-Color,
  • Hamilton path problem,
  • Traveling salesman problem, …
• These problems are also the most intractable problems in NP.
[Figure: reductions starting from SAT and 3-SAT and reaching Clique, Indep. set, Vertex Cover, 3-Color, HamPath, and TSP.]

**How to show that a problem L is NP-complete**
• It consists of two steps:
  • (1) Show that the decision problem L is in NP.
  • (2) Show that there is a reduction from an NP-complete problem Q to L.
    • L is (as hard as or) harder than Q.
    • From the definition of NP-complete, for any problem Q′ in NP, there is then a reduction from Q′ to L.
• For an optimization problem Max(Min)-L, we can say Max(Min)-L is NP-hard
  • if there is a reduction from an NP-complete problem Q to L.

**Example of reductions (1/9)**
• We will see that TSP is reducible to SSP.
  • SSP is as hard as or harder than TSP.
  • SSP is NP-complete since TSP is NP-complete and TSP ≤ SSP.
• Let x be an instance of TSP, where threshold = n.
• Let f(x) be the transformed instance of SSP, where threshold = 3n + 2m + 1.
[Figure: x (TSP) has n vertices and m edges with cost 1, threshold n, and optimal cost n + k; f maps it to f(x) (SSP) with n + m strings (a#A, b#B, c#C, d#D, …), threshold 3n + 2m + 1, and optimal cost 3n + 2m + k + 1.]

**Example of reductions (2/9)**
• Reduction from TSP to SSP
• Input x of TSP
  • Graph with a cost between every two nodes (cost 1 if there is an arc, cost 2 otherwise).
• Input f(x) of SSP
  • Created from the input x of TSP.
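
The construction of f(x) can be sketched as follows. The rule used here is inferred from the strings listed for the five-node example (one string `v#V` per node, and for each node one string per outgoing arc, pairing consecutive targets cyclically); the function name `tsp_to_ssp_strings` and the adjacency-list input format are assumptions, not part of the slides.

```python
from typing import Dict, List


def tsp_to_ssp_strings(arcs: Dict[str, List[str]]) -> List[str]:
    """Build the SSP strings from the cost-1 arcs of a TSP instance.

    arcs maps each node (a lowercase letter) to the targets of its cost-1 arcs.
    """
    strings = [f"{v}#{v.upper()}" for v in arcs]       # one string per node
    for v, targets in arcs.items():
        big_v = v.upper()
        d = len(targets)
        for i in range(d):                             # one string per arc
            w, w_next = targets[i], targets[(i + 1) % d]
            strings.append(f"{big_v}{w}{big_v}{w_next}")
    return strings


# The five-node example: 11 arcs with cost 1.
arcs = {
    "a": ["b", "c", "e"],
    "b": ["a", "c"],
    "c": ["d", "e"],
    "d": ["b", "e"],
    "e": ["b", "c"],
}
strings = tsp_to_ssp_strings(arcs)
n = len(arcs)
m = sum(len(targets) for targets in arcs.values())
threshold = 3 * n + 2 * m + 1
print(len(strings), threshold)   # 16 38
```

The loop touches every node and arc once, so the transformation runs in time linear in n + m, which is what a polynomial-time reduction requires.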
• Strings of f(x) for the example graph on nodes a, b, c, d, e:
  • nodes a, b, c, d, e: a#A, b#B, c#C, d#D, e#E
  • arcs ab, ac, ae: AbAc, AcAe, AeAb
  • arcs ba, bc: BaBc, BcBa
  • arcs cd, ce: CdCe, CeCd
  • arcs db, de: DbDe, DeDb
  • arcs eb, ec: EbEc, EcEb

**Example of reductions (3/9)**
• x ∈ TSP ⇔ f(x) ∈ SSP
• TSP
  • the optimal cost is 5 with the tour (aecdba).
  • n = 5, m = 11, k = 0.
• SSP
  • the shortest superstring is 38 long.
  • 3n + 2m + k + 1 = 3·5 + 2·11 + 0 + 1 = 38.
  • Superstring: a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa

**Example of reductions (4/9)**
• x ∈ TSP ⇔ f(x) ∈ SSP
• Distance graph
  • The weight on an arc is the number of characters of the prefix before a match.
  • Example) BcBa followed by BaBc gives BcBaBc (cost 2); CeCd followed by d#D gives CeCd#D (cost 3).
  • thin line = cost 2, thick line = cost 3, no line = more than 3.
[Figure: distance graph over the 16 strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]

**Example of reductions (5/9)**
• x ∈ TSP ⇔ f(x) ∈ SSP
• Distance graph with cost-2 arcs
  • Example) b#B followed by BaBc gives b#BaBc; b#B followed by BcBa gives b#BcBa.
[Figure: the distance graph over the same 16 strings, showing only the cost-2 arcs.]

**Example of reductions (6/9)**
• x ∈ TSP ⇔ f(x) ∈ SSP
• Distance graph with cost-2 arcs
  • The sum of the costs of arcs: 2m.
  • Example) b#B, BaBc, BcBa merge into b#BaBcBa.
[Figure: the distance graph over the same 16 strings, showing only the cost-2 arcs.]

**Example of reductions (7/9)**
• x ∈ TSP ⇔ f(x) ∈ SSP
• Distance graph with cost-2 arcs
  • 3n + 2m + k + 1 = 3·5 + 2·11 + 0 + 1 = 38.
  • Tour aecdba
[Figure: the distance graph over the same 16 strings with the path corresponding to the tour aecdba.]

**Example of reductions (8/9)**
• x ∉ TSP ⇔ f(x) ∉ SSP
• TSP
  • the optimal cost is 6 with the tour (aecdba).
  • n = 5, m = 10, k = 1.
• SSP
  • the shortest superstring is 37 long, where the threshold is 36.
  • 3n + 2m + k + 1 = 3·5 + 2·10 + 1 + 1 = 37.
• Strings of f(x) for this graph (the arc ae is removed):
  • nodes a, b, c, d, e: a#A, b#B, c#C, d#D, e#E
  • arcs ab, ac: AbAc, AcAb
  • arcs ba, bc: BaBc, BcBa
  • arcs cd, ce: CdCe, CeCd
  • arcs db, de: DbDe, DeDb
  • arcs eb, ec: EbEc, EcEb

**Example of reductions (9/9)**
• x ∉ TSP ⇔ f(x) ∉ SSP
• Distance graph
  • Tour a–ecdba (the step from a to e uses a missing arc).
  • The additional cost comes from the edge from “AbAc” to “e#E”.
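
A certificate for SSP can be checked mechanically. The sketch below is a minimal verifier, assuming the instance is given as a list of strings plus a threshold; the name `verify_superstring` is hypothetical. It is applied to the length-38 superstring quoted in Example of reductions (3/9) and to the small instance S = {half, alpha, alfalfa}.

```python
from typing import Iterable


def verify_superstring(strings: Iterable[str], candidate: str, threshold: int) -> bool:
    """Answer 'yes' iff candidate is short enough and contains every string."""
    if len(candidate) > threshold:
        return False
    return all(s in candidate for s in strings)


# Strings of f(x) for the first five-node example (n = 5, m = 11).
instance = [
    "a#A", "b#B", "c#C", "d#D", "e#E",
    "AbAc", "AcAe", "AeAb", "BaBc", "BcBa",
    "CdCe", "CeCd", "DbDe", "DeDb", "EbEc", "EcEb",
]
certificate = "a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa"

print(len(certificate))                                      # 38
print(verify_superstring(instance, certificate, 38))         # True
print(verify_superstring(instance, certificate, 37))         # False: too long

small = ["half", "alpha", "alfalfa"]
print(verify_superstring(small, "alphalfalfa", 12))          # True
print(verify_superstring(small, "halfalphalfalfa", 12))      # False: length 15
```

The check performs one substring test per input string, so it runs in time polynomial in the size of the instance and the certificate, which is the property that places SSP in NP.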
[Figure: distance graph over the 16 strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]

**Results on approximation**
• Min-SSP is MAX SNP-hard [Blum 94],
  • that is, if P ≠ NP, there is no polynomial-time algorithm for Min-SSP that finds an approximate solution within an arbitrarily small error ratio [Arora 98].
  • It is hard to efficiently find an arbitrarily good approximate solution for a given instance of Min-SSP.
• On the other hand, several constant-factor (4-, 3-, or 2.5-) approximation algorithms have been developed.

**Summary**
• NP-complete problems are the most intractable decision problems in NP.
  • No one knows any polynomial-time algorithm that solves an NP-complete problem.
• A decision problem L is NP-complete if
  • L is in NP and
  • there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.
• An optimization problem Max-(Min-)L is NP-hard if
  • there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.

**Reference (1/2)**
• Issues on the computational complexity theory
• Textbooks
  • M. R. Garey and D. S. Johnson (1979): Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman.
  • O. Watanabe (1992): Introduction to computability and complexity theory, Kindai-Kagaku-sha (in Japanese).
  • M. Sipser (1996): Introduction to the Theory of Computation, PWS Publishing Company.
  • M. T. Goodrich and R. Tamassia (2002): Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley and Sons, Inc.
    • Slides of ‘NP-completeness’ (http://www.algorithmdesign.net/handouts/NPComplete.pdf)
• Article
  • S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy (1998): “Proof verification and the hardness of approximation problems”, Journal of the ACM, 45(3), pp. 501-555.

**Reference (2/2)**
• Shortest superstring problem
• Textbook
  • D. Gusfield (1997): “Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology”, Chapter 16, Cambridge University Press.
• Article
  • A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis (1994): “Linear approximation of shortest superstrings”, Journal of the ACM, 41(4), pp. 630-647.