
Complexity Theory in Practice






Presentation Transcript


  1. Complexity Theory in Practice The Power of Randomization Speaker: He Yan November 17, 2008

  2. Introduction (Traditional Algorithms)
  Traditional algorithms have the following properties:
  • They are always correct.
  • They are always precise: the answer is not given as a range.
  • They are deterministic: although there may be multiple correct outputs, the same instance of a problem always produces the same output.
  • They run with the same efficiency every time they are applied to the same instance of a problem.

  3. Introduction (Randomized Algorithms)
  Randomized algorithms are the class of algorithms that employ a degree of randomness as part of their logic. Because a randomized algorithm makes random choices during execution, it can be:
  • Nondeterministic: it makes random but valid decisions, so the same algorithm may behave differently when applied twice to the same instance of a problem.
  • Not always precise: usually, the more time is given, the better the precision that can be guaranteed.
  • Unpredictable in a single execution; still, we can get a probabilistic characterization of its behavior over a number of runs with different efficiencies.
  • Occasionally incorrect, or occasionally nonterminating: we expect a high probability of being correct, but the algorithm may sometimes err or fail to produce an answer at all.

  4. Why do we need randomized algorithms?
  • If an algorithm is confronted by a choice, it may be preferable to choose a course of action at random rather than spend time figuring out which alternative is best.
  • Sometimes we have no better method than making random choices.
  • One advantage: if more than one correct answer exists, several different ones may be obtained by running the probabilistic algorithm more than once.
  Comparing the factors that influence the performance of the two kinds of algorithms:
  • A randomized algorithm's performance is a random variable determined by the random bits (the algorithm itself), not by the data.
  • A deterministic algorithm's performance depends on the data as well as on the algorithm: it is the data that induces a probability distribution.

  5. Expected vs. average time
  The average time of a deterministic algorithm is computed by considering each possible instance of a given size equally likely. E.g., for a sorting algorithm on 3 integers, there are six permutations, each with its own running time:

      1, 2, 3 → T123    1, 3, 2 → T132    2, 1, 3 → T213
      2, 3, 1 → T231    3, 1, 2 → T312    3, 2, 1 → T321

      Average time = (T123 + T132 + T213 + T231 + T312 + T321) / 6

  6. A sorting algorithm, continued
  The expected time of a probabilistic algorithm, by contrast, is defined on each individual instance: it is the mean time required to solve the same instance again and again. E.g., running the algorithm six times on the single instance 1, 2, 3 yields times T1, T2, T3, T4, T5, T6, whose mean estimates the expected time:

      Expected time ≈ (T1 + T2 + T3 + T4 + T5 + T6) / 6

  7. Pseudorandom Number Generation
  • In randomized algorithms, we assume the availability of a random number generator that can be called at unit cost.
  • We assume that a call on uniform(i, j) returns an integer x chosen randomly in the interval i ≤ x ≤ j.
  • We assume that the distribution of x is uniform on the interval and that successive calls on the generator yield independent values of x.
  • However, truly random generators are not usually available in practice; most of the time, pseudorandom generators are used instead.

  8. Pseudorandom Number Generators (PRNG)
  A PRNG is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random, since it is completely determined by a set of initial values called the PRNG's state. NOTE:
  • A sequence of calls to a pseudorandom generator will produce values that appear to have the properties of a random sequence.
  • A pseudorandom generator needs a seed; the same seed will produce the same sequence of values (see the sketch after this list).
  • A sequence generated by a pseudorandom generator is periodic, with the period not exceeding the number of values between i and j for uniform(i, j).
  • A good pseudorandom generator can generate a sequence that, for most practical purposes, is indistinguishable from a truly random sequence.
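
  A quick illustration of the seed property above: a minimal Python sketch using the standard random module (itself a PRNG, the Mersenne Twister), showing that the same seed reproduces the same sequence.

      import random

      # A fixed seed makes the "random" sequence fully reproducible.
      random.seed(2008)
      first = [random.randint(0, 99) for _ in range(5)]

      random.seed(2008)           # reseeding with the same value...
      second = [random.randint(0, 99) for _ in range(5)]

      print(first == second)      # True: identical sequences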

  9. Common classes of PRNG
  Linear congruential generators (see the sketch below):

      X_{n+1} = (a · X_n + c) mod m

  where X_n is the sequence of random values and the integer constants specifying the generator are:
  • 0 < m: the "modulus"
  • 0 < a < m: the "multiplier"
  • 0 ≤ c < m: the "increment"
  • 0 ≤ X_0 < m: the "seed" or "start value"
  Lagged Fibonacci generators:

      S_n = S_{n−j} ★ S_{n−k} (mod m),   0 < j < k

  where m is usually a power of 2 and the operator ★ is:
  • Additive: uses addition as the operator
  • Multiplicative: uses multiplication as the operator
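
  Below is a minimal sketch of a linear congruential generator following the recurrence above; the constants chosen are one classic textbook parameter set (the ANSI C rand() values), not the only valid choice.

      def lcg(seed, a=1103515245, c=12345, m=2**31):
          """Linear congruential generator: X_{n+1} = (a*X_n + c) mod m."""
          x = seed
          while True:
              x = (a * x + c) % m
              yield x

      gen = lcg(seed=42)
      print([next(gen) for _ in range(3)])   # first three pseudorandom values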

  10. Simple example: Quicksort
  • Quicksort is a familiar, commonly used algorithm in which randomness can be useful.
  • Best-case running time: O(n log n)
  • Worst-case running time: O(n²)
  • Average-case running time: O(n log n)
  • If we always choose the first element as the pivot, the complexity depends on the input data: a data-dependent distribution, with some inputs always triggering the worst case.
  • Randomized quicksort (randomly select the pivot; see the sketch below) requires O(n log n) expected time regardless of the input, since the O(n²) worst case cannot be triggered repeatedly by the same input elements.
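
  A minimal sketch of randomized quicksort (an out-of-place variant, for brevity; an in-place version would swap the random pivot into position instead):

      import random

      def randomized_quicksort(arr):
          # Expected O(n log n) on every input: the random pivot choice,
          # not the input order, determines the split quality.
          if len(arr) <= 1:
              return arr
          pivot = random.choice(arr)
          less = [x for x in arr if x < pivot]
          equal = [x for x in arr if x == pivot]
          greater = [x for x in arr if x > pivot]
          return randomized_quicksort(less) + equal + randomized_quicksort(greater)

      print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))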

  11. Randomized algorithms have been widely used to speed up existing solutions to tractable problems, as well as to provide approximate solutions for hard problems.
  • Applied to a decision problem, a randomized algorithm returns "yes" or "no" with a probabilistic guarantee of correctness, and this probability can be improved to any desired level.

  12. Here we still focus on decision problems. REMEMBER, however, that randomized algorithms are also used to provide approximate solutions for optimization problems.
  • Let us now look at two famous classes of randomized algorithms!

  13. Monte Carlo algorithms (MC)
  • MC methods are a class of computational algorithms that rely on repeated random sampling to compute their results.
  • A random MC algorithm runs in polynomial time but may err with probability less than some constant (say ½).
  • A one-sided MC decision algorithm never errs when it returns one type of answer (say "no") and errs with probability less than some constant (say ½) when returning the other (see the amplification sketch below).
  • The most interesting feature of an MC algorithm is that it is often possible to reduce the error probability arbitrarily at the cost of a slight increase in computing time. This is called amplifying the stochastic advantage.
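
  A minimal sketch of amplification, assuming the convention of slide 21 (a "yes" answer is always correct, while a "no" answer errs with probability at most ½); mc_algorithm is a hypothetical one-sided MC decision procedure supplied by the caller:

      def amplified(mc_algorithm, x, k):
          # k independent runs drive the error probability below (1/2)**k.
          for _ in range(k):
              if mc_algorithm(x) == "yes":
                  return "yes"    # one-sided: a "yes" is always correct
          return "no"             # wrong with probability at most (1/2)**k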

  14. MC example: Monte Carlo π
  We want to approximate π using MC methods. Take a circle of radius 1 inscribed in a square of side 2:
  • The area of the circle is πr² = π ≈ 3.1415926.
  • The area of the square is 2² = 4.
  • The ratio of the area of the circle to the area of the square is p = π/4 ≈ 0.7853982.
  If we knew the ratio p, we could multiply it by four to obtain π. One simple way to estimate p: pick lattice points in the square and count how many of them lie inside the circle. With 812 points inside and 212 outside the circle, p ≈ 812/(812 + 212) = 812/1024 ≈ 0.7929688, so π ≈ p × (area of square) = p × 4 ≈ 3.171875.

  15. MC method for π
  • Randomly select n points (x, y) in the unit square and determine the ratio p = m/n, where m is the number of points satisfying x² + y² ≤ 1.
  • Sample size n = 1000: with 787 points satisfying x² + y² ≤ 1, we get p = 787/1000 = 0.787 and π ≈ 0.787 × 4 = 3.148 (see the sketch below).
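
  A minimal Python sketch of this estimator, sampling random (rather than lattice) points in the unit square:

      import random

      def estimate_pi(n):
          # m counts samples falling inside the quarter circle x^2 + y^2 <= 1.
          m = sum(1 for _ in range(n)
                  if random.random() ** 2 + random.random() ** 2 <= 1)
          return 4 * m / n

      print(estimate_pi(1000))      # roughly 3.1, e.g. 3.148 as on this slide
      print(estimate_pi(1000000))   # typically much closer to 3.14159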

  16. An example of a one-sided MC algorithm: Example 8.3 (P336)
  • Given a Boolean function f(x1, ..., xn), we can construct a binary decision tree for it. In a binary decision tree, each internal node represents a variable of the function and has two children: one represents "true" for the variable, while the other represents "false". Each leaf is labeled "true" or "false", representing the value of the function for the truth assignment denoted by the path from the root to the leaf. (A binary decision tree example appears in Fig. 8.10, P336.) One fundamental question is whether or not two trees represent the same Boolean function. This problem belongs to coNP: if two trees represent different functions, then there is at least one truth assignment under which the two functions return different values, so we can guess this truth

  17. assignment and verify that the two binary decision trees return distinct values. However, no deterministic polynomial-time algorithm has been found for this problem, nor has anyone proved it coNP-complete. So, instead of guessing a truth assignment to the n variables and computing a Boolean value, we can use a random assignment of integers from the range S = [0, 2n−1] and compute (modulo p, where p is a prime no smaller than |S|) an integer as a characteristic of the entire tree under this assignment. If x is assigned value i, then we assign 1−i (modulo p) to its complement, so that the sum of the values of x and x̄ is 1. For each leaf of the tree labeled "true", compute the product of the values of the variables encountered along the path; then sum all these products. Finally, compare the two resulting numbers (one for each tree). If they differ, our algorithm concludes that the trees represent different functions;

  18. Otherwise, it concludes that they represent the same function. The algorithm gives the correct answer whenever the two values differ but may err when the two values are equal. We claim that at least (|S|−1)^n of the possible |S|^n assignments of values to the n variables will yield distinct values when the two functions are distinct; this claim implies that the probability of a correct answer is at least

      (|S|−1)^n / |S|^n = ((2n−1)/(2n))^n > 1/2

  so that we have a one-sided Monte Carlo algorithm for this problem. The claim trivially holds for functions of one variable; assume that it holds for functions of n or fewer variables and consider two distinct functions f and g of n+1 variables. Pick a variable x, so that we can write f = (1−x)·f_{x=0} + x·f_{x=1} (and similarly for g). If f and g differ, then f_{x=0} and g_{x=0} differ, or f_{x=1} and g_{x=1} differ, or both. In order to have the value computed

  19. for f equal that computed for g, we would need:

      (1−|x|)·|f_{x=0}| + |x|·|f_{x=1}| = (1−|x|)·|g_{x=0}| + |x|·|g_{x=1}|

  (where we denote the value assigned to x by |x| and the value computed for f by |f|). If |f_{x=0}| and |g_{x=0}| differ, we can rewrite this as:

      |x|·(|f_{x=1}| − |f_{x=0}| − |g_{x=1}| + |g_{x=0}|) = |g_{x=0}| − |f_{x=0}|

  which has at most one solution for |x|, since the right-hand side is nonzero. Thus at least (|S|−1) assignments for x maintain the difference in values between f and g, given a difference in values between |f_{x=0}| and |g_{x=0}|. Because the latter difference can be obtained by at least (|S|−1)^n assignments (by the inductive hypothesis), at least (|S|−1)^(n+1) assignments lead to different values whenever f and g differ, which is the desired result! (A minimal sketch of this fingerprinting test follows.)
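
  A minimal sketch of the fingerprinting test. Trees are tuples, ("leaf", True/False) or (var, low, high); for simplicity the random values are drawn from the full range [0, p−1] rather than from S, and the prime p must be at least 2n. All names here are illustrative, not from the text.

      import random

      def fingerprint(tree, assign, p):
          # Arithmetize the tree mod p: a node on variable v with assigned
          # value a combines its children as (1 - a)*low + a*high (mod p),
          # so x contributes a and its complement contributes 1 - a.
          if tree[0] == "leaf":
              return 1 if tree[1] else 0
          var, low, high = tree
          a = assign[var]
          return ((1 - a) * fingerprint(low, assign, p)
                  + a * fingerprint(high, assign, p)) % p

      def probably_same(t1, t2, variables, p=101):
          # One-sided: "different" is always correct; "same" may err.
          assign = {v: random.randrange(p) for v in variables}
          return fingerprint(t1, assign, p) == fingerprint(t2, assign, p)

      # e.g. the tree for f(x) = x vs. the constant-true tree:
      t1 = ("x", ("leaf", False), ("leaf", True))
      t2 = ("leaf", True)
      print(probably_same(t1, t2, ["x"]))   # almost always False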

  20. Las Vegas algorithms
  • In computing, a Las Vegas algorithm is a randomized algorithm that never gives incorrect results: it either produces the correct result or reports failure (see the toy sketch below).
  • Because of its nondeterministic nature, the run time of a Las Vegas algorithm is a random variable. It runs in polynomial time on average if, assuming that all instances of size n are equally likely and that the running time on instance x is f(x), the expression ∑_x 2^(−n)·f(x), where the sum is taken over all x of size n, is bounded by a polynomial in n.
  • Las Vegas techniques are used in some algorithms for NP-complete problems, in genetic algorithms, evolution strategies, ant colony optimization, etc. Since the answer is always correct and the expected duration must not be too long, another application is cryptography, e.g., the generation of very large prime numbers.
  • Example: the randomized quicksort discussed before, where the pivot is chosen randomly but at the end we always get sorted data.
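
  A toy Las Vegas sketch (illustrative, not from the text): searching for a 0 in an array assumed to be half zeros. Any index returned is guaranteed correct; with small probability the algorithm gives up and reports failure instead of answering.

      import random

      def las_vegas_find_zero(arr, max_tries=20):
          for _ in range(max_tries):
              i = random.randrange(len(arr))
              if arr[i] == 0:
                  return i        # correct whenever an answer is given
          return None             # "no answer", probability (1/2)**max_tries

      print(las_vegas_find_zero([0, 1] * 8))   # expected about 2 probes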

  21. Comparison
  • Las Vegas algorithms and Monte Carlo algorithms are both randomized algorithms.
  • Both are one-sided decision algorithms (deciding "yes" or "no" instances).
  • A random MC algorithm runs in polynomial time but may err with probability less than some constant (say ½).
  • Unlike MC methods, a Las Vegas algorithm never returns a wrong answer but may not run in polynomial time for all instances. It does not gamble with the truth of the result; it only gambles with the resources used for the computation.
  • For an MC method, given a "no" instance, all leaves of the computation tree are "no" leaves, while for a "yes" instance, at least half of the leaves of the computation tree are "yes" leaves.
  • For Las Vegas, given a "no" instance, the computation tree has only "no" leaves, whereas given a "yes" instance, it has at least one "yes" leaf.

  22. We attempt to solve a problem in NP by using a randomized method, i.e., by producing and verifying a random certificate.
  • If the answer returned by the algorithm is "yes", then the probability of error is 0; otherwise, if the answer is "no", the probability of error may be large.
  • In particular, there are 2^|x| possible certificates and only one of them may result in acceptance, so that the probability of error is bounded by (1 − 2^(−|x|)) times the probability that instance x is a "yes" instance.
  • Since this bound depends on the input size, we cannot achieve a fixed probability of error by using a fixed number of trials.
  • Generally speaking, we can conclude that a nondeterministic algorithm is a generalization of a Monte Carlo algorithm (both are one-sided), with the latter itself a generalization of a Las Vegas algorithm.
  • We have a model of computation called the random Turing machine (RTM), which is similar to a nondeterministic machine in that it has a choice of moves at every step and thus makes decisions. However, unlike its nondeterministic counterpart, it makes decisions by tossing a fair coin.

  23. An RTM operates in polynomial time if the height of its computation tree is bounded by a polynomial function of the instance size.
  • If a computation is terminated after a polynomial number of moves without finishing, the machine is prevented from reaching a conclusion on that path.
  • Leaves of a polynomially bounded computation tree are therefore marked with one of "yes", "no", or "don't know".
  • Definition 8.15 (P339): PP is the class of all decision problems ∏ for which there exists a polynomial-time random TM such that, for any instance x of ∏:
    - if x is a "yes" instance, then the machine accepts x with probability larger than 1/2;
    - if x is a "no" instance, then the machine rejects x with probability larger than 1/2.

  24. BPP is the class of all decision problems ∏ for which there exists a polynomial-time random Turing machine (PTRTM) and a positive constant ε ≤ 1/2 such that, for any instance x of ∏:
  -- if x is a "yes" instance, then the machine accepts x with probability no less than 1/2 + ε;
  -- if x is a "no" instance, then the machine rejects x with probability no less than 1/2 + ε.
  ("B" indicates that the probability is bounded away from 1/2.)
  • RP is the class of all decision problems ∏ for which there exists a polynomial-time random Turing machine (PTRTM) and a positive constant ε ≤ 1 such that, for any instance x of ∏:
  -- if x is a "yes" instance, then the machine accepts x with probability no less than ε;
  -- if x is a "no" instance, then the machine rejects x.

  25. RP is a one-sided class, whose complementary class is defined as coRP.
  • The class RP ∪ coRP represents problems for which one-sided Monte Carlo algorithms exist, whereas RP ∩ coRP corresponds to problems for which Las Vegas algorithms exist.
  • Lemma 8.1 (P339): A problem ∏ belongs to RP ∩ coRP iff there exists a PTRTM and a positive constant ε ≤ 1 such that:
  -- the machine accepts or rejects an arbitrary instance with probability no less than ε;
  -- the machine accepts only "yes" instances and rejects only "no" instances.

  26. This new definition is quite similar to the definition of NP ∩ coNP: the only change is to make ε dependent upon the instance rather than only upon the problem.
  • We can conclude that RP ∩ coRP is a subset of NP ∩ coNP, RP is a subset of NP, coRP is a subset of coNP, and BPP is a subset of PP.
  • Furthermore, since all computation trees are limited to polynomial height, all of these classes are clearly contained within PSPACE.
  • Finally, since nothing prevents a computation tree from having all of its leaves labeled "yes" for a "yes" instance and all labeled "no" for a "no" instance, P is contained within all of these classes.
  • Continuing to examine the relationships among these classes, we find that the ε value given in the definition of RP could as easily have been specified to be larger than ½.

  27. Given a machine M with some ε no larger than ½, we can construct a machine M′ with an ε larger than ½ by making M′ iterate M for a series of trials. (This is the main feature of MC algorithms: the probability of error can be decreased to any fixed value after a fixed number of trials.) Therefore the definitions of RP and coRP are just strengthened (one-sided) versions of BPP, so that both RP and coRP lie within BPP.

  28. Theorem 8.27 (P340): NP (and also coNP) is a subset of PP.
  Proof. We use a random TM to simulate the nondeterministic machine for an NP problem. Comparing the definitions of NP and PP, we find that the only thing to do is to show how to take the nondeterministic machine M for our problem, with polynomial time bound p(), and turn it into a suitable random machine M′. M accepts a "yes" instance with probability larger than 0 but possibly as small as 2^(−p(|x|)), and we need to make this probability larger than ½. A first attempt: toss a coin before starting any computation and accept the instance a priori if the toss produces heads. This procedure introduces an a priori probability of acceptance, call it Pa, of ½; the probability of acceptance of a "yes" instance x is then at least ½ + 2^(−p(|x|)−1), but the probability of rejection of a "no" instance, which was exactly 1 without the coin toss, is now only 1 − Pa = ½, which is not larger than ½.

  29. The solution is quite straightforward: it is enough to make Pa less than ½ while keeping it large enough that Pa + 2^(−p(|x|)) > ½. Tossing p(|x|) additional coins suffices: M′ accepts a priori exactly when the first toss returns heads and the next p(|x|) tosses do not all return tails, so that Pa = ½ − 2^(−p(|x|)−1). Hence a "yes" instance is accepted with probability at least Pa + 2^(−p(|x|)) = ½ + 2^(−p(|x|)−1), and a "no" instance is rejected with probability 1 − Pa = ½ + 2^(−p(|x|)−1). Because M′ runs in polynomial time iff M does, the conclusion follows. Q.E.D.

  30. The hierarchy of randomized complexity classes
  • The resulting hierarchy of randomized classes and its relation to P, NP, and PSPACE (containment diagram):

      P ⊆ R ∩ co-R ⊆ R, co-R ⊆ BPP ⊆ PP ⊆ PSPACE
      R ∩ co-R ⊆ NP ∩ co-NP;  R ⊆ NP ⊆ PP;  co-R ⊆ co-NP ⊆ PP

  31. There is one more complexity class, corresponding to the Las Vegas algorithms (algorithms that always return the correct answer and whose execution time is random but with polynomial expectation). The class of decision problems solvable by this type of algorithm is denoted ZPP, where "Z" stands for zero error probability. It is none other than RP ∩ coRP.
  • Theorem 8.28 (P342): ZPP equals RP ∩ coRP.
  Proof. We prove containment in each direction.
  (ZPP ⊆ RP ∩ coRP) Given a machine M for a problem in ZPP, we construct a machine M′ that satisfies the conditions for RP ∩ coRP by simply cutting the execution of M after a polynomial amount of time. This prevents M from returning a result on some paths, so the resulting machine M′, while running in polynomial time and never returning a wrong answer, has a small probability of not returning any answer. It remains only to show that this probability is bounded above by some constant ε < 1. Let q() be the polynomial bound on the expected running time of M. Define M′ by stopping M on all paths exceeding some polynomial bound r(), where the polynomials r() and r′() are chosen such that r(n) + r′(n) = q(n) and such that r() provides the desired ε.

  32. Without loss of generality, we assume that all computation paths that lead to a leaf within the bound r() do so in exactly r(n) steps. Let px be the probability that M′ does not give any answer. On an instance of size n, the expected running time of M is given by (1 − px)·r(n) + px·tmax(n), where tmax(n) is the average number of steps on the paths that need more than polynomial time. By hypothesis, this expectation is bounded by q(n) = r(n) + r′(n). Solving for px, we get

      px ≤ r′(n) / (tmax(n) − r(n))

  This quantity is always smaller than 1, since the denominator is superpolynomial by assumption. Because we can pick r() and r′(), we can make px smaller than any desired ε > 0.
  (RP ∩ coRP ⊆ ZPP) Given a machine M for a problem in RP ∩ coRP, we construct a machine M′ that satisfies the conditions for ZPP. Let 1/k (k > 1) be the bound on the probability that M does not return an answer, let r() be the polynomial bound on the running time of M, and let k^q(n) be a bound on the time required to solve an instance of size n deterministically. On an instance of size n, M′ simply runs M for up to q(n) trials. As soon as M returns an answer, M′ returns the same answer and stops; on the other hand, if none of the q(n)

  33. successive runs of M returns an answer, then M′ deterministically solves the instance. Since the probability that M does not return any answer in q(n) trials is k^(−q(n)), the expected running time of M′ is bounded by (1 − k^(−q(n)))·r(n) + k^(−q(n))·k^q(n) = 1 + (1 − k^(−q(n)))·r(n). Hence the expected running time of M′ is bounded by a polynomial in n. Q.E.D.
  • Because all known randomized algorithms are MC algorithms, Las Vegas algorithms, or ZPP algorithms, the problems that we are able to tackle with randomized algorithms appear to be confined to a subset of RP ∪ coRP. Furthermore, since membership of an NP-complete problem in RP would imply NP = RP, an outcome considered unlikely, this subset of RP ∪ coRP does not appear to include any NP-complete or coNP-complete problem. Therefore, in its current state of development, randomization is far from being a panacea for hard problems!
  • As for the other two classes of randomized complexity: membership in BPP indicates the existence of randomized algorithms that run in polynomial time with an arbitrarily small, fixed probability of error.

  34. Theorem 8.29 (P343): Let ∏ be a problem in BPP. Then, for any δ > 0, there exists a polynomial-time randomized algorithm that accepts "yes" instances and rejects "no" instances of ∏ with probability at least 1 − δ.
  Proof. Since ∏ is in BPP, it has a polynomial-time randomized algorithm A that accepts "yes" instances and rejects "no" instances of ∏ with probability at least ½ + ε, for some constant ε > 0. Consider the following new algorithm, where k is an odd integer to be defined shortly:

      yes_count := 0;
      for i := 1 to k do
          if A(x) accepts then yes_count := yes_count + 1;
      if yes_count > k div 2 then accept else reject

  35. If x is a "yes" instance of ∏, then A(x) accepts with probability at least ½ + ε; thus the probability of observing exactly j acceptances (and thus k − j rejections) in k runs of A(x) is at least

      C(k, j) · (1/2 + ε)^j · (1/2 − ε)^(k−j)

  where C(k, j) is the binomial coefficient "k choose j". We can derive a simplified bound for this value when j does not exceed k/2 by equalizing the two exponents to k/2:

      C(k, j) · (1/2 + ε)^j · (1/2 − ε)^(k−j) ≤ C(k, j) · (1/4 − ε²)^(k/2)

  Summing these probabilities for values of j not exceeding k/2, we get the probability that our new algorithm rejects a "yes" instance:

      ∑_{j=0}^{k/2} C(k, j) · (1/2 + ε)^j · (1/2 − ε)^(k−j) ≤ (1/4 − ε²)^(k/2) · ∑_{j=0}^{k/2} C(k, j) ≤ (1/4 − ε²)^(k/2) · 2^k = (1 − 4ε²)^(k/2)

  Now we choose k so as to ensure (1 − 4ε²)^(k/2) ≤ δ, which gives us k ≥ 2·log δ / log(1 − 4ε²), so that k is a constant depending only on the input constants ε and δ (see the sketch below).
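
  A minimal sketch turning the bound above into a choice of k; trials_needed is a hypothetical helper, with delta the target error probability and eps the BPP advantage:

      import math

      def trials_needed(eps, delta):
          # Smallest odd k with (1 - 4*eps**2)**(k/2) <= delta,
          # i.e. k >= 2*log(delta) / log(1 - 4*eps**2).
          k = math.ceil(2 * math.log(delta) / math.log(1 - 4 * eps ** 2))
          return k if k % 2 == 1 else k + 1

      # An algorithm with advantage eps = 0.1 needs 227 majority-vote
      # runs to push the error probability below 1%:
      print(trials_needed(0.1, 0.01))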

  36. Theorem 8.30 (P344): BPP is a subset of Σ₂ᵖ ∩ Π₂ᵖ (where these two classes are the nondeterministic and co-nondeterministic classes at the second level of the polynomial hierarchy discussed in Section 7.3.2).
  If NP is not equal to coNP, then neither NP nor coNP is closed under complementation, whereas BPP of course is; thus, under the standard conjecture, BPP is not equal to NP or coNP. A result that we shall not prove states that giving a machine for the class BPP an oracle that solves any problem in BPP itself does not increase the power of the machine; in our notation, BPP^BPP equals BPP. By comparison, the same result holds trivially for the class P, while it does not appear to hold for NP, since NP^NP is believed to be a proper superset of NP. An immediate consequence of this result and of Theorem 8.30 is that if we had NP ⊆ BPP, then the entire polynomial hierarchy would collapse into BPP, something that would be most surprising. Hence BPP does not appear to contain any NP-complete problem, so that the

  37. scope of randomized algorithms is indeed fairly restricted. Then what about the largest class, PP? Membership in PP is not likely to be of much help, as the probabilistic guarantee on the error bound is so poor: the amount by which the probability exceeds the bound of ½ may depend on the instance size n, so reducing the probability of error to a small fixed value for such a problem may require an exponential number of trials. PP is quite closely related to #P, the class of enumeration problems corresponding to decision problems in NP. Recall that a complete problem (under Turing reductions) for #P is "How many satisfying truth assignments exist for a given 3SAT instance?" The similar problem "Do more than ½ of the possible truth assignments satisfy a given 3SAT instance?" is complete for PP (Exercise 8.36). Thus PP contains the decision versions of the problems in #P: instead of asking for the number of certificates, these problems ask whether the number of certificates meets a certain bound. As a result, an oracle for PP

  38. is as good as an oracle for #P; that is, P^PP is equal to P^#P.
  Conclusion: Randomized algorithms have the potential to provide efficient and elegant solutions for many problems, as long as those problems are not too hard. Whether or not a randomized algorithm indeed makes a difference remains unknown; the hierarchy of classes described earlier is not firm, since it is based only on the usual conjecture that all containments are proper (strict). Randomized algorithms depend on the random bits they use. In fact, however, these bits are not really random, since they are generated by a pseudorandom number generator. In reality, the randomized algorithms that we actually run are completely deterministic for a fixed choice of seed.

  39. Review: classes of randomized (probabilistic) algorithms
  • 1. Numerical probabilistic algorithms --- give an approximation to the correct answer.
  • 2. Monte Carlo algorithms --- always give an answer, but there is a probability of being completely wrong.
  • 3. Las Vegas algorithms --- sometimes fail to give an answer, but if an answer is given, it is correct.

  40. References
  • B. M. Moret, The Theory of Computation, Chapter 8.4: The Power of Randomization, Addison-Wesley, Reading, Massachusetts, 1998, pp. 335-345.
  • Wikipedia
  • Gilles Brassard and Paul Bratley, "Randomized Algorithms," in Fundamentals of Algorithmics.
  • http://www.datastructures.info/the-las-vegas-algorithmmethod/

  41. THANKS
