360 likes | 504 Vues
Uri Zwick Tel Aviv University. Simple Stochastic Games Mean Payoff Games Parity Games. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A. Simple Stochastic Games. Mean Payoff Games. Parity Games. Randomized subexponential algorithm for SSG.
E N D
Uri ZwickTel Aviv University Simple Stochastic GamesMean Payoff GamesParity Games TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAA
Simple Stochastic Games Mean Payoff Games Parity Games Randomizedsubexponential algorithm for SSG Deterministicsubexponential algorithm for PG
Simple Stochastic Games Mean Payoff Games Parity Games
R R R R A simple Simple Stochastic Game
min-sink MAX-sink Simple Stochastic game (SSGs)Reachability version[Condon (1992)] R min MAX RAND Two Players: MAX and min Objective:MAX/min the probability of getting to the MAX-sink
Simple Stochastic games (SSGs)Strategies A generalstrategy may be randomized and history dependent A positional strategy is deterministicand history independent Positionalstrategy for MAX: choice of an outgoing edge from each MAX vertex
Simple Stochastic games (SSGs)Values Every vertex i in the game has a valuevi general positional general positional Both players have positionaloptimal strategies There are strategies that are optimal for every starting position
Simple Stochastic game (SSGs)[Condon (1992)] Terminating binary games The outdegrees of all non-sinks are 2 All probabilities are ½. The game terminates with prob. 1 Easy reduction from general gamesto terminating binary games
“Solving” terminating binary SSGs The values vi of the vertices of a game are the unique solution of the following equations: The values are rational numbersrequiring only a linear number of bits Corollary: Decision version in NP co-NP
Value iteration (for binary SSGs) Iterate the operator: Converges to the unique solution But, may require an exponentialnumber of iterations to get close
Simple Stochastic game (SSGs)Payoff version[Shapley (1953)] R min MAX RAND Limiting average version Discounted version
Markov Decision Processes (MDPs) R min MAX RAND Theorem:[Epenoux (1964)] Values and optimal strategies of a MDP can be found by solving an LP
SSG NP co-NP – Another proof Deciding whether the value of a game isat least (at most) v is in NP co-NP To show that value v,guess an optimal strategy for MAX Find an optimal counter-strategy for min by solving the resulting MDP. Is the problem in P ?
Mean Payoff Games (MPGs)[Ehrenfeucht, Mycielski (1979)] R min MAX RAND Non-terminating version Discounted version ReachabilitySSGs PayoffSSGs MPGs Pseudo-polynomial algorithm (PZ’96)
Mean Payoff Games (MPGs)[Ehrenfeucht, Mycielski (1979)] Again, both players have optimal positional strategies. Value(σ,) – average of cycle formed
Selecting the second largest element with only four storage locations [PZ’96]
Parity Games (PGs) A simple example Priorities 2 3 2 1 4 1 EVEN wins if largest priorityseen infinitely often is even
8 3 ODD EVEN Parity Games (PGs) EVEN wins if largest priorityseen infinitely often is even Equivalent to many interesting problemsin automata and verification: Non-emptyness of -tree automata modal -calculus model checking
8 3 ODD EVEN Parity Games (PGs) Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)] Replace priority k by payoff (n)k Move payoffs to outgoing edges
Switches …
Strategy/Policy Iteration Start with some strategy σ (of MAX) While there are improving switches, perform some of them As each step is strictly improving and as there is a finite number of strategies, the algorithm must end with an optimal strategy SSG PLS (Polynomial Local Search)
Strategy/Policy IterationComplexity? Performing only one switch at a time may lead to exponentially many improvements,even for MDPs [Condon (1992)] What happens if we perform all profitable switches[Hoffman-Karp (1966)] ??? Not known to be polynomialO(2n/n) [Mansour-Singh (1999)] No non-linear examples2n-O(1) [Madani (2002)]
A randomized subexponential algorithm for simple stochastic games
Arandomizedsubexponentialalgorithm for binary SSGs[Ludwig (1995)][Kalai (1992)] [Matousek-Sharir-Welzl (1992)] Start with an arbitrary strategy for MAX Choose a random vertex iVMAX Find the optimal strategy ’ for MAX in the gamein which the only outgoing edge of i is (i,(i)) If switching ’ at i is not profitable, then ’ is optimal Otherwise, let (’)i and repeat
Arandomizedsubexponentialalgorithm for binary SSGs[Ludwig (1995)][Kalai (1992)] [Matousek-Sharir-Welzl (1992)] MAX vertices All correct ! Would never be switched ! There is a hidden order of MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices 1,2,…,i
The hidden order ui(σ)- the maximum sum of values of a strategy of MAX that agrees with σ on i
The hidden order Order the vertices such that Positions 1,..,iwere switchedand would neverbe switched again
SSGs are LP-type problems[Halman (2002)] General (non-binary) SSGs can be solved in time Independently observed by[Björklund-Sandberg-Vorobyov (2005)] AUSO – Acyclic Unique Sink Orientations
SSGs GPLCP[Gärtner-Rüst (2005)][Björklund-Svensson-Vorobyov (2005)] GPLCPGeneralized Linear ComplementaryProblem with a P-matrix
A deterministic subexponential algorithm for parity games Mike PatersonMarcin JurdzinskiUri Zwick
Parity Games (PGs) A simple example Priorities 2 3 2 1 4 1 EVEN wins if largest priorityseen infinitely often is even
8 3 ODD EVEN Parity Games (PGs) Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)] Replace priority k by payoff (n)k Move payoffs to outgoing edges
Exponential algorithm for PGs[McNaughton (1993)] [Zielonka (1998)] Vertices of highest priority(even) Firstrecursivecall Vertices from whichEVEN can force thegame to enter A Lemma: (i) (ii)
Exponential algorithm for PGs[McNaughton (1993)] [Zielonka (1998)] Second recursivecall In the worst case, both recursive calls are on games of size n1
Deterministic subexponential alg for PGsJurdzinski, Paterson, Z (2006) Idea:Look for small dominions! Second recursivecall Dominions of size s can be found in O(ns) time Dominion Dominion: A (small) set from which one of the players can win without the play ever leaving this set
Open problems • Polynomial algorithms? • Is the Policy Improvement algorithm polynomial? • Faster subexponential algorithmsfor parity games? • Deterministic subexponential algorithmsfor MPGs and SSGs? • Faster pseudo-polynomial algorithmsfor MPGs?