Uri Zwick – Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds. MDS summer school “The Combinatorics of Linear and Semidefinite Programming”, August 14-16, 2012.
Deterministic pivoting rules: Largest improvement; Largest slope; Dantzig’s rule (largest modified cost); Bland’s rule (avoids cycling); Lexicographic rule (also avoids cycling). All are known to require an exponential number of steps in the worst case: Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), …, Amenta-Ziegler (1996).
Klee-Minty cubes (1972). [Figure taken from a paper by Gärtner-Henk-Ziegler]
Randomized pivoting rules. Random-Edge: choose a random improving edge. Random-Facet: described in the previous lecture ☺ [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Random-Facet is sub-exponential! Are Random-Edge and Random-Facet polynomial???
Abstract objective functions (AOFs). Acyclic Unique Sink Orientations (AUSOs): every face should have a unique sink.
AUSOs of n-cubes: 2n facets, 2^n vertices. USOs and AUSOs: Stickney, Watson (1978); Morris (2001); Szabó, Welzl (2001); Gärtner (2002). The directed diameter is exactly n. Exercise: Prove it.
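The AUSO definition can be checked by brute force on small cubes. The sketch below is only an illustration: the encoding of vertices as bit masks, the predicate `points_away(v, i)`, and the two example orientations (`uniform`, `cyclic`) are assumptions made here, not part of the lecture.

```python
from itertools import combinations, product

def faces(n):
    """Yield every face of the n-cube as (free_dims, base_vertex)."""
    for k in range(n + 1):
        for free in combinations(range(n), k):
            fixed = [i for i in range(n) if i not in free]
            for bits in product([0, 1], repeat=len(fixed)):
                yield free, sum(b << i for b, i in zip(bits, fixed))

def face_vertices(free, base):
    """All vertices of the face spanned by the free coordinates."""
    for bits in product([0, 1], repeat=len(free)):
        yield base | sum(b << i for b, i in zip(bits, free))

def is_uso(n, points_away):
    """True if every face has exactly one sink.

    points_away(v, i) is True when the edge {v, v ^ (1 << i)} leaves v.
    """
    for free, base in faces(n):
        sinks = [v for v in face_vertices(free, base)
                 if not any(points_away(v, i) for i in free)]
        if len(sinks) != 1:
            return False
    return True

def is_acyclic(n, points_away):
    """Peel off sinks repeatedly; a directed cycle leaves a sink-free remainder."""
    remaining = set(range(1 << n))
    while remaining:
        sinks = {v for v in remaining
                 if not any(points_away(v, i) and (v ^ (1 << i)) in remaining
                            for i in range(n))}
        if not sinks:
            return False
        remaining -= sinks
    return True

# Hypothetical example 1: every edge points toward the endpoint with the 1-bit.
def uniform(v, i):
    return ((v >> i) & 1) == 0

# Hypothetical example 2: the 2-cube oriented as the cycle 00 -> 01 -> 11 -> 10 -> 00.
def cyclic(v, i):
    return (v, i) in {(0, 0), (1, 1), (3, 0), (2, 1)}
```

With these definitions, `uniform` passes both checks on the 3-cube, while `cyclic` fails both on the 2-cube because the full square has no sink.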
AUSO results. Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Sub-exponential lower bound for Random-Facet [Matoušek (1994)]. Sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]. These lower bounds do not correspond to actual linear programs. Can geometry help?
Random-Edge and Random-Facet are not polynomial for LPs. Consider LPs that correspond to Markov Decision Processes (MDPs): Simplex ⇔ Policy iteration. Obtain sub-exponential lower bounds for the Random-Edge and Random-Facet variants of the Policy Iteration algorithm for MDPs.
Randomized Pivoting Rules. Lower bounds obtained for LPs whose diameter is n. [Kalai ’92] [Matoušek-Sharir-Welzl ’92] [Friedmann-Hansen-Z ’11]
Turn-based 2-Player Stochastic Games [Shapley ’53] [Gillette ’57] … [Condon ’92]. Total reward version; Discounted version; Limiting average version. Both players have optimal positional strategies. Can optimal strategies be found in polynomial time?
Stopping condition. For the total reward version, assume: no matter what the players do, the game stops with probability 1. Exercise: Show that discounted games correspond directly to stopping total reward games.
Strategies / Policies. A deterministic strategy specifies which action to take given every possible history. A mixed strategy is a probability distribution over deterministic strategies. A memoryless strategy is a strategy that depends only on the current state. A positional strategy is a deterministic memoryless strategy.
Values [value equations over general and over positional strategies]. Both players have positional optimal strategies. There are positional strategies that are optimal for every starting position.
Markov Decision Processes [Shapley ’53] [Bellman ’57] [Howard ’60] … Total reward version; Discounted version; Limiting average version. Optimal positional policies can be found using LP. Is there a strongly polynomial time algorithm?
Stochastic shortest paths (SSPs): minimize the expected cost of getting to the target.
Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]. Total reward version: easy. Limiting average version; Discounted version. Both players have optimal positional strategies. Still no polynomial time algorithms known!
• Turn-based Stochastic Games (SGs): long-term planning in a stochastic and adversarial environment; 2½ players. • Non-Stochastic Games (MPGs): adversarial, non-stochastic; 2 players. • Markov Decision Processes (MDPs): non-adversarial, stochastic; 1½ players. • Deterministic MDPs (DMDPs): non-stochastic, non-adversarial; 1 player.
Parity Games (PGs): a simple example. Priorities: 2, 3, 2, 1, 4, 1. EVEN wins if the largest priority seen infinitely often is even.
Parity Games (PGs) [figure labels: 8, 3, ODD, EVEN]. EVEN wins if the largest priority seen infinitely often is even. Equivalent to many interesting problems in automata and verification: non-emptiness of ω-tree automata; modal μ-calculus model checking.
Parity Games (PGs) ⇒ Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)] [figure labels: 8, 3, ODD, EVEN]. Replace priority k by payoff (−n)^k. Move payoffs to outgoing edges.
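A quick sanity check of why the payoff (−n)^k works, reading the garbled "(n)k" as (−n)^k with n the number of vertices: on any simple cycle (at most n vertices) the term of the largest priority dominates the sum, so the sign of the total payoff matches the parity of that priority. The sketch below is an assumption-laden illustration: it keeps payoffs on vertices rather than moving them to outgoing edges, and the function names are hypothetical.

```python
from itertools import product

def even_wins(cycle_priorities):
    """Parity condition on a cycle: EVEN wins iff the largest priority is even."""
    return max(cycle_priorities) % 2 == 0

def cycle_payoff(cycle_priorities, n):
    """Total payoff of the cycle after replacing each priority k by (-n)**k."""
    return sum((-n) ** k for k in cycle_priorities)

def reduction_agrees(n, max_priority):
    """Exhaustively compare both winning conditions on all cycles of length <= n."""
    for length in range(1, n + 1):
        for cycle in product(range(max_priority + 1), repeat=length):
            if even_wins(cycle) != (cycle_payoff(cycle, n) > 0):
                return False
    return True
```

For instance, treating the priorities 2, 3, 2, 1, 4, 1 of the earlier example as a single cycle through six vertices (an assumption about the figure), the total payoff with n = 6 is positive, in agreement with the largest priority 4 being even.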
Evaluating a policy. MDP + policy ⇒ Markov chain. The values of a fixed policy can be found by solving a system of linear equations.
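As a minimal sketch of policy evaluation, assume a stopping total-reward MDP in which the chosen action of state s yields reward r_s and moves to state t with probability p_st, with any missing probability mass meaning "stop". The values then satisfy v = r + Pv, that is (I − P)v = r. The data layout and function names below are assumptions for illustration only.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over exact fractions; assumes A is nonsingular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def evaluate(policy):
    """policy[s] = (reward, {next_state: probability}); returns the value vector."""
    n = len(policy)
    A = [[Fraction(int(s == t)) - Fraction(policy[s][1].get(t, 0))
          for t in range(n)] for s in range(n)]
    b = [policy[s][0] for s in range(n)]
    return solve(A, b)

# Hypothetical two-state chain: each state continues to the other with probability 1/2.
example = [(1, {1: Fraction(1, 2)}), (2, {0: Fraction(1, 2)})]
values = evaluate(example)
```

Here the equations are v0 = 1 + v1/2 and v1 = 2 + v0/2, so `values` is [8/3, 10/3].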
Dual LP formulation for MDPs [LP not shown]: “a is not an improving switch”; basic solution ↔ (positional) policy.
Primal LP formulation for MDPs [LP not shown]: vertex ↔ complement of a policy.
TB2SG ∈ NP ∩ co-NP. TB2SG ∈ P ???
Random-Facet for MDPs • Choose a random action not in the current policy and ignore it. • Solve recursively without this action. • If the ignored action is not an improving switch with respect to the returned policy, we are done. • Otherwise, switch to the ignored action and solve recursively.
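The four steps above can be sketched as a short recursion. Several choices here are assumptions, not taken from the slides: rewards are maximized, the MDP is a stopping total-reward MDP, actions are tuples `(state, reward, {next_state: probability})`, and the second recursive call is made on the full action set. The example MDP is hypothetical.

```python
import random
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over exact fractions; assumes A is nonsingular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def evaluate(actions, policy):
    """Values of a fixed policy (dict state -> action index): solve (I - P) v = r."""
    states = sorted(policy)
    A = [[Fraction(int(s == t)) - Fraction(actions[policy[s]][2].get(t, 0))
          for t in states] for s in states]
    b = [actions[policy[s]][1] for s in states]
    return dict(zip(states, solve(A, b)))

def is_improving(actions, a, values):
    """Action a improves on the policy if its one-step appraisal beats the state value."""
    state, reward, trans = actions[a]
    return reward + sum(p * values[t] for t, p in trans.items()) > values[state]

def random_facet(actions, available, policy, rng):
    unused = sorted(available - set(policy.values()))
    if not unused:                        # every available action is in the policy
        return policy
    a = rng.choice(unused)                # choose a random action and ignore it
    best = random_facet(actions, available - {a}, policy, rng)
    if not is_improving(actions, a, evaluate(actions, best)):
        return best                       # ignored action does not help: done
    switched = dict(best)
    switched[actions[a][0]] = a           # switch to the ignored action
    return random_facet(actions, available, switched, rng)

# Hypothetical MDP with two states and two actions per state.
ACTIONS = [
    (0, 1, {}),                           # action 0: state 0, reward 1, stop
    (0, 0, {1: Fraction(9, 10)}),         # action 1: state 0, reward 0, to state 1 w.p. 9/10
    (1, 2, {}),                           # action 2: state 1, reward 2, stop
    (1, 10, {0: Fraction(1, 2)}),         # action 3: state 1, reward 10, to state 0 w.p. 1/2
]
result = random_facet(ACTIONS, frozenset(range(4)), {0: 0, 1: 2}, random.Random(0))
```

Whatever random choices are made, the recursion ends at the policy that uses actions 1 and 3, which has no improving switch. Exact fractions avoid tolerance issues in the comparison inside `is_improving`.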
Policy iteration for 2-player games • Keep a strategy of player 1 and an optimal counter-strategy of player 2. • Perform improving switches for player 1 and recompute an optimal counter-strategy for player 2. Exercise: Does it really work? Random-Facet yields a sub-exponential algorithm for turn-based 2-player stochastic games!
Lower bounds for Policy Iteration. Switch-All for Parity Games: exponential lower bound [Friedmann ’09]. Switch-All for MDPs: exponential lower bound [Fearnley ’10]. Random-Facet for Parity Games: sub-exponential lower bound [Friedmann-Hansen-Z ’11]. Random-Facet and Random-Edge for MDPs, and hence for LPs: sub-exponential lower bounds [FHZ ’11].
Lower bound for Random-Facet: implement a randomized counter. Lower bound for Random-Edge: implement a standard counter.
3-bit counter [figure label: (−N)^15]
3-bit counter 0 1 0
3-bit counter – Improving switches. Random-Edge can choose either one of these improving switches… 0 1 0
Cycle gadgets. Cycles close one edge at a time. Shorter cycles close faster.
Cycle gadgets. Cycles open “simultaneously”.
3-bit counter 23 1 0 1 0
From b to b+1 in seven phases: (1) B_k-cycle closes; (2) C_k-cycle closes; (3) U-lane realigns; (4) A_i-cycles and B_i-cycles for i<k open; (5) A_k-cycle closes; (6) W-lane realigns; (7) C_i-cycles of 0-bits open.
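At the level of bits, the net effect of the phases is an ordinary binary increment: the lowest 0-bit (index k) is set and all bits below it are cleared. The sketch below models only this abstract counter, not the gadgets; the list encoding with `bits[0]` as the least significant bit is an assumption.

```python
def increment(bits):
    """Set the lowest 0-bit and clear all lower bits; assumes some bit is 0."""
    k = bits.index(0)                      # index of the lowest 0-bit
    return [0] * k + [1] + bits[k + 1:]

def count_increments(n):
    """Number of increments needed to go from all zeros to all ones."""
    bits, steps = [0] * n, 0
    while 0 in bits:
        bits = increment(bits)
        steps += 1
    return steps
```

An n-bit counter needs 2^n − 1 increments to reach the all-ones state, which is the source of the exponential behavior that the construction forces on the pivoting rule.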
3-bit counter 34 1 0 1
Size of cycles. Various cycles and lanes compete with each other. Some are trying to open while some are trying to close. We need to make sure that our candidates win! Length of all A-cycles = 8n. Length of all C-cycles = 22n. Length of B_i-cycles = 25i²n. O(n^4) vertices for an n-bit counter. Can be improved using a more complicated construction and an improved analysis (work in progress).
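The O(n^4) bound is consistent with reading the B_i-cycle length as 25·i²·n: summing over i = 1, …, n gives 25n · n(n+1)(2n+1)/6, which grows like n^4, while the A- and C-cycles are only linear in n each. The short check below assumes one B_i-cycle per bit, which the slide does not state explicitly.

```python
def total_b_length(n):
    """Sum of the B_i-cycle lengths 25 * i**2 * n for i = 1..n (one cycle per bit assumed)."""
    return sum(25 * i * i * n for i in range(1, n + 1))

def closed_form(n):
    """25 * n * (1^2 + ... + n^2) = 25 * n * n(n+1)(2n+1) / 6."""
    return 25 * n * n * (n + 1) * (2 * n + 1) // 6
```

For n = 3 both expressions give 1050, and the two agree for every n.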
Concluding remarks and open problems. A “game-theoretic” perspective helps us understand the behavior of randomized pivoting rules. Polynomial pivoting rule? Polynomial bound on diameter? Strongly polynomial algorithms for MDPs? Polynomial algorithms for 2-player games?