Markov Decision Processes: Approximate Equivalence

Michel de Rougemont, Université Paris II & LRI
http://www.lri.fr/~mdr/

The world of MDPs
• Follow-up of: On the complexity of partially observed Markov decision processes, 1996, D. Burago, A. Slissenko, M. de Rougemont
• What is robustness? Deviation model in the 1990s.
• Distance on runs in the 2000s
• Efficient distance of a run to an MDP
• Approximate comparison of MDPs: Statistic Analysis for Probabilistic Processes (LICS 2009, with Mathieu Tracol)

M.D.P
S: states s, t, u, v
Σ: actions a, b, c
P(u | t, b) = 0.
Policy σ resolves the non-determinism. Example: σ(t)=b, σ(v)=c
Run: s,t,a,u,a,v
Trace: aba
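To make the roles of policy, run and trace concrete, here is a minimal sketch of an MDP over the slide's states and actions. The transition table `P`, every probability in it, the values σ(s)=a and σ(u)=a, and the helper name `simulate` are assumptions made for illustration; only σ(t)=b and σ(v)=c come from the slide. In this sketch a run alternates states and actions.

```python
import random

# Hypothetical transition table: P[state][action] = {next_state: probability}.
# States and actions are those of the slide; all probabilities are invented.
P = {
    "s": {"a": {"t": 1.0}},
    "t": {"a": {"s": 1.0}, "b": {"u": 0.5, "v": 0.5}},
    "u": {"a": {"v": 1.0}},
    "v": {"a": {"u": 1.0}, "c": {"s": 1.0}},
}

# Stationary deterministic policy: one fixed action per state.
# sigma(t)=b and sigma(v)=c are from the slide; the other two are assumed.
sigma = {"s": "a", "t": "b", "u": "a", "v": "c"}


def simulate(P, policy, start, steps, rng):
    """Return (run, trace): the run alternates states and actions,
    the trace keeps only the actions."""
    state = start
    run = [state]
    trace = []
    for _ in range(steps):
        action = policy[state]  # the policy resolves the non-determinism
        successors = P[state][action]
        state = rng.choices(list(successors),
                            weights=list(successors.values()))[0]
        run += [action, state]
        trace.append(action)
    return run, "".join(trace)


run, trace = simulate(P, sigma, "s", 3, random.Random(0))
print(run, trace)
```

With this table, a three-step run from s has trace aba when the random step from t lands in u, and abc when it lands in v.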

This talk
• Approximation of decision problems: Property Testing
• Non-deterministic automata: tester for membership and equivalence
• Markov Decision Processes: tester for the existence of strategies, and equivalence

1. Testers on a class K
Let F be a property on a class K of structures U. An ε-tester for F is a probabilistic algorithm A such that:
• If U |= F, A accepts
• If U is ε-far from F, A rejects with high probability
F is testable if there is a probabilistic algorithm A such that:
• A is an ε-tester for all ε
• Time(A) is independent of n = size(U)
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994.
Property Testing and its Connection to Learning and Approximation, O. Goldreich, S. Goldwasser, D. Ron, 1996.
A tester usually implies a linear-time corrector.
(ε1, ε2)-tolerant tester

Edit Distances with Moves on Strings
• Classical edit distance: insertions, deletions, modifications
• Edit distance with moves: dist(w,w’)
  w  = 0111000011110011001
  w’ = 0111011110000011001
• Edit distance with moves generalizes to ordered trees
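The two strings on this slide differ by a single block move, which can be checked directly. The sketch below assumes that moving one contiguous block counts as one operation; the helper `move_block` and its index convention are made up for this illustration.

```python
def move_block(w, start, end, dest):
    """Cut the block w[start:end] and re-insert it at index dest
    of the string that remains after the cut."""
    block = w[start:end]
    rest = w[:start] + w[end:]
    return rest[:dest] + block + rest[dest:]


w = "0111000011110011001"
w_prime = "0111011110000011001"

# Move the block "000" (positions 4..6) to just after the following "01111".
moved = move_block(w, 4, 7, 9)
print(moved == w_prime)  # prints True
```

Since w and w’ differ, one move is also necessary, so under the unit-cost assumption dist(w,w’) = 1 for this pair.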

Uniform statistics: k-gram
W = 001010101110, length n
u.stat(W): frequencies of the subwords of length k of W, taken over its n-k+1 blocks (shingles)
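As a small worked example, the following sketch computes the exact k-gram frequency vector of the slide's word W. The choice k = 2 and the function name `ustat` are assumptions for illustration; the slide does not fix k.

```python
from collections import Counter
from fractions import Fraction


def ustat(w, k):
    """Exact frequency of each subword of length k among the
    n-k+1 overlapping blocks of w."""
    blocks = [w[i:i + k] for i in range(len(w) - k + 1)]
    counts = Counter(blocks)
    return {u: Fraction(c, len(blocks)) for u, c in counts.items()}


W = "001010101110"
stat = ustat(W, 2)
for u in sorted(stat):
    print(u, stat[u])
# 00 1/11
# 01 4/11
# 10 4/11
# 11 2/11
```

The word has n = 12 letters, hence 11 blocks of length 2, and the four frequencies sum to 1.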

Tester for equality
Edit distance with moves: an NP-complete problem, but approximable in constant time with additive error.
Uniform statistics: W = 001010101110
Theorem 1. |u.stat(w)-u.stat(w’)| approximates dist(w,w’)/n.
Sample N subwords of length k, compute Y(w) and Y(w’):
Lemma (Chernoff). Y(w) approximates u.stat(w).
Corollary. |Y(w)-Y(w’)| approximates dist(w,w’)/n.
Tester 1: If |Y(w)-Y(w’)| < ε, accept; else reject.

Tester for W ∈ r (regular language)
H = {u.stat(W) : W ∈ r} is a union of polytopes.
(Figure: two polytopes for r, with the point Y(w).)
Membership Tester:

2. Equivalence Tester for regular properties
Time polynomial in m = max(|A|, |B|).
Exact equivalence is PSPACE-complete.

3. Markov Decision Processes
SD: σ(t)=b, σ(v)=c
Trace 1: abac ab abac ab …….
Trace 2: ab abac ab abac…….
Policies σ:
• HR: history-dependent and randomized
• MR(k): memory k, randomized
• SD: stationary deterministic
Communicating MDP

Classical results: k=1
State-action frequencies:
For a class K of strategies:
Theorem (Puterman, Derman, Tsitsiklis). For a communicating MDP,
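For k=1, a state-action frequency vector records how often each (state, action) pair occurs. A minimal sketch of the empirical version on a finite run follows; the run used here is hypothetical, and the representation of a run as an alternating list of states and actions is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction


def state_action_frequencies(run):
    """Empirical frequency of each (state, action) pair in a run given as
    [s0, a0, s1, a1, ..., s_n]; the final state has no action attached."""
    pairs = [(run[i], run[i + 1]) for i in range(0, len(run) - 1, 2)]
    counts = Counter(pairs)
    return {p: Fraction(c, len(pairs)) for p, c in counts.items()}


# Hypothetical run of three steps over the slide's states and actions.
run = ["s", "a", "t", "b", "u", "a", "v"]
x = state_action_frequencies(run)
print(x[("s", "a")], x[("t", "b")], x[("u", "a")])  # prints 1/3 1/3 1/3
```

The entries of such a vector are non-negative and sum to 1, so the vectors obtained from different policies are points of the same simplex.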

Generalization
Theorem: For a communicating MDP
(Figure: polytope H and a point x.)

Existence of a strategy
Input: MDP, wn, δ, λ
Theorem: Existence of a strategy is PSPACE-hard but testable.
Tester: sample wn; estimate the distance to H (linear program).
(Figure: polytope H and a point x.)

General MDPs
Union of polytopes: each H can be computed by a linear program.
Threshold value for each component.
H1: 0.4
H2: 0.6

Equivalence of MDPs
Decide if the polytopes are identical with identical threshold values.
Equivalence Tester: discretize the polytopes with an ε-grid. Check mutual inclusion.
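The grid idea can be illustrated in two dimensions. The sketch below is one possible reading of "discretize and check mutual inclusion", not the talk's algorithm: it assumes polytopes given as lists of half-planes a·x + b·y ≤ c inside the unit square, treats two grid points as matching when they are at most one grid step apart, and leaves out the threshold values. All names are hypothetical.

```python
def grid_cells(halfplanes, steps):
    """Indices (i, j) of the grid points (i/steps, j/steps) of the unit
    square that satisfy every constraint a*x + b*y <= c."""
    cells = set()
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = i / steps, j / steps
            # Small slack absorbs floating-point rounding on the boundary.
            if all(a * x + b * y <= c + 1e-9 for a, b, c in halfplanes):
                cells.add((i, j))
    return cells


def nearly_included(A, B):
    """Every grid point of A has a grid point of B within one grid step."""
    return all(
        any((i + di, j + dj) in B for di in (-1, 0, 1) for dj in (-1, 0, 1))
        for (i, j) in A
    )


def approx_equivalent(P1, P2, steps):
    """Mutual approximate inclusion of the two discretized polytopes."""
    A, B = grid_cells(P1, steps), grid_cells(P2, steps)
    return nearly_included(A, B) and nearly_included(B, A)


triangle = [(1.0, 1.0, 1.0)]         # x + y <= 1
almost_same = [(1.0, 1.0, 1.02)]     # x + y <= 1.02
much_smaller = [(1.0, 1.0, 0.5)]     # x + y <= 0.5

print(approx_equivalent(triangle, almost_same, 10))   # prints True
print(approx_equivalent(triangle, much_smaller, 10))  # prints False
```

Here `steps` plays the role of 1/ε: a finer grid separates polytopes that a coarser grid would accept as equivalent.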

Conclusion
1. Testers for MDPs. Verify properties such as:
« Almost surely there are less than 10% a »
« After an a, there is a b »
……
2. Testers for probabilistic systems
• Approximate Probabilistic Membership
• Approximate Equivalence
3. VERAP: http://www.lri.fr/~mdr/verap/
