Games, Times, and Probabilities: Value Iteration in Verification and Control

Krishnendu Chatterjee, Tom Henzinger

Graph Models of Systems: vertices = states; edges = transitions; paths = behaviors.

Extended Graph Models
OBJECTIVE: ω-automaton; ω-regular game
CONTROL: game graph; stochastic game graph
PROBABILITIES: Markov decision process; stochastic hybrid system
CLOCKS: timed automaton

Graphs vs. Games
[Figure: example graph and game, with transitions labeled a and b.]

Games model Open Systems
Two players: environment / controller / input vs. system / plant / output
Multiple players: processes / components / agents
Stochastic players: nature / randomized algorithms

Example
P1: init x := 0; loop choice | x := x+1 mod 2 | x := 0 end choice end loop
φ1: □(x = y)
P2: init y := 0; loop choice | y := x | y := x+1 mod 2 end choice end loop
φ2: □(y = 0)

Graph Questions 8 ( x = y ) 9 ( x = y ) CTL

Graph Questions 8 ( x = y ) 9 ( x = y ) X 00 01  10 11 CTL

Zero-Sum Game Questions (ATL [Alur/H/Kupferman])
⟨⟨P1⟩⟩□(x = y): ✗
⟨⟨P2⟩⟩□(y = 0): ✓
[Figure: game graph over the states 00, 01, 10, 11.]

Nonzero-Sum Game Questions (secure equilibria [Chatterjee/H/Jurdzinski])
⟨⟨P1⟩⟩□(x = y) jointly with ⟨⟨P2⟩⟩□(y = 0): ✓
[Figure: game graph over the states 00, 01, 10, 11.]

Strategies
Strategies x, y: Q* → Q.
From a state q, a pair (x, y) of a player-1 strategy x and a player-2 strategy y gives a unique infinite path Outcome_{x,y}(q) ∈ Q^ω.
Below, x and x′ range over player-1 strategies, and y and y′ over player-2 strategies.
Zero-sum: ⟨⟨P1⟩⟩φ1 = (∃x)(∀y) φ1(x, y), short for: q ⊨ ⟨⟨P1⟩⟩φ1 iff (∃x)(∀y)(Outcome_{x,y}(q) ⊨ φ1).
Nonzero-sum (secure equilibrium): ⟨⟨P1⟩⟩φ1 jointly with ⟨⟨P2⟩⟩φ2 = (∃x)(∃y)[ (φ1 ∧ φ2)(x, y) ∧ (∀y′)(φ2 → φ1)(x, y′) ∧ (∀x′)(φ1 → φ2)(x′, y) ]
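To make the notion of an outcome concrete, here is a minimal Python sketch that computes a finite prefix of the path induced by a strategy pair. It assumes a turn-based game in which the owner of the current state picks the successor; the names `outcome_prefix`, `player1_states`, and the two-state example game are illustrative and not taken from the slides.

```python
from typing import Callable, List, Set

State = str
# A strategy maps a finite history (an element of Q*) to the next state.
Strategy = Callable[[List[State]], State]


def outcome_prefix(q: State, x: Strategy, y: Strategy,
                   player1_states: Set[State], steps: int) -> List[State]:
    """Return q followed by the first `steps` moves induced by (x, y)."""
    history = [q]
    for _ in range(steps):
        # Assumption: the owner of the current state chooses the successor.
        mover = x if history[-1] in player1_states else y
        history.append(mover(history))
    return history


# Hypothetical game: player 1 owns "a", player 2 owns "b".
def x(history: List[State]) -> State:
    # Memoryless: always move from "a" to "b".
    return "b"


def y(history: List[State]) -> State:
    # Uses memory: leave "b" only after an odd number of visits to "b".
    return "a" if history.count("b") % 2 == 1 else "b"


prefix = outcome_prefix("a", x, y, {"a"}, 5)
```

The strategy `y` inspects the whole history, which is why strategies are typed as functions on Q* rather than on Q.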

Objectives φ1 and φ2
Qualitative: reachability; Büchi; parity (ω-regular)
Quantitative: max; lim sup; lim avg

Normal Forms of ω-Regular Sets
Borel-1: Reachability ◇a; Safety □a = ¬◇¬a
Borel-2: Büchi □◇a; coBüchi ◇□a = ¬□◇¬a
Borel-2.5: Streett ∧(□◇a → □◇b) = ∧(◇□¬a ∨ □◇b); Rabin ∨(◇□a ∧ □◇b); Parity: complement-closed subset of Streett/Rabin
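The first four normal forms can be checked directly on an ultimately periodic path. The sketch below assumes such a path is given as a finite prefix followed by a nonempty cycle that repeats forever; the function names and the example path are illustrative only.

```python
from typing import List, Set


def reachability(prefix: List[str], cycle: List[str], a: Set[str]) -> bool:
    # ◇a: some position of the path is in a.
    return any(s in a for s in prefix + cycle)


def safety(prefix: List[str], cycle: List[str], a: Set[str]) -> bool:
    # □a: every position of the path is in a.
    return all(s in a for s in prefix + cycle)


def buchi(prefix: List[str], cycle: List[str], a: Set[str]) -> bool:
    # □◇a: a is visited infinitely often, i.e. somewhere on the cycle.
    return any(s in a for s in cycle)


def cobuchi(prefix: List[str], cycle: List[str], a: Set[str]) -> bool:
    # ◇□a: from some point on only a is visited, i.e. the cycle stays in a.
    return all(s in a for s in cycle)


# Hypothetical path q0 (q1 q2)^ω over the states {q0, q1, q2}.
states = {"q0", "q1", "q2"}
prefix, cycle = ["q0"], ["q1", "q2"]
a = {"q1"}
not_a = states - a
```

On this path the dualities □a = ¬◇¬a and ◇□a = ¬□◇¬a can be confirmed by evaluating both sides.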

Büchi Game
[Figure: game graph with states q0, q1, q2, q3, q4 and labels G and B.]
• Secure equilibrium (x, y) at q0:
• x: if q1 → q0, then q2 else q4. y: if q3 → q1, then q0 else q4.
• Strategies require memory.

Zero-Sum Games: Determinacy
φ1 = ¬φ2
W1: ⟨⟨P1⟩⟩φ1
W2: ⟨⟨P2⟩⟩φ2

Nonzero-sum Games
[Figure: regions W00, W01, W10, W11 of the state space, with ⟨⟨P1⟩⟩φ1 and ⟨⟨P2⟩⟩φ2 marked.]
W10: ⟨⟨P1⟩⟩(φ1 ∧ ¬φ2)
W01: ⟨⟨P2⟩⟩(φ2 ∧ ¬φ1)

Objectives
Qualitative: reachability; Büchi; parity (ω-regular)
Quantitative: max (Borel-1); lim sup (Borel-2); lim avg (Borel-3)

Quantitative Games
⟨⟨P1⟩⟩ lim sup = 3
⟨⟨P1⟩⟩ lim avg = 1
[Figure: game graph with weights 2, 4, 0, 0, 2, 3, 2, 0, 4.]
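For intuition about the three quantitative objectives, the following sketch evaluates them on a single ultimately periodic path of weights (a finite prefix followed by a nonempty cycle repeated forever). It computes the value of one path, not the value of a game, and the example weights are made up rather than read off the slide's figure.

```python
from fractions import Fraction
from typing import List


def path_max(prefix: List[int], cycle: List[int]) -> int:
    # max: the largest weight that ever occurs on the path.
    return max(prefix + cycle)


def lim_sup(prefix: List[int], cycle: List[int]) -> int:
    # lim sup: the largest weight that occurs infinitely often.
    return max(cycle)


def lim_avg(prefix: List[int], cycle: List[int]) -> Fraction:
    # lim avg: the long-run average; the finite prefix does not matter.
    return Fraction(sum(cycle), len(cycle))


# Hypothetical weight sequence 4 (0 2 1)^ω.
prefix, cycle = [4], [0, 2, 1]
values = (path_max(prefix, cycle), lim_sup(prefix, cycle), lim_avg(prefix, cycle))
```

`Fraction` keeps the average exact, which avoids floating-point comparisons when values are compared against thresholds.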

Solving Games by Value Iteration
Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post).
Generalization of dynamic programming: iterative optimization.
Region R: Q → V. For a transition q → q′, update R(q) := pre(R(q′)).
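As one possible instance of the update R(q) := pre(R(q′)) with numeric values, the sketch below runs value iteration for the max objective on a plain graph (a single player who maximizes). The choice of objective, the synchronous update order, and the example graph are assumptions made for illustration.

```python
from typing import Dict, List


def max_value_iteration(successors: Dict[str, List[str]],
                        weight: Dict[str, int]) -> Dict[str, int]:
    """For each state, the largest weight visible on some path from it."""
    value = dict(weight)  # initial region: each state sees only its own weight
    while True:
        # Local part: combine a state's weight with its successors' values.
        updated = {
            q: max([weight[q]] + [value[s] for s in successors[q]])
            for q in successors
        }
        # Global part: stop once a fixpoint is reached.
        if updated == value:
            return value
        value = updated


# Hypothetical weighted graph.
successors = {"a": ["b"], "b": ["a", "c"], "c": ["c"], "d": ["d"]}
weight = {"a": 2, "b": 0, "c": 4, "d": 1}
result = max_value_iteration(successors, weight)
```

Values only increase and are bounded by the largest weight, so the loop terminates on any finite graph.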

Graph
Q: states
Σ: transition labels
δ: Q × Σ → Q: transition function
Regions: [Q → {0,1}], with V = 𝔹
∃pre maps regions to regions: q ∈ ∃pre(R) iff (∃σ ∈ Σ) δ(q, σ) ∈ R
∀pre maps regions to regions: q ∈ ∀pre(R) iff (∀σ ∈ Σ) δ(q, σ) ∈ R
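The two predecessor operators translate almost literally into set comprehensions. This sketch assumes the transition function is a total dictionary keyed by (state, label) pairs and represents a region as the set of states mapped to 1; the names `epre` and `apre` and the three-state graph are illustrative.

```python
from typing import Dict, Hashable, Set, Tuple

Delta = Dict[Tuple[str, Hashable], str]


def epre(states: Set[str], labels: Set[Hashable], delta: Delta,
         region: Set[str]) -> Set[str]:
    # ∃pre(R): states with at least one transition into R.
    return {q for q in states if any(delta[(q, s)] in region for s in labels)}


def apre(states: Set[str], labels: Set[Hashable], delta: Delta,
         region: Set[str]) -> Set[str]:
    # ∀pre(R): states all of whose transitions lead into R.
    return {q for q in states if all(delta[(q, s)] in region for s in labels)}


# Hypothetical graph: a loops or moves to b, b moves to c, c loops.
states = {"a", "b", "c"}
labels = {0, 1}
delta = {
    ("a", 0): "a", ("a", 1): "b",
    ("b", 0): "c", ("b", 1): "c",
    ("c", 0): "c", ("c", 1): "c",
}
```

For example, `epre(states, labels, delta, {"b"})` contains `a`, while `apre(states, labels, delta, {"b"})` is empty because `a` may also loop.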

Graph
[Figure: graph with labels a, b, c.]
∃◇c = (μX)(c ∨ ∃pre(X))
∀◇c = (μX)(c ∨ ∀pre(X))

Graph Reachability
∃◇R = (μX)(R ∨ ∃pre(X))
Given R ⊆ Q, find the states from which some path leads to R.
Iterates: R; R ∪ pre(R); R ∪ pre(R) ∪ pre²(R); ...
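The least fixpoint can be computed by iterating from the empty region and recording each iterate. The following is a minimal sketch under the assumption that the graph is finite and the transition function is a total dictionary; `exists_eventually` and the example graph are hypothetical names and data.

```python
from typing import Dict, Hashable, List, Set, Tuple

Delta = Dict[Tuple[str, Hashable], str]


def exists_eventually(states: Set[str], labels: Set[Hashable], delta: Delta,
                      target: Set[str]) -> Tuple[Set[str], List[Set[str]]]:
    """Compute (μX)(R ∨ ∃pre(X)) and the list of successive iterates."""
    current: Set[str] = set()
    iterates: List[Set[str]] = []
    while True:
        pre = {q for q in states
               if any(delta[(q, s)] in current for s in labels)}
        updated = set(target) | pre
        if updated == current:  # fixpoint reached
            return current, iterates
        current = updated
        iterates.append(set(current))


# Hypothetical graph: a loops or moves to b, b moves to c, c loops.
states = {"a", "b", "c"}
labels = {0, 1}
delta = {
    ("a", 0): "a", ("a", 1): "b",
    ("b", 0): "c", ("b", 1): "c",
    ("c", 0): "c", ("c", 1): "c",
}
reach, iterates = exists_eventually(states, labels, delta, {"c"})
```

The recorded iterates grow one predecessor layer at a time, matching the sequence R, R ∪ pre(R), R ∪ pre(R) ∪ pre²(R).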

Graph Reachability
∀◇R = (μX)(R ∨ ∀pre(X))
Given R ⊆ Q, find the states from which all paths lead to R.
Iterates: X1 = R; X(i+1) = R ∪ ∀pre(Xi); ...
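Swapping the existential predecessor for the universal one changes only the local step of the iteration. The sketch below assumes a finite graph with a total transition dictionary; `forall_eventually` and the example graph are illustrative.

```python
from typing import Dict, Hashable, Set, Tuple

Delta = Dict[Tuple[str, Hashable], str]


def forall_eventually(states: Set[str], labels: Set[Hashable], delta: Delta,
                      target: Set[str]) -> Set[str]:
    """Compute (μX)(R ∨ ∀pre(X)) by iterating from the empty region."""
    current: Set[str] = set()
    while True:
        # Local step: states whose every transition stays inside `current`.
        pre = {q for q in states
               if all(delta[(q, s)] in current for s in labels)}
        updated = set(target) | pre
        if updated == current:
            return current
        current = updated


# Hypothetical graph: a may loop forever, b must move to c, c loops.
states = {"a", "b", "c"}
labels = {0, 1}
delta = {
    ("a", 0): "a", ("a", 1): "b",
    ("b", 0): "c", ("b", 1): "c",
    ("c", 0): "c", ("c", 1): "c",
}
inevitable = forall_eventually(states, labels, delta, {"c"})
```

State `a` is excluded here because the path that loops at `a` forever never reaches `c`, although some other path from `a` does.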

Value Iteration Algorithms
consist of
• LOCAL PART: ∃pre and ∀pre computation
• GLOBAL PART: evaluation of a fixpoint expression
We need to generalize both parts to solve games.

Turn-based Game
Q1, Q2: states (Q = Q1 ∪ Q2)
Σ: transition labels
δ: Q × Σ → Q: transition function
Regions: [Q → {0,1}], with V = 𝔹
1pre maps regions to regions: q ∈ 1pre(R) iff (q ∈ Q1 ∧ (∃σ ∈ Σ) δ(q, σ) ∈ R) or (q ∈ Q2 ∧ (∀σ ∈ Σ) δ(q, σ) ∈ R)
2pre maps regions to regions: q ∈ 2pre(R) iff (q ∈ Q1 ∧ (∀σ ∈ Σ) δ(q, σ) ∈ R) or (q ∈ Q2 ∧ (∃σ ∈ Σ) δ(q, σ) ∈ R)
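The controllable predecessor operators combine the existential and universal cases according to who owns the state. This sketch assumes Q1 and Q2 are disjoint and the transition function is a total dictionary; `pre1`, `pre2`, and the example game are illustrative names and data.

```python
from typing import Dict, Hashable, Set, Tuple

Delta = Dict[Tuple[str, Hashable], str]


def pre1(q1: Set[str], q2: Set[str], labels: Set[Hashable], delta: Delta,
         region: Set[str]) -> Set[str]:
    # Player 1 needs one good move at its own states;
    # at player-2 states every move must lead into the region.
    own = {q for q in q1 if any(delta[(q, s)] in region for s in labels)}
    opp = {q for q in q2 if all(delta[(q, s)] in region for s in labels)}
    return own | opp


def pre2(q1: Set[str], q2: Set[str], labels: Set[Hashable], delta: Delta,
         region: Set[str]) -> Set[str]:
    # Symmetric: the roles of the two players are swapped.
    opp = {q for q in q1 if all(delta[(q, s)] in region for s in labels)}
    own = {q for q in q2 if any(delta[(q, s)] in region for s in labels)}
    return own | opp


# Hypothetical game: player 1 owns a and c, player 2 owns b.
q1, q2 = {"a", "c"}, {"b"}
labels = {0, 1}
delta = {
    ("a", 0): "b", ("a", 1): "c",
    ("b", 0): "a", ("b", 1): "b",
    ("c", 0): "c", ("c", 1): "c",
}
```

With this game, `pre1(q1, q2, labels, delta, {"c"})` contains `a` (player 1 can move to `c`) but not `b` (player 2 can avoid `c`).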

Turn-based Game
[Figure: game graph with labels a, b, c.]
⟨⟨P1⟩⟩◇c = (μX)(c ∨ 1pre(X))
