
Learning Opponent-type Probabilities for PrOM search


Presentation Transcript


  1. Learning Opponent-type Probabilities for PrOM search Jeroen Donkers IKAT Universiteit Maastricht 6th Computer Olympiad

  2. Contents • OM search and PrOM search • Learning for PrOM search • Off-line Learning • On-line Learning • Conclusions & Future research 6th Computer Olympiad

  3. OM search • MAX player uses evaluation function V0 • Opponent uses a different evaluation function (Vop) • At MIN nodes: predict which move the opponent will select (using standard search and Vop) • At MAX nodes: pick the move that maximizes the search value (based on V0) • At leaf nodes: use V0 6th Computer Olympiad
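  A minimal Python sketch of the OM-search scheme above (not the author's implementation). The node interface (is_leaf, children) and the evaluation functions v0 and v_op are assumptions for illustration; both evaluations are taken from the MAX player's point of view.

    # Standard minimax, used here to model the opponent's own search with Vop.
    def minimax(node, evaluate, maximizing):
        if node.is_leaf():
            return evaluate(node)
        values = [minimax(c, evaluate, not maximizing) for c in node.children()]
        return max(values) if maximizing else min(values)

    def om_search(node, v0, v_op, max_to_move):
        if node.is_leaf():
            return v0(node)                       # leaf nodes: use V0
        if max_to_move:
            # MAX node: pick the move that maximizes the search value (based on V0)
            return max(om_search(c, v0, v_op, False) for c in node.children())
        # MIN node: predict the opponent's move with standard search and Vop,
        # then continue the search in that predicted child only.
        predicted = min(node.children(),
                        key=lambda c: minimax(c, v_op, maximizing=True))
        return om_search(predicted, v0, v_op, True)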

  4. PrOM search • Extended Opponent Model: • a set of opponent types (e.g. evaluation functions) • a probability distribution over this set • Interpretation: At every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type. 6th Computer Olympiad

  5. PrOM search algorithm • At MIN nodes: determine for every opponent type which move it would select • Compute the MAX player’s value for these moves • Use the opponent-type probabilities to compute the expected value of the MIN node • At MAX nodes: select the maximum child 6th Computer Olympiad
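  A matching sketch of PrOM search, reusing the minimax helper from the OM-search sketch above. Here opponent_types is a list of evaluation functions and probs the corresponding opponent-type probabilities (summing to 1); the names are assumptions, not the author's code.

    def prom_search(node, v0, opponent_types, probs, max_to_move):
        if node.is_leaf():
            return v0(node)
        children = list(node.children())
        if max_to_move:
            # MAX node: select the maximum child
            return max(prom_search(c, v0, opponent_types, probs, False)
                       for c in children)
        # MIN node: determine for every opponent type which move it would
        # select, compute MAX's value for that move, and take the
        # probability-weighted expectation over the opponent types.
        expected = 0.0
        for v_op, p in zip(opponent_types, probs):
            chosen = min(children,
                         key=lambda c: minimax(c, v_op, maximizing=True))
            expected += p * prom_search(chosen, v0, opponent_types, probs, True)
        return expected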

  6. Learning in PrOM search • How do we assess the probabilities of the opponent types? • Off-line: use games previously played by the opponent to estimate the probabilities (a lot of time and, possibly, data available). • On-line: use the moves observed during a game to adjust the probabilities (only little time and few observations available); prior probabilities are needed. 6th Computer Olympiad

  7. Off-Line Learning • Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent. • Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the opponent’s moves. 6th Computer Olympiad

  8. Off-Line Learning • How to obtain P*(opp)? • Input: a set of positions and the moves that the given opponent and all the given opponent types would select • “Algorithm”: P*(opp_i) = N_i / N • But: leave out all ambiguous positions (i.e., positions at which more than one opponent type agrees with the opponent)! 6th Computer Olympiad
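  A small sketch of this counting estimator, under the assumption that N counts only the unambiguous positions, i.e., positions explained by exactly one opponent type; the input format is invented for illustration.

    def estimate_type_probs(observations, n_types):
        # observations: iterable of (opponent_move, type_moves), where
        # type_moves[i] is the move opponent type i would select.
        counts = [0] * n_types
        total = 0
        for opponent_move, type_moves in observations:
            agreeing = [i for i, m in enumerate(type_moves) if m == opponent_move]
            if len(agreeing) != 1:
                continue                      # skip ambiguous / unexplained positions
            counts[agreeing[0]] += 1
            total += 1
        # P*(opp_i) = N_i / N; fall back to uniform if nothing was counted
        return [c / total for c in counts] if total else [1.0 / n_types] * n_types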

  9. Off-Line Learning • Case 1: The opponent is using a mixed strategy P#(opp) of the given opponent types • Effective learning is possible (P*(opp) converges to P#(opp)) • More difficult if the opponent types are not independent 6th Computer Olympiad

  10. Not leaving out ambiguous events [figure: learning results; 5 opponent types, P = (a,b,b,b,b), 20 moves, 100 - 100,000 runs, 100 samples] 6th Computer Olympiad

  11. Leaving out ambiguous events [figure: learning results; 5 opponent types, P = (a,b,b,b,b), 20 moves, 10 - 100,000 runs, 100 samples] 6th Computer Olympiad

  12. Varying number of opponent types [figure: learning results; 2-20 opponent types, P = (a,b,b,b,b), 20 moves, 100,000 runs, 100 samples] 6th Computer Olympiad

  13. Off-Line Learning • Case 2: The opponent is using a different strategy • Opponent types behave randomly but dependently (the distribution of type i depends on type i-1) • The real opponent selects a fixed move 6th Computer Olympiad

  14. [figures: learning error; learned probabilities] 6th Computer Olympiad

  15. Fast On-Line Learning • At the principal MIN node, only the best move for every opponent type is needed • Slightly increase the probability of an opponent type if the observed move matches the move selected by this opponent type only; then normalize all probabilities • Drifting to one opponent type is possible 6th Computer Olympiad
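  A sketch of this fast update rule. The step size delta is an assumed parameter, and the update only fires when the observed move matches exactly one opponent type, as described above.

    def fast_update(probs, type_moves, observed_move, delta=0.05):
        matches = [i for i, m in enumerate(type_moves) if m == observed_move]
        probs = list(probs)
        if len(matches) == 1:                 # observed move matches one type only
            probs[matches[0]] += delta        # slight increase for that type
        total = sum(probs)
        return [p / total for p in probs]     # normalize all probabilities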

  16. Slower On-Line Learning: Naive Bayesian (Duda & Hart ’73) • Compute the value of every move at the principal MIN node for every opponent type • Transform these values into conditional probabilities P(move | opp) • Compute P(opp | move_obs) from P*(opp) using Bayes’ rule • Take P*(opp) ← a · P*(opp) + (1 - a) · P(opp | move_obs) 6th Computer Olympiad
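  A sketch of this update. The softmax used to turn move values into P(move | opp) is an assumed choice, since the slides only state that the values are transformed into conditional probabilities; the mixing parameter a is as described above.

    import math

    def softmax(values, temperature=1.0):
        exps = [math.exp(v / temperature) for v in values]
        total = sum(exps)
        return [e / total for e in exps]

    def bayesian_update(priors, move_values, observed_index, a=0.9):
        # priors: current P*(opp); move_values[k][j]: value of move j at the
        # principal MIN node under opponent type k; a: mixing parameter.
        likelihoods = [softmax(vals)[observed_index] for vals in move_values]
        # Bayes' rule: P(opp_k | move_obs) is proportional to
        # P(move_obs | opp_k) * P*(opp_k)
        joint = [l * p for l, p in zip(likelihoods, priors)]
        z = sum(joint)
        posterior = [j / z for j in joint] if z > 0 else list(priors)
        # P*(opp) <- a * P*(opp) + (1 - a) * P(opp | move_obs)
        return [a * p + (1 - a) * q for p, q in zip(priors, posterior)]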

  17. Naïve Bayesian Learning • In the end, drifting to 0-1 probabilities will almost always occur • The parameter a is very important for the actual performance: • amount of change in the probabilities • convergence • drifting speed • It should be tuned in a real setting 6th Computer Olympiad

  18. Conclusions • Effective off-line learning of probabilities is possible when ambiguous events are disregarded • Off-line learning also works if the opponent does not use a mixed strategy of known opponent types • On-line learning must be tuned precisely to a given situation 6th Computer Olympiad

  19. Future Research • PrOM search and learning in real game playing • Zanzibar Bao (8x4 mancala) • LOA (some experiments with OM search have been done) • Chess endgames 6th Computer Olympiad
