Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers

Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu, S. Young Cambridge University Engineering Department | {mg436, fj228, sk561, farm2, brmt2, ky219, sjy}@eng.cam.ac.uk POMDP-based Dialogue Management Kernel Function Real-world Problem: CamInfo Dialogue • Models the uncertainty about the dialogue state by maintaining a distribution over all possible dialogue states in every turn – belief state • Maximises the overall dialogue success by optimisingQ-function Q(a,b(s))– the highest expected long-term reward for action ataken in belief state b(s) • Problem: To ensure tractability of optimisation, standard methods discretise the belief space into a small number of points, causing the loss of information. • Solution: Model the continuous nature of the belief state and include the prior knowledge about the domain to speed up the learning process. • If the Q-function value was known in one belief state-action pair what is the Q-function value in another belief state for the same action? • Prior knowledge about Q-function correlations is incorporated in the kernel function. • Kernel hyper-parameters can be learned from the data labelled with belief states, actions and rewards. In that way the kernel captures correlations found in the data. • Hidden Information State Dialogue Manager • POMDP-based Dialogue Manager that can tractably maintain belief state for real-world problems • Optimises policy in a reduced summary space • CamInfo Domain • Tourist Information domain for Cambridge, UK Belief state Action Q-function value Would you like to save or delete the message? Results on CamInfo • Comparison: • GP-Sarsa with polynomial kernel • Active learning GP-Sarsa – during exploration selects the action that has the highest uncertainty estimated by the Gaussian Process • Grid-based Monte Carlo Control Algorithm Would you like to save or delete the message? Gaussian Processes in Reinforcement Learning • Gaussian Process (GP) – non-parametric Bayesian model for function approximation • For given prior function correlations and some noisy function observations, it estimates the posterior of any function value • GP-Sarsa is an online Reinforcement learning algorithm that models Q-function as a Gaussian Process Results on VoiceMail Toy problem: VoiceMail Dialogue • Comparison: • GP-Sarsawith various kernel functions • Grid-based Monte Carlo Control algorithm • Exact POMDP solution • In order to estimate the speed of convergence, the policy was evaluated after every training batch: The user asks the system to save or delete the message. The user input is corrupted with noise, so the true dialogue state is unknown. Which action leads to success? Standard approach: Q(a,b(s)) value GP-Sarsa: Distribution of Q(a,b(s)) value Action a Would you like to save or delete the message? a Conclusion action b(s) b’(s) Your message is deleted. • GP-Sarsa can obtain the optimal policy faster than standard methods provided an adequate choice of the kernel function • The measure of uncertainty that GP-Sarsa is estimating can be utilised in an Active Learning framework to additionally speed up the learning process r (immediate) reward Acknowledgements belief state This research was partly funded by the UK EPSRC under grant agreement EP/F013930/1 and by the EU FP7 Programme under grant agreement 216594 (CLASSiC project: www.classic-project.org). A Gaussian Process models every Q-function value Q(a,b(s)) as a Gaussian distributed random variable. The variance of the Gaussian distribution provides a measure of uncertainty about the approximation.

Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers

Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers

Presentation Transcript

Gaussian Processes in Machine Learning

Gaussian KD-Tree for Fast High-Dimensional Filtering

A PTAS for Computing the Supremum of Gaussian Processes

Sparse Approximations to Bayesian Gaussian Processes

Nonmyopic Active Learning of Gaussian Processes

NATIONAL POLICY DIALOGUE

Gaussian Processes for Statistical Soil Modeling of the Tropics

Gaussian Processes for Active Sensor Management

Applying Sequential, Sparse Gaussian Processes – an Illustration Based on SIC2004

Relational Learning with Gaussian Processes

Fast approximate POMDP planning: Overcoming the curse of history!

Advances in Point-Based POMDP Solvers

Sparse Approximations to Bayesian Gaussian Processes

Gaussian Processes for Transcription Factor Protein Inference

Bayesian methods, priors and Gaussian processes

GraphCut -based Optimisation for Computer Vision

Gaussian Processes in Machine Learning

NATIONAL POLICY DIALOGUE

Nonmyopic Active Learning of Gaussian Processes

POMDP

Propagating Uncertainty In POMDP Value Iteration with Gaussian Process