Exploring Semi-Markov Decision Processes: A Comprehensive Overview from MDPs to SMDPs
This summary traces the evolution from Markov Decision Processes (MDPs) to Semi-Markov Decision Processes (SMDPs), highlighting the flexibility of SMDPs in incorporating random transition times. Key topics include the adaptation of the Bellman equations for SMDPs, approaches for maximizing average and discounted rewards, and the extension of Q-Learning. Policy iteration is also discussed as a sometimes more efficient alternative to value iteration. Applications range from supply chain management to disaster response, rounding out this overview of advanced decision-making frameworks.
Presentation Transcript
Some Final Thoughts (Abhijit Gosavi)
From MDPs to SMDPs • The semi-Markov decision process (SMDP) is a more general model in which the transition time is also a random variable. • The MDP Bellman equations can be extended to SMDPs to accommodate this transition time.
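One standard way the Bellman optimality equations extend to SMDPs can be sketched as follows (the notation here is assumed, not taken from the slides). In the discounted case, the fixed discount factor is replaced by the expected discounting over the random transition time; in the average-reward case, the expected transition time enters the equation so that reward is measured per unit time:

```latex
% Discounted reward: continuous discounting at rate \gamma over the
% random sojourn time \tau(i,a,j); \bar{r}(i,a) is the expected
% one-transition reward and p(i,a,j) the transition probability.
V^*(i) = \max_{a \in \mathcal{A}(i)} \Big[ \bar{r}(i,a)
  + \sum_{j} p(i,a,j)\, \mathbb{E}\big[e^{-\gamma \tau(i,a,j)}\big]\, V^*(j) \Big]

% Average reward: \rho^* is the optimal average reward per unit time
% and \bar{t}(i,a) the expected transition time under action a.
h(i) = \max_{a \in \mathcal{A}(i)} \Big[ \bar{r}(i,a)
  - \rho^*\, \bar{t}(i,a) + \sum_{j} p(i,a,j)\, h(j) \Big]
```

Setting every transition time to one unit recovers the familiar MDP equations in both cases.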
SMDPs (contd.) • In the average reward case, we are interested in maximizing the average reward per unit time. • For the discounted reward case, we need to discount in proportion to the time spent in each transition. • The Q-Learning algorithm for discounted reward has a direct extension. • For average reward, there is a family of algorithms called R-SMART (see the book for references).
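The discounted-reward extension of Q-Learning mentioned above can be sketched as below. This is a minimal illustration, not the exact algorithm from the book: the toy dynamics, reward rule, and rate parameters are all made up. The key point is only that the discount factor `e^{-gamma * tau}` now depends on the random sojourn time `tau`, rather than being a fixed constant as in the MDP case.

```python
import math
import random

def smdp_q_update(Q, s, a, r, tau, s_next, alpha=0.1, gamma=0.1):
    """One discounted-reward SMDP Q-Learning update.

    Unlike the MDP case, the discount factor e^{-gamma * tau}
    varies with the random sojourn time tau of the transition.
    """
    best_next = max(Q[s_next])
    target = r + math.exp(-gamma * tau) * best_next
    Q[s][a] += alpha * (target - Q[s][a])

# Toy 2-state, 2-action SMDP (hypothetical dynamics, for illustration only):
random.seed(0)
Q = [[0.0, 0.0], [0.0, 0.0]]
s = 0
for _ in range(5000):
    a = random.randrange(2)
    s_next = random.randrange(2)        # made-up next-state rule
    r = 1.0 if a == s else 0.0          # made-up reward rule
    tau = random.expovariate(1.0)       # random transition time
    smdp_q_update(Q, s, a, r, tau, s_next)
    s = s_next

print(Q)
```

With all sojourn times fixed at `tau = 1`, the update reduces to ordinary discounted Q-Learning with discount factor `e^{-gamma}`.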
Policy Iteration • Another method to solve the MDP: an alternative to value iteration • Slightly more involved mathematically • Sometimes more efficient than value iteration • Its Reinforcement Learning counterpart is called Approximate Policy Iteration
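Policy iteration alternates two steps: evaluate the current policy (solve for its value function), then improve it greedily with respect to those values, stopping when the policy no longer changes. A minimal sketch on a made-up two-state, two-action MDP (all transition probabilities and rewards are hypothetical):

```python
# Tiny 2-state, 2-action MDP: P[a][s][s'] transition probabilities,
# R[a][s] expected one-step rewards. Numbers are invented for illustration.
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.5, 0.5], [0.9, 0.1]]]
R = [[1.0, 0.0], [0.0, 2.0]]
GAMMA = 0.9

def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: fixed point of V = R_pi + GAMMA * P_pi V."""
    V = [0.0, 0.0]
    while True:
        V_new = [R[policy[s]][s]
                 + GAMMA * sum(P[policy[s]][s][j] * V[j] for j in range(2))
                 for s in range(2)]
        if max(abs(V_new[s] - V[s]) for s in range(2)) < tol:
            return V_new
        V = V_new

def improve(V):
    """Greedy policy improvement with respect to V."""
    return [max(range(2),
                key=lambda a: R[a][s]
                + GAMMA * sum(P[a][s][j] * V[j] for j in range(2)))
            for s in range(2)]

policy = [0, 0]
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:
        break
    policy = new_policy

print(policy, V)
```

Each outer iteration here is more expensive than a value-iteration sweep, but the number of outer iterations is typically very small, which is why policy iteration is sometimes the more efficient method.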
Other Applications • Supply Chain Problems • Disaster Response Management • Production Planning in Remanufacturing Systems • Continuous event systems (LQG control)
What you’ve learned (hopefully) • Markov chains and how they can be employed to model systems • Markov decision processes: the idea of optimizing systems (controls) driven by Markov chains • Some concepts from Artificial Intelligence • Some (hopefully) cool applications of Reinforcement Learning • Some coding (for those who were not averse to doing it) • Systems thinking • Coding iterative algorithms • Some discrete-event simulation • HOPE YOU’VE ENJOYED THE CLASS!