
Markov Decision Processes

Math 419/519, Prof. Andrew Ross. Markov Decision Processes, illustrated with a highway pavement maintenance example. Thanks to Pablo Durango-Cohen for this example, though I have made up the numbers. Pavement condition is classified as Good, Fair, or Poor, and there are 4 kinds of repairs: Expensive, Moderate, Cheap, and Nothing.


Presentation Transcript


  1. Math 419/519 Prof. Andrew Ross Markov Decision Processes

  2. Highway Pavement Maintenance • Thanks to Pablo Durango-Cohen for this example. • Though I have made up the numbers. • Classify highway pavement condition as: • Good • Fair • Poor • Can do 4 kinds of repairs: • Expensive • Moderate • Cheap • Nothing

  3. Timeline • April: check condition of road, decide on action. • Summer: repair road as decided • Fall/Winter: road might deteriorate • April: check condition, etc.

  4. Markov Assumptions • How we got to the current condition does not matter. • Future deterioration depends only on the present condition and action. • When choosing an action, we will only look at the present condition, not the past. • This is a policy decision, not a statement about road physics. We could change this policy, but it would make the problem bigger.

  5. If we do Nothing • Road deteriorates according to this transition matrix: • Do the zeros make sense? • Does it make sense that the probabilities decrease from right to left?
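
To make the question about the zeros concrete, here is a minimal sketch. The matrix on the original slide is not in this transcript, so the numbers below are hypothetical placeholders, and the names `P_NOTHING`, `is_stochastic`, and `never_improves` are made up for illustration. With states ordered Good, Fair, Poor, a road that is left alone cannot get better, so every entry below the diagonal should be zero.

```python
# States in order: Good, Fair, Poor.
STATES = ["Good", "Fair", "Poor"]

# HYPOTHETICAL numbers, not the ones from the lecture.
# Row = condition this April, column = condition next April.
P_NOTHING = [
    [0.6, 0.3, 0.1],  # Good now
    [0.0, 0.7, 0.3],  # Fair now
    [0.0, 0.0, 1.0],  # Poor now
]


def is_stochastic(matrix, tol=1e-9):
    """True if every entry is a probability and every row sums to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def never_improves(matrix):
    """True if all entries below the diagonal are zero.

    Column j < row i would mean moving to a better condition,
    which should be impossible when we do Nothing.
    """
    return all(matrix[i][j] == 0.0 for i in range(len(matrix)) for j in range(i))


print(is_stochastic(P_NOTHING), never_improves(P_NOTHING))
```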

  6. If we do Cheap repairs • Road improves/deteriorates according to this transition matrix:

  7. If we do Moderate repairs • Road improves/deteriorates according to this transition matrix:

  8. If we do Expensive repairs • Road improves/deteriorates according to this transition matrix:

  9. Repair Policy • Natural to say: “If it's in Good condition, do Nothing. If it's in Fair condition, do ___. If it's in Poor condition, do ___.” • Rather than if/then, let's make a Policy Matrix:
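
One possible way to turn an if/then rule into a policy matrix is sketched below. The slide leaves the Fair and Poor choices blank, so the rule used here is a hypothetical example and not the lecture's answer; the function name `policy_matrix` is also made up. Each row is a condition, each column an action, and each entry is Pr(action | condition), so a pure if/then rule gives a single 1 in every row.

```python
STATES = ["Good", "Fair", "Poor"]
ACTIONS = ["Nothing", "Cheap", "Moderate", "Expensive"]


def policy_matrix(rule):
    """Convert {condition: action} into a 3x4 matrix of Pr(action | condition)."""
    return [[1.0 if rule[s] == a else 0.0 for a in ACTIONS] for s in STATES]


# HYPOTHETICAL rule: the slide does not say what to do for Fair or Poor.
rule = {"Good": "Nothing", "Fair": "Cheap", "Poor": "Expensive"}
policy = policy_matrix(rule)

for state, row in zip(STATES, policy):
    print(state, row)
```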

  10. Mixed Policies? • Maybe we can't afford to do Expensive repairs every time the road becomes Poor, only 30% of the time. • A mixed policy gives, for each condition, the probability of choosing each action.

  11. “The” transition matrix? • Changes when you change your policy matrix. • Pr(Good next | Fair now) = Pr(Good next | Fair now, do Nothing)*Pr(Nothing|Fair) + Pr(Good next | Fair now, do Cheap)*Pr(Cheap|Fair) + Pr(Good next | Fair now, do Moderate)*Pr(Moderate|Fair) + Pr(Good next | Fair now, do Expensive)*Pr(Expensive|Fair) • And that's just one of 9 entries in the 3x3 matrix!
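
The formula above can be applied to all 9 entries at once. The sketch below assumes hypothetical transition matrices and a hypothetical mixed policy; the only number taken from the slides is the 30% chance of Expensive repairs when the road is Poor. The names `P`, `POLICY`, and `induced_transition_matrix` are made up for illustration.

```python
STATES = ["Good", "Fair", "Poor"]
ACTIONS = ["Nothing", "Cheap", "Moderate", "Expensive"]

# HYPOTHETICAL per-action transition matrices (rows: now, columns: next).
P = {
    "Nothing":   [[0.6, 0.3, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    "Cheap":     [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.0, 0.3, 0.7]],
    "Moderate":  [[0.8, 0.15, 0.05], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3]],
    "Expensive": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.7, 0.3, 0.0]],
}

# HYPOTHETICAL mixed policy: rows are conditions, columns follow ACTIONS.
POLICY = [
    [1.0, 0.0, 0.0, 0.0],  # Good: always Nothing
    [0.0, 0.5, 0.5, 0.0],  # Fair: Cheap or Moderate, 50/50
    [0.0, 0.4, 0.3, 0.3],  # Poor: Expensive only 30% of the time
]


def induced_transition_matrix(P, policy):
    """Entry (i, j) = sum over actions a of Pr(j next | i now, a) * Pr(a | i)."""
    n = len(STATES)
    return [
        [
            sum(policy[i][k] * P[a][i][j] for k, a in enumerate(ACTIONS))
            for j in range(n)
        ]
        for i in range(n)
    ]


T = induced_transition_matrix(P, POLICY)
for state, row in zip(STATES, T):
    print(state, [round(p, 3) for p in row])
```

Changing any row of `POLICY` and recomputing `T` shows directly how "the" transition matrix moves with the policy.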

  12. Overall Cost • Given a policy matrix, find the transition matrix • Then find the steady-state distribution • Then find how often we do each action • Then account for the cost of each action • Then change the policy matrix a little, try to find a cheaper overall cost. • See the book for the math notation.
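
The middle steps of this recipe can be sketched as follows. Everything numeric here is a hypothetical assumption: the transition matrix `T` is taken as already computed for the policy, and the costs are arbitrary units. The steady state is approximated by repeatedly multiplying a distribution by `T`, which is one simple option among several; the function names are made up for illustration.

```python
ACTIONS = ["Nothing", "Cheap", "Moderate", "Expensive"]

# HYPOTHETICAL transition matrix under the policy (rows/columns: Good, Fair, Poor).
T = [
    [0.60, 0.30, 0.10],
    [0.35, 0.50, 0.15],
    [0.27, 0.36, 0.37],
]

# HYPOTHETICAL policy matrix, assumed consistent with T: Pr(action | condition).
POLICY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.4, 0.3, 0.3],
]

# HYPOTHETICAL cost of each action, in arbitrary units per year.
COST = {"Nothing": 0.0, "Cheap": 1.0, "Moderate": 3.0, "Expensive": 10.0}


def steady_state(T, iterations=1000):
    """Approximate pi with pi = pi * T by repeated multiplication."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


def action_frequencies(pi, policy):
    """Long-run fraction of years in which each action is taken."""
    return [
        sum(pi[i] * policy[i][k] for i in range(len(pi)))
        for k in range(len(ACTIONS))
    ]


def average_cost(pi, policy, cost):
    """Long-run average cost per year: sum of frequency times action cost."""
    freq = action_frequencies(pi, policy)
    return sum(f * cost[a] for f, a in zip(freq, ACTIONS))


pi = steady_state(T)
print("steady state:", [round(p, 4) for p in pi])
print("action frequencies:", [round(f, 4) for f in action_frequencies(pi, POLICY)])
print("average cost per year:", round(average_cost(pi, POLICY, COST), 4))
```

The last step of the recipe, nudging the policy matrix and comparing costs, would wrap these functions in a search loop.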

  13. Other Thoughts • Can find optimal policy through: • “Policy Iteration” • “Value Iteration” • Related to Dynamic Programming
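
As a rough illustration of Value Iteration, here is a minimal sketch of the discounted-cost version. This is an assumption on my part: the slides describe long-run average cost, and the discount factor, the transition matrices, the action costs, and the yearly cost of living with a Fair or Poor road are all hypothetical. The condition cost is included because, with action costs alone, doing Nothing forever would trivially be cheapest.

```python
STATES = ["Good", "Fair", "Poor"]
ACTIONS = ["Nothing", "Cheap", "Moderate", "Expensive"]

# HYPOTHETICAL per-action transition matrices (rows: now, columns: next).
P = {
    "Nothing":   [[0.6, 0.3, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    "Cheap":     [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.0, 0.3, 0.7]],
    "Moderate":  [[0.8, 0.15, 0.05], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3]],
    "Expensive": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.7, 0.3, 0.0]],
}

# HYPOTHETICAL costs, arbitrary units.
ACTION_COST = {"Nothing": 0.0, "Cheap": 1.0, "Moderate": 3.0, "Expensive": 10.0}
STATE_COST = [0.0, 2.0, 8.0]  # yearly cost of a Good, Fair, Poor road
GAMMA = 0.9                   # assumed discount factor


def q_values(V):
    """Cost of each (condition, action) pair given next-year values V."""
    n = len(STATES)
    return [
        [
            STATE_COST[i]
            + ACTION_COST[a]
            + GAMMA * sum(P[a][i][j] * V[j] for j in range(n))
            for a in ACTIONS
        ]
        for i in range(n)
    ]


def value_iteration(tol=1e-9):
    """Repeat the Bellman update until the values stop changing."""
    V = [0.0] * len(STATES)
    while True:
        Q = q_values(V)
        new_V = [min(row) for row in Q]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            policy = {
                STATES[i]: ACTIONS[Q[i].index(new_V[i])]
                for i in range(len(STATES))
            }
            return new_V, policy
        V = new_V


V, best_policy = value_iteration()
print("discounted costs:", [round(v, 3) for v in V])
print("policy:", best_policy)
```

The result is a pure if/then policy: one action per condition. Policy Iteration would reach the same kind of answer by alternating between evaluating a policy and improving it.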

  14. References • Wayne Winston: “Introduction to Operations Research” book • Ronald A. Howard: “Comments on the Origin and Application of Markov Decision Processes” article in journal “Operations Research”, Vol 50 issue 1.
