

DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe


Presentation Transcript


  1. DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks • Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe

  2. Motivation • Real-world applications of mobile sensor networks • Robots in an urban setting • Autonomous underwater vehicles

  3. Challenges • Rewards are unknown • Limited time horizon • Anytime performance is important

  4. Existing Models • Distributed constraint optimization for sensor networks [Lesser03, Zhang03, …] • Mobile sensor networks for communication [Cheng2005, Marden07, …] • Factor graphs [Farinelli08, …] • Swarm intelligence, potential games • Other robotic approaches …

  5. Contributions • Propose new algorithms for DCOPs • Seamlessly interleave distributed exploration and distributed exploitation • Tests on physical hardware

  6. Outline • Background on DCOPs • Solution Techniques • Experimental Results • Conclusions and Future Work

  7. DCOP Framework (figure: constraint graph over agents a1, a2, a3)
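
Since the transcript only gestures at the framework, here is a minimal sketch of a DCOP in Python. Everything concrete in it is an assumption made for illustration: a chain of three agents a1, a2, a3 with binary domains, and reward tables chosen so the chain has the two 1-optimal assignments (team rewards 12 and 6) that slide 9 mentions.

```python
# A toy DCOP: agents a1 - a2 - a3 in a chain, each picking a value from a
# small domain; every constraint edge maps the pair of choices to a reward.
# Agent names, domains, and reward values are illustrative assumptions.
from itertools import product

agents = ["a1", "a2", "a3"]
domain = [0, 1]

# (left_agent, right_agent) -> {(left_value, right_value): reward}
rewards = {
    ("a1", "a2"): {(0, 0): 6, (0, 1): 0, (1, 0): 0, (1, 1): 3},
    ("a2", "a3"): {(0, 0): 6, (0, 1): 0, (1, 0): 0, (1, 1): 3},
}

def total_reward(assignment):
    """Team reward: sum of every constraint under a full assignment."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in rewards.items())

# Centralized brute force, feasible only at toy sizes; DCOP algorithms exist
# precisely so agents can optimize this sum in a distributed way.
best = max((dict(zip(agents, vals)) for vals in product(domain, repeat=len(agents))),
           key=total_reward)
print(best, total_reward(best))  # {'a1': 0, 'a2': 0, 'a3': 0} 12
```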

  8. Applying DCOP

  9. k-Optimality [Pearce07] (figure: chain a1-a2-a3 with two 1-optimal assignments, one with total team reward R = 12 and one with R = 6)
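
To make the k = 1 case concrete: an assignment is 1-optimal when no single agent can raise the team reward by deviating alone. A small checker over the toy problem from the DCOP sketch above (still all assumed values) confirms that both uniform assignments are 1-optimal, matching the R = 12 and R = 6 solutions on the slide.

```python
def is_1_optimal(assignment):
    """True iff no single agent can raise the team reward by changing alone
    (the k = 1 case of k-optimality [Pearce07]); reuses the toy DCOP above."""
    base = total_reward(assignment)
    for agent in agents:
        for v in domain:
            if v != assignment[agent]:
                if total_reward(dict(assignment, **{agent: v})) > base:
                    return False
    return True

print(is_1_optimal({"a1": 0, "a2": 0, "a3": 0}))  # True, team reward 12
print(is_1_optimal({"a1": 1, "a2": 1, "a3": 1}))  # True, team reward 6 (a local optimum)
```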

  10. MGM-Omniscient (figure: chain a1-a2-a3)

  11. MGM-Omniscient (figure: chain a1-a2-a3, one agent's gain of 10 shown)

  12. MGM-Omniscient (figure: chain a1-a2-a3 with gains 10, 10, 12)

  13. MGM-Omniscient • Only one agent per neighborhood is allowed to change per round • Monotonic algorithm (figure: gains 10, 10, 12 on the chain a1-a2-a3 drop to 0, 0, 0 after the winning move)
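
A compact sketch of one MGM round on the toy chain above. The reward tables and the lexicographic tie-break are assumptions; the slides only state the two properties in the bullets. Each agent bids its best unilateral gain, and only an agent whose bid beats every neighbor's moves, which is what makes the algorithm monotonic.

```python
def best_local_move(agent, assignment):
    """Best gain this agent can get by changing its own value, holding its
    neighbors fixed; gain is measured on the constraints it touches."""
    def local(a):
        return sum(t[(a[i], a[j])] for (i, j), t in rewards.items() if agent in (i, j))
    current = local(assignment)
    options = [(local(dict(assignment, **{agent: v})) - current, v) for v in domain]
    gain, val = max(options)
    return gain, val

def mgm_round(assignment, neighbors):
    """One round: agents exchange bids; only a neighborhood's top bidder moves
    (ties broken by agent name here, purely an implementation assumption)."""
    bids = {a: best_local_move(a, assignment) for a in agents}
    new = dict(assignment)
    for a, (gain, val) in bids.items():
        wins = all(gain > bids[n][0] or (gain == bids[n][0] and a < n)
                   for n in neighbors[a])
        if gain > 0 and wins:
            new[a] = val
    return new

neighbors = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
state = {"a1": 1, "a2": 0, "a3": 1}
for _ in range(3):
    state = mgm_round(state, neighbors)
print(state, total_reward(state))  # converges to the all-0 assignment, reward 12
```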

  14. Solution Techniques • Static Estimation • SE-Optimistic • SE-Realistic • Balanced Exploration using Decision Theory • BE-Backtrack • BE-Rebid • BE-Stay

  15. Static Estimation Techniques • SE-Optimistic • Always assume that exploration is better • Greedy approach

  16. Static Estimation Techniques • SE-Optimistic • Always assume that exploration is better • Greedy approach • SE-Realistic • More conservative: assume exploration yields the mean reward • Faster convergence
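
The two estimates differ only in the value an agent assumes for a spot it has never visited. A minimal sketch follows; `max_reward` and `mean_reward` are assumed to be known statistics of the reward distribution, which the transcript does not specify how to obtain.

```python
def se_bid(current_reward, max_reward, mean_reward, optimistic):
    """Gain an agent bids for moving to an unexplored location.

    SE-Optimistic assumes the unexplored spot pays the maximum possible
    reward, so agents keep exploring greedily; SE-Realistic assumes it pays
    only the mean, so agents settle and converge faster."""
    estimate = max_reward if optimistic else mean_reward
    return max(0.0, estimate - current_reward)

# An agent already earning 14 out of a possible 20 (mean 10):
print(se_bid(14.0, 20.0, 10.0, optimistic=True))   # 6.0 -> still wants to explore
print(se_bid(14.0, 20.0, 10.0, optimistic=False))  # 0.0 -> stays put
```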

  17. Balanced Exploration Techniques

  18. Balanced Exploration Techniques • BE-Backtrack • Decision-theoretic limit on exploration • Track the reward Rb of the best location visited so far • State of the agent: (Rb, T), with T rounds remaining

  19. Balanced Exploration Techniques

  20. Balanced Exploration Techniques • Utility of Exploration

  21. Balanced Exploration Techniques • Utility of Backtrack after Successful Exploration

  22. Balanced Exploration Techniques • Utility of Backtrack after Unsuccessful Exploration

  23. Balanced Exploration Techniques • BE-Rebid • Allows agents to backtrack • Re-evaluates the tradeoff at every time step • Allows for on-the-fly reasoning • Same equations as BE-Backtrack
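
The transcript omits the actual utility equations, so the following is only a hedged sketch of the style of reasoning on slides 18-23. It assumes unexplored rewards are i.i.d. draws from a known distribution (`sample()` below is a made-up uniform), that exploring costs one round, and that the agent can always return to its best known location Rb. BE-Rebid simply re-runs this comparison at every time step.

```python
import random

def sample():
    """Assumed reward distribution for unexplored locations (illustrative)."""
    return random.uniform(0.0, 20.0)

def u_backtrack(rb, t):
    # Return to the best known location and exploit it for all t rounds left.
    return rb * t

def u_explore(rb, t, n=10_000):
    # Spend one round exploring, then exploit the better of the old best and
    # the newly found reward for the remaining t - 1 rounds (Monte Carlo).
    if t <= 1:
        return u_backtrack(rb, t)
    return sum(max(rb, sample()) * (t - 1) for _ in range(n)) / n

def be_decide(rb, t):
    """One BE-style decision for an agent in state (Rb, T)."""
    return "explore" if u_explore(rb, t) > u_backtrack(rb, t) else "backtrack"

print(be_decide(rb=8.0, t=10))   # modest Rb, long horizon -> usually "explore"
print(be_decide(rb=19.0, t=2))   # high Rb, little time left -> "backtrack"
```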

  24. Balanced Exploration Techniques • BE-Stay • Agents unable to backtrack • Dynamic programming approach
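
Since BE-Stay agents cannot return to a previous spot, the stay-or-move choice fits a finite-horizon dynamic program over (current reward, rounds left). This sketch assumes a discrete uniform reward in 0..20 and that a move forfeits the current round; none of those specifics come from the transcript.

```python
from functools import lru_cache

REWARDS = range(21)          # assumed discrete reward levels
P = 1.0 / len(REWARDS)       # uniform probability of each level

@lru_cache(maxsize=None)
def value(r, t):
    """Best expected total reward from reward r with t rounds remaining,
    when moving abandons the current location for good."""
    if t == 0:
        return 0.0
    stay = r * t                                        # keep r every remaining round
    move = sum(P * value(r2, t - 1) for r2 in REWARDS)  # burn a round, land on a new r2
    return max(stay, move)

def be_stay_action(r, t):
    move = sum(P * value(r2, t - 1) for r2 in REWARDS)
    return "stay" if r * t >= move else "explore"

print(be_stay_action(5, 10), be_stay_action(18, 10))  # "explore" "stay"
```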

  25. Results

  26. Results • Learning curve (20 agents, chain topology, 100 rounds)

  27. Results (simulation) (chain topology, 100 rounds)

  28. Results (simulation) (10 agents, random graphs with 15-20 links)

  29. Results (simulation) (20 agents, 100 rounds)

  30. Results (physical robots)

  31. Results (physical robots) (4 robots, 20 rounds)

  32. Conclusions • Provide algorithms for DCOPs addressing real-world challenges • Demonstrated improvement with physical hardware

  33. Future Work • Scaling up the evaluation • different approaches • different parameter settings • Examine alternate metrics • battery drain • throughput • cost of movement • Verify the algorithms in other domains

  34. Thank You • manish.jain@usc.edu • http://teamcore.usc.edu/manish

