DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe
Motivation • Real-world Applications of Mobile Sensor Networks • Robots in an urban setting • Autonomous underwater vehicles
Challenges • Rewards are unknown • Limited time-horizon • Anytime performance is important
Existing Models • Distributed Constraint Optimization for sensor networks • [Lesser03, Zhang03, …] • Mobile Sensor Nets for Communication • [Cheng2005, Marden07, …] • Factor Graphs • [Farinelli08, …] • Swarm Intelligence, Potential Games • Other Robotic Approaches …
Contributions • Propose new algorithms for DCOPs • Seamlessly interleave Distributed Exploration and Distributed Exploitation • Tests on physical hardware
Outline • Background on DCOPs • Solution Techniques • Experimental Results • Conclusions and Future Work
DCOP Framework (figure: constraint graph over agents a1, a2, a3)
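The framework on this slide can be sketched concretely: agents hold variables, and pairwise reward matrices on the constraint-graph edges score joint assignments. A minimal illustrative sketch, where the domains and reward values are assumptions and not taken from the slides:

```python
from itertools import product

# Three agents on a chain a1--a2--a3, each choosing a value from a small domain.
agents = ["a1", "a2", "a3"]
domain = [0, 1]

# Reward matrices on the constraint-graph edges:
# rewards[(ai, aj)][(vi, vj)] = payoff for that joint assignment.
rewards = {
    ("a1", "a2"): {(0, 0): 5, (0, 1): 2, (1, 0): 2, (1, 1): 6},
    ("a2", "a3"): {(0, 0): 5, (0, 1): 2, (1, 0): 2, (1, 1): 6},
}

def total_reward(assignment):
    """Sum constraint rewards for a complete assignment {agent: value}."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in rewards.items())

# The global optimum maximizes the summed constraint rewards.
best = max(product(domain, repeat=3),
           key=lambda vs: total_reward(dict(zip(agents, vs))))
print(best, total_reward(dict(zip(agents, best))))
```

In the mobile-sensor setting of this talk, the twist is that these reward tables are initially unknown and must be sampled by physically moving.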
Applying DCOP
k-Optimality [Pearce07] (figure: agents a1, a2, a3; two 1-optimal solutions, one with reward R = 12 and one with R = 6)
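The k-optimality condition behind this slide can be written out; a sketch in assumed notation, paraphrased from the cited [Pearce07] rather than quoted:

```latex
% An assignment A is k-optimal if no group of at most k agents can
% increase the global reward R by jointly changing their values:
\[
  R(A) \;\ge\; R(A')
  \quad \text{for every assignment } A' \text{ that differs from } A
  \text{ in the values of at most } k \text{ agents.}
\]
```

This is why the slide's example can have two 1-optimal solutions with different rewards: neither can be improved by any single agent moving alone, even though one is globally better.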
MGM-Omniscient (figure: worked example on agents a1, a2, a3, showing bid gains 10, 10, 12)
MGM-Omniscient • Only one agent per neighborhood is allowed to change its value per round • Monotonic algorithm: the global reward never decreases
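One MGM round can be sketched as follows; the function names and the strict tie-breaking rule are illustrative assumptions, but the structure (compute the best unilateral gain, exchange gains with neighbors, let only local winners move) is the algorithm the slide describes:

```python
def best_gain(agent, assignment, domain, neighbors, reward):
    """Best reward improvement `agent` can get by changing its value alone."""
    def local(v):
        return sum(reward(agent, n, v, assignment[n]) for n in neighbors[agent])
    current = local(assignment[agent])
    best_v = max(domain, key=local)
    return local(best_v) - current, best_v

def mgm_round(assignment, domain, neighbors, reward):
    """One synchronous MGM round: every agent bids its gain, local winners move."""
    gains = {a: best_gain(a, assignment, domain, neighbors, reward)
             for a in assignment}
    new_assignment = dict(assignment)
    for a, (g, v) in gains.items():
        # Move only on a strictly positive gain that beats every neighbor's
        # gain, so at most one agent per neighborhood changes its value.
        if g > 0 and all(g > gains[n][0] for n in neighbors[a]):
            new_assignment[a] = v
    return new_assignment
```

Because at most one agent per neighborhood moves, and only on a strictly positive gain, each round leaves the global reward non-decreasing, which is the monotonicity noted on the slide.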
Solution Techniques • Static Estimation • SE-Optimistic • SE-Realistic • Balanced Exploration using Decision Theory • BE-Backtrack • BE-Rebid • BE-Stay
Static Estimation Techniques • SE-Optimistic • Always assumes that exploration is better • Greedy approach • SE-Realistic • More conservative: assumes exploration yields the mean reward • Faster convergence
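The two static estimates can be sketched in a few lines: unexplored positions have unknown rewards, so agents substitute a fixed estimate and then run a standard DCOP algorithm such as MGM on the estimated matrix. Here `R_max` and `R_mean` are assumed stand-ins for the maximum and mean of the known reward range:

```python
R_max, R_mean = 10.0, 5.0   # assumed bounds of the reward distribution

def estimate(observed, mode):
    """Fill in unknown rewards: None entries become an optimistic
    (maximum) or realistic (mean) estimate."""
    fill = R_max if mode == "SE-Optimistic" else R_mean
    return {pos: (r if r is not None else fill)
            for pos, r in observed.items()}

observed = {"here": 4.0, "north": None, "south": None}
print(estimate(observed, "SE-Optimistic"))  # unknowns look maximally good
print(estimate(observed, "SE-Realistic"))   # unknowns look average
```

The optimistic fill keeps agents moving (greedy exploration); the realistic fill makes staying on a known above-average reward attractive, hence the faster convergence the slide notes.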
Balanced Exploration Techniques • BE-Backtrack • Decision-theoretic limit on exploration • Tracks the previous best location's reward Rb • State of the agent: (Rb, T), where T is the remaining time
Balanced Exploration Techniques (equation slides: utility of exploration; utility of backtrack after successful exploration; utility of backtrack after unsuccessful exploration)
Balanced Exploration Techniques • BE-Rebid • Allows agents to backtrack • Re-evaluates the trade-off every time-step • Allows for on-the-fly reasoning • Same equations as BE-Backtrack
Balanced Exploration Techniques • BE-Stay • Agents are unable to backtrack • Dynamic programming approach
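The BE-Stay dynamic program can be sketched as a recursion over the remaining rounds: since the agent cannot return to a previous position, it compares committing to its current reward against moving to a fresh draw and re-deciding. The discrete reward distribution and the exact move/stay model below are illustrative assumptions:

```python
from functools import lru_cache

# Assumed discrete reward distribution: (reward value, probability) pairs.
REWARDS = [(2, 0.5), (8, 0.5)]

@lru_cache(maxsize=None)
def value(current, t):
    """Best expected total reward with t rounds left at reward `current`,
    for an agent that can never backtrack."""
    if t == 0:
        return 0.0
    stay = current * t                      # commit to this position forever
    move = sum(p * (r + value(r, t - 1))    # one round at the new position,
               for r, p in REWARDS)         # then decide again from there
    return max(stay, move)
```

With this toy distribution, an agent already sitting on a high reward stays put, while one on a low reward gambles on a move, exactly the no-backtrack trade-off the slide's DP formulation captures.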
Results
Results: Learning Curve (20 agents, chain, 100 rounds)
Results (simulation) (chain topology, 100 rounds)
Results (simulation) (10 agents, random graphs with 15-20 links)
Results (simulation) (20 agents, 100 rounds)
Results (physical robots)
Results (physical robots) (4 robots, 20 rounds)
Conclusions • Provide algorithms for DCOPs addressing real-world challenges • Demonstrated improvement with physical hardware
Future Work • Scaling up the evaluation • different approaches • different parameter settings • Examine alternate metrics • battery drain • throughput • cost of movement • Verify algorithms in other domains
Thank You manish.jain@usc.edu http://teamcore.usc.edu/manish