
Load Balancing and Load Reduction for Multiple Resource-Bounded Real-Time Agents


Presentation Transcript


  1. Load Balancing and Load Reduction for Multiple Resource-Bounded Real-Time Agents Edmund H. Durfee University of Michigan FALSE2002 Workshop Vanderbilt University November 15, 2002

  2. Talk Outline • Agents • Real-Time Agents • Resource-Bounded Real-Time Agents • Multiple Resource-Bounded Real-Time Agents • Load Balancing/Reduction for Multiple Resource-Bounded Real-Time Agents

  3. Caveats • Very biased march through these topics! • Emphasis of our work on the Cooperative Intelligent Real-Time Control Architecture (CIRCA) • Participants have included: Professor Kang Shin; former students Dave Musliner, Ella Atkins; current students Haksun Li, Dmitri Dolgov.

  4. Agent • Entity to which it is convenient to ascribe characteristics such as: • choices (capabilities, actions, plans, policies) • awareness (beliefs, knowledge) • preferences (goals, desires) • Typically: • declarative representations • taskable/adaptable • sociable

  5. Agent (diagram components: Environment, Sensors, State Estimate, History, Knowledge, Action/Event Models, Future State(s), Goals/Preferences, Intentions, Effectors) • Decision cycle: • Percepts update beliefs • Beliefs, history, knowledge inform “state” • (Probabilistic) projections of how actions/events/plans in a state change the state • Evaluation of (desirability/utility of) possible future state(s) to select intentions • Intentions map to effector actions

  6. Real-Time Agent • Some agent aspects are time-dependent: • Preferences change (e.g., goal deadlines) • Capabilities evolve (e.g., depletable resources) • Dynamic environment (e.g., outside events)

  7. Real-Time Agent • Better stop if the light turns red! • Getting to airport important until flight has left! • Get there before fuel runs out!

  8. Real-Time Agent • Agent could be “coincidentally real-time” • Real-time agent explicitly reasons about time, deadlines, dynamics, etc. and knows what level of performance it can guarantee • Emphasis here: Dynamic environment • Real-time agent knows that it can preempt external events to pursue goals and avoid failure

  9. Traffic Light Example • Preference: Avoid running red light • Action to take: Stop the car • Timing issues: Car requires time/distance to stop • Heralding event: Light turns yellow • Minimum time delay between yellow and red • Real-time agent must guarantee that the time it takes to notice the yellow light (detect hazardous state) and to stop (take proper response) is less than the minimum delay between yellow and red
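As a minimal illustration of this guarantee (all timing numbers below are invented, not from the talk), the condition reduces to a single inequality between the worst-case sense-and-respond time and the minimum yellow-to-red delay:

    def preemption_guaranteed(wcet_monitor, wcet_response, min_delay):
        # The hazardous state must be detected and the response completed
        # before the earliest possible transition to the failure state.
        return wcet_monitor + wcet_response < min_delay

    # Hypothetical timings in seconds (illustrative only)
    print(preemption_guaranteed(wcet_monitor=0.2, wcet_response=2.5, min_delay=3.0))  # True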

  10. Probability Rate Function (plot: probability p versus time t, reaching p0 at t0) • Specifies probability of transition (state-changing event) occurring during a time interval as a function of the time spent in the heralding state • Example: t0 is duration of yellow light, p0 is 1.
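One way to represent such a rate function in code is as a simple step: zero probability before t0 and p0 from t0 onward (a sketch under that assumption; the talk's actual representation may differ):

    def step_rate(t, t0=1.0, p0=1.0):
        # Probability that the transition fires, as a function of the time t
        # spent in the heralding state: 0 before t0, p0 at or after t0.
        return p0 if t >= t0 else 0.0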

  11. Other Probability Rate Functions

  12. Preemptive Actions • There is uncertainty over transition times into states • Safest case is to assume a heralding state could happen at any time • Strategy: Monitor for the heralding state periodically, frequently enough to leave time to take action before the transition to the failure state • The period is based on the minimum preemption time (between when a heralding state is entered and the earliest that the failure transition could occur) minus the worst-case execution time for the action (monitoring plus response)

  13. Action Probability Rate Function (plot: probability p versus time t, reaching p0 at t0, with maxp marked on the time axis) • (t0 – maxp) is the worst-case duration of the action • maxp is the maximum period for the periodically-scheduled action
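A minimal sketch of the resulting period computation, using hypothetical values: the maximum period maxp is the minimum preemption time t0 minus the worst-case execution time of the monitoring-plus-response action.

    t0          = 3.0   # minimum delay from heralding state to failure transition (hypothetical)
    wcet_action = 0.5   # worst-case execution time of monitoring + response (hypothetical)

    maxp = t0 - wcet_action          # maximum period for the periodically-scheduled action
    assert maxp > 0, "action too slow to guarantee preemption"
    print("schedule with period <=", maxp)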

  14. Other Temporal Transitions • Events that change the agent’s state • New state isn’t a “failure” • Therefore, preemptive actions need not be guaranteed • However, some states might be more desirable than others (e.g., closer to some objective state), so such actions might be considered when guaranteed actions don’t take their worst-case time, so as to steer the system in promising directions

  15. CIRCA Architecture

  16. Layered Architectures (diagram: Planning/Deliberation Layer atop Reactive/Reflexive Layer, atop the Environment)

  17. Layered Architectures (diagram: Social/Reflective Layer atop Planning/Deliberation Layer, atop Dispatching/Sequencing Layer, atop Reactive/Reflexive Layer, atop the Environment)

  18. Resource-Bounded Agent in Real-Time Environment

  19. Brake Failure Preemption

  20. Control Schedule (timeline figure of the periodic action schedule)

  21. Resource-Bounded Real-Time Agent • What if it cannot schedule everything? • Does it “stretch” the periods of everything out until they fit, in a “best-effort” mode? • Does it selectively guarantee some by dropping others? • Our approach: Be selective • Our strategy: Drop actions that are least likely to be needed – those that preempt events that arise in states that are unlikely to be reached • Note: Other possibilities could include favoring preemptions for states to be reached sooner…

  22. Computing State Probabilities Given a state transition matrix written in canonical form (transient-to-transient block Q, transient-to-absorbing block R), the probability of going from a transient state to an absorbing state is given by the matrix B = (I – Q)^-1 R (rows indexed by transient states, columns by absorbing states). To compute the state probability for a state, we may have to transform it into an absorbing state by truncating all its outgoing transitions.
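A small numerical sketch of this computation (the matrices below are made up for illustration): writing the chain in canonical form, the fundamental matrix N = (I – Q)^-1 yields the absorption probabilities B = N R.

    import numpy as np

    # Hypothetical chain: 3 transient states, 2 absorbing states (e.g. "goal", "failure")
    Q = np.array([[0.0, 0.5, 0.2],   # transient -> transient probabilities
                  [0.1, 0.0, 0.4],
                  [0.0, 0.3, 0.0]])
    R = np.array([[0.2, 0.1],        # transient -> absorbing probabilities
                  [0.3, 0.2],
                  [0.1, 0.6]])

    N = np.linalg.inv(np.eye(3) - Q)  # fundamental matrix
    B = N @ R                         # B[i, j] = prob. of ending in absorbing state j from transient state i
    print(B)                          # each row sums to 1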

  23. Transition Probabilities • We describe the probabilistic dynamics of the temporal and actions transitions by probability density/cumulative functions. • The transition probability of the i-th temporal transition is the probability that its firing time, Ti, equals the minimum of all firing times, min{T1, …, Tn}.
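In LaTeX notation, and assuming the firing times are independent with densities f_i and cumulative distributions F_j (the independence assumption is introduced on the next slide), this is the standard competing-transitions formula:

    p_i \;=\; \Pr\bigl[\,T_i = \min\{T_1,\dots,T_n\}\,\bigr]
        \;=\; \int_0^{\infty} f_i(t)\,\prod_{j \ne i}\bigl(1 - F_j(t)\bigr)\,dt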

  24. Making the Computation Tractable • To make this calculation tractable, we need to use approximation. • Model each transition (temporal or action) using the probability rate function over discrete intervals • Assume all transition rates are independent • For any time interval, probability of a transition will depend on other enabled transitions

  25. From Unconditional Rate Functions To Conditional Rate Functions • Example: unconditional rates tt1 = 0.4, tt2 = 0.8, over all time intervals • The resulting conditional rates per interval are 0.2112 for tt1 and 0.6688 for tt2 • The probability that nothing happens in an interval is (1-.4)(1-.8) = .12 • The probability of “something happens” is 0.88

  26. From Conditional Rate Functions To Conditional Cumulative Probability Functions The maximum of the cumulative probability function is the transition probability. For the previous case, Pcum(tt1) = .2112 + (.12)(.2112) + (.12)^2(.2112) + … = .241 Pcum(tt2) = .6688 + (.12)(.6688) + (.12)^2(.6688) + … = .759
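A quick check of this geometric series, assuming the per-interval conditional rates 0.2112 and 0.6688 and the 0.12 "nothing happens" probability from the previous slide (the slide's .241/.759 reflect the unrounded conditional rates):

    p_none = 0.12
    cond_rate = {"tt1": 0.2112, "tt2": 0.6688}

    for tt, rate in cond_rate.items():
        # Pcum = rate + p_none*rate + p_none**2*rate + ... = rate / (1 - p_none)
        print(tt, round(rate / (1.0 - p_none), 3))   # tt1 -> ~0.24, tt2 -> ~0.76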

  27. Heuristic Justification • It is a numerical-method approximation, but it is accurate if the piecewise Markov assumption holds. • The heuristic agrees with what we would get from a continuous-time model for constant rate functions, or for functions for which the piecewise Markov assumption holds. • Our empirical testing shows that even for functions for which the piecewise Markov assumption does not hold, the approximation error decreases as the timeline is discretized into smaller intervals.

  28. Employing Probability Thresholds • Generate the entire reachable state space and compute probabilities of states and actions/periods for all states • Until actions are schedulable, iteratively raise a pruning probability threshold, removing states (and associated actions) with lower probabilities • Optionally, define an expansion probability threshold to incrementally lower, to iteratively generate increasingly more complete schedules in “any-time” mode
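A rough sketch of the pruning loop described above; the data structures and the schedulable predicate are hypothetical placeholders, not CIRCA's actual interfaces:

    def prune_until_schedulable(state_probs, state_actions, schedulable, step=0.01):
        # state_probs: state -> probability of reaching it
        # state_actions: state -> actions (TAPs) planned for that state
        # schedulable: predicate over a list of actions
        threshold = 0.0
        while threshold <= 1.0:
            kept = [a for s, acts in state_actions.items()
                    if state_probs[s] >= threshold
                    for a in acts]
            if schedulable(kept):
                return threshold, kept        # lowest threshold yielding a feasible schedule
            threshold += step                 # raise the pruning threshold and retry
        raise RuntimeError("no schedulable subset found")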

  29. Search Schematic I

  30. Search Pseudocode

    while ((goal not found) OR (not all states with state prob. > expansion threshold expanded)) {
        find the state with the largest state probability which is not expanded;
        if (no state with state prob. > expansion threshold is found) {
            do a full and accurate state probability analysis;
            find again the state with the largest state probability;
            if (no state with state prob. > expansion threshold is found) {
                expansion is done;
                break the while loop;
            }
        }
        apply all temporal transitions to the state found;
        choose the best action for the state;
        estimate the state probabilities for its children;
        mark the unexpanded children for expansion;
    }
    if (pruning threshold > expansion threshold) {
        do a full and accurate state probability analysis;
        prune states/actions in ascending order of their state probabilities;
    }
    pass the set of TAPs to scheduler for scheduling;

  31. What About Fault Adaptation? • Control cycle assures performance within an envelope of situations • System could exit the envelope: • Unmodelled transitions • Removed unlikely states that in fact occur • These could be because of internal events (faults) as well as external events

  32. Planned For States vs. Dynamic Planning Handled States (diagram labels: preempted failure state, ttf, removed state, deadend state, context violation, Goal)

  33. Adaptation • Design monitoring actions to detect when the envelope has been left • Response could be to replan, or to dispatch a previously constructed contingency plan • Can guarantee response – if leaving the envelope heralds possible failure, can preempt by dispatching a corrective schedule • Requires that fault detection/adaptation tasks be scheduled alongside other tasks • Being more ready to adapt consumes more resources • Leaves fewer resources for achieving goals – reduces the set of situations to be prepared for • Tradeoffs can be explicitly reasoned about!

  34. Multiple Resource-Bounded Real-Time Agents • Better utilize the limited resources of each agent through partitioning and assigning subsets of objectives to agents • A task allocation (negotiation) problem • Contracting • Auctioning • Even so, demands on agents could outstrip resources • Could individually prune unlikely states/actions • Could communicate to build better (smaller, due to less uncertainty) state reachability graphs!

  35. Agents with Models of Each Other • Agents can recognize situations that will trigger others to (re)act • Possible (re)actions are known non-locally • If actions are redundant, then resolve by converging on a consistent allocation • If others’ actions introduce uncertainty that significantly reduces expected performance of local plan, communicate to reduce uncertainty • If others’ actions lead to unsafe world trajectories, communicate to change actions

  36. Convergence Protocol - Overview • In a multiagent setting, an agent may generate more actions than necessary to preempt failures if the agents plan independently. • An agent may prepare for contingencies associated with each of the actions another agent might take in a situation. • The other agent, however, will locally know which of those actions it will actually take. • Solution – Use a protocol to allow over-constrained agents to learn which contingencies are definitely not going to be caused by other agents, and so can be safely ignored.

  37. Convergence Protocol Example (diagram: initial state I, dangerous state D, other-agent transitions ttac, actions aci, acj, ack, acl) • Represent possible actions of other agents as temporal transitions (ttac labels). • Both agents may handle dangerous state D.

  38. Convergence Protocol Example (same diagram as the previous slide) Knowing which actions the other agent plans can restrict which states this agent must worry about.

  39. Convergence Protocol Example (diagram as before, with ack removed) • Revelation of choices through the protocol may eliminate an entire subspace (and hence the need to plan/schedule actions for those states). • Example of Load Reduction.

  40. Convergence Protocol Example (diagram as before, with ack and acl removed) • Revelation of choices through the protocol may eliminate a required action, by knowing the other agent will handle the important event. • Choice of which agent makes the commitment? Load balancing!! • Raises fault-tolerance issues.

  41. Convergence Protocol - Description • At each iteration of the protocol, an agent selects a state in which some other agent has a choice of actions, and asks that agent what it would execute in that state. • Ultimately, one or both agents might discover that the state cannot be reached anyway (due to other upstream choices). • Current research: allow the inquiring agent to express a preference about the other's action choice, leading to fuller negotiation. • An agent can stop inquiring when it knows enough to successfully schedule actions for the reduced set of states. • Further reduction is possible but unnecessary. • The protocol is assured to terminate after all states are examined. • The order of inquiry can affect efficiency and the composition of partial reductions, but not the ultimate reduction (when agents converge on exactly the states that really will be reached). • Use a heuristic to order the states to inquire about.
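A hypothetical sketch of one agent's side of the inquiry loop (the state attributes, ask_action, and helper predicates are invented for illustration; the real protocol details are in the underlying papers):

    def run_inquiry(states, other_agent, select_state, schedulable, actions_for):
        asked = set()
        while not schedulable(actions_for(states)):
            # states where the other agent still has an unrevealed choice of actions
            candidates = [s for s in states if s.other_has_choice and s not in asked]
            if not candidates:
                break                              # all states examined: protocol terminates
            s = select_state(candidates)           # heuristic ordering (see next slide)
            choice = other_agent.ask_action(s)     # "what would you execute in state s?"
            asked.add(s)
            # keep only states still reachable given the other agent's revealed choice
            states = [x for x in states if x.reachable_given(s, choice)]
        return states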

  42. Convergence Protocol Heuristics Strategies for selecting a state to inquire about: • Random Branching Point • Agent selects randomly from states with multiple action choices • Sequential Branching Point • Agent selects states in the order of their forward expansions during planning. Pruning “upstream” can have more impact. • Load Branching Point • Agent selects states in descending order of the number of actions that would be removed if the states are pruned. • Utilization Branching Point • Agent selects states in descending order of the marginal decrease in schedule utilization if the states are pruned.
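For illustration, the four strategies can be viewed as different ordering keys over the branching states (the state attributes used here are hypothetical, not from the talk):

    import random

    def order_branching_states(states, strategy):
        if strategy == "random":
            return random.sample(states, len(states))
        if strategy == "sequential":
            return sorted(states, key=lambda s: s.expansion_index)              # forward-expansion order
        if strategy == "load":
            return sorted(states, key=lambda s: -s.actions_removed_if_pruned)   # most actions pruned first
        if strategy == "utilization":
            return sorted(states, key=lambda s: -s.utilization_drop_if_pruned)  # biggest schedule relief first
        raise ValueError(f"unknown strategy: {strategy}")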

  43. Convergence Protocol - Some Sample Domains • 287 domains; 1626 agents • Each domain has from 2 to 10 agents, chosen randomly and uniformly. • Each KB has 7 private and public binary features combined. • The number of public features in a domain is random (from 0 – 7). It measures how tightly coupled the agents are. • Each agent has 15 private and public actions. • Each KB has 7 private and public temporal transitions that flip a random number of features.

  44. Convergence Protocol - TAP Pruning • # of agents able to schedule all their TAPs before running the protocol = 202 (12.42%). • # of agents able to schedule all their TAPs after running the protocol = 704 (43.30%). • # of agents that become able to schedule all their TAPs after running the protocol = 502 (30.87%).

  45. Combined with Probabilistic Pruning • Protocol can greatly improve safety guarantees. • Even after using the protocol, however, some agents can be overconstrained. • Combine the protocol with pruning of the actions least likely to be needed. • Agents could also do this before the protocol finishes, in an “anytime” mode.

  46. Convergence Protocol - Any-time Property • Fi is the probability of agent i reaching a failure state • System failure probability: (formula and plot omitted) • Simple example: 2 agents, 91 states each

  47. Convergence Protocol - Any-time Property • Gi is the probability of agent i reaching a goal state • System goal probability: (formula and plot omitted)

  48. Convergence Protocol - Any-time Property • Utility Function: U = (1-F)^a · G^(1-a) (here a = .9)
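A one-line check of this utility function (the F and G values below are invented for illustration):

    def utility(F, G, a=0.9):
        # Weights avoiding failure (exponent a) more heavily than reaching goals (exponent 1 - a)
        return (1 - F) ** a * G ** (1 - a)

    print(round(utility(F=0.05, G=0.7), 2))   # ~0.92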

  49. Scale Up Issues • Description so far has emphasized agents modeling what others might/will do • As number of agents grows, complexity of reasoning will too! • How to scale up: • Model less about each agent • Treat multiple agents as one • Model only (few) relevant agents • Strategies exist for all of these…

  50. Summary • Described some specific ideas about guaranteeing real-time agent behavior • Emphasized challenges that arise when demands on agents outstrip their resources • Outlined how multiple agents can accomplish more by relying on each other to reduce load and to balance load
