UAV Route Planning in Delay Tolerant Networks
Daniel Henkel, Timothy X Brown
University of Colorado, Boulder
Infotech @ Aerospace '07, May 8, 2007
Familiar: Dial-A-Ride
Dial-A-Ride: curb-to-curb, shared-ride transportation service
• Receive calls
• Pick up and drop off passengers
• Minimize overall transit time
Optimal route not trivial!
In context: Dial-A-UAV
Complication: infinite data at sensors; potentially two-way traffic. Delay-tolerant traffic!
Talk tomorrow, 8am: Sensor Data Collection
[Diagram: Sensors 1-6 and a Monitoring Station served by the UAV]
• Sparsely distributed sensors, limited radios
• TSP solution not optimal
• Our approach: queueing and MDP theory
TSP's Problem
Traveling Salesman Solution:
• One cycle visits every node
• Problem: far-away nodes with little data to send
• Visit them less often
[Diagram: UAV hub serving nodes A and B, with visit probabilities pA, pB, distances dA, dB, and flow rates fA, fB]
New: cycle defined by visit frequencies pi
Queueing Approach
Goal: minimize average delay
Idea: express delay in terms of pi, then minimize over the set {pi}
• pi as probability distribution
• Expected service time of any packet
• Inter-service time: exponential distribution with mean Ti/pi
• Weighted delay:
[Diagram: UAV hub serving nodes A, B, C, D with visit probabilities pi, distances di, and flow rates fi]
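The weighted-delay expression itself did not survive extraction. A minimal sketch, assuming the objective is D(p) = Σi (fi / Σj fj) · (Ti / pi) with Ti the per-visit (round-trip) time for node i: minimizing this subject to Σ pi = 1 via a Lagrange multiplier gives pi proportional to sqrt(fi · Ti). The function names and example numbers below are made up for illustration.

```python
import numpy as np

def visit_probabilities(f, T):
    """Minimize D(p) = sum_i (f_i / sum_j f_j) * (T_i / p_i) subject to sum_i p_i = 1.
    Setting the Lagrangian gradient to zero gives p_i proportional to sqrt(f_i * T_i)."""
    w = np.asarray(f, dtype=float) * np.asarray(T, dtype=float)
    p = np.sqrt(w)
    return p / p.sum()

def weighted_delay(p, f, T):
    """Assumed objective: traffic-weighted mean inter-service time T_i / p_i."""
    f = np.asarray(f, dtype=float)
    return float(np.sum((f / f.sum()) * (np.asarray(T, dtype=float) / p)))

# Hypothetical two-node example: flow rates f and per-visit round-trip times T.
f = [4.0, 1.0]    # node A generates 4x the traffic of node B
T = [10.0, 30.0]  # node B is three times farther from the hub
p = visit_probabilities(f, T)
print(p, weighted_delay(p, f, T))
```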
Solution and Algorithm
Probability of choosing node i for next visit:
Implementation: deterministic algorithm (see the sketch below)
1. Set ci = 0
2. ci = ci + pi while max{ci} < 1
3. k = argmax{ci}
4. Visit node k; ck = ck - 1
5. Go to 2.
Performance improvement over TSP!
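A direct transcription of the counter-based scheduler above into Python; the function name, the number of visits, and the example probabilities are illustrative.

```python
def visit_schedule(p, n_visits):
    """Deterministic scheduler from the slide: accumulate counters c_i by p_i each round;
    once some counter reaches 1, visit the node with the largest counter and subtract 1."""
    c = [0.0] * len(p)
    schedule = []
    while len(schedule) < n_visits:
        # Step 2: c_i = c_i + p_i while max{c_i} < 1
        while max(c) < 1.0:
            c = [ci + pi for ci, pi in zip(c, p)]
        # Steps 3-4: k = argmax{c_i}; visit node k; c_k = c_k - 1
        k = max(range(len(c)), key=lambda i: c[i])
        schedule.append(k)
        c[k] -= 1.0
    return schedule

# Example: p = [0.7, 0.3] yields a repeating pattern that visits node 0 roughly 70% of the time.
print(visit_schedule([0.7, 0.3], 10))
```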
Unknown Environment
What is RL?
• Learning what to do without prior training
• Given: high-level goal; NOT: how to reach it
• Improving actions on the go
Distinguishing features:
• Interaction with environment
• Trial & error search
• Concept of rewards & punishments
• Example: training a dog
Learns a model of the environment.
The Framework
Agent
• Performs actions
Environment
• Gives rise to rewards
• Puts the agent in situations called states
Elements of RL
[Diagram: Policy, Reward, Value, Model of Environment]
• Policy: what to do (depending on state)
• Reward: what is good
• Value: what is good because it predicts reward
• Model: what follows what
Source: Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, 1998
UA Path Planning - Simple
Goal: minimize average delay -> find pA and pB
• Service traffic from A and B to hub H
• Goal: minimize average packet delay
• State: traffic waiting at the nodes, (tA, tB)
• Actions: fly to A; fly to B
• Reward: # packets delivered
• Optimal policy: # visits to A and B; depends on flow rates and distances
[Diagram: UAV hub serving nodes A and B with visit probabilities pA, pB, distances dA, dB, and flow rates fA, fB]
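As a rough illustration of this formulation (not the authors' simulator), the state, actions, and reward could be wired into a toy environment like the one below; the class name, arrival rates, and flight times are invented for the example.

```python
import random

class UAVEnv:
    """Toy 2-node UAV ferrying environment: state = packets waiting at (A, B),
    actions = fly to A (0) or fly to B (1), reward = packets delivered to the hub."""

    def __init__(self, arrival_rates=(0.4, 0.1), flight_times=(1, 3), max_queue=10):
        self.arrival_rates = arrival_rates  # expected packets per time step at each node
        self.flight_times = flight_times    # round-trip time from hub to each node
        self.max_queue = max_queue
        self.state = (0, 0)

    def step(self, action):
        """Fly to the chosen node, deliver its queue, and let traffic accumulate meanwhile."""
        queues = list(self.state)
        elapsed = self.flight_times[action]
        # Traffic accumulates at both nodes while the UAV is in flight.
        for i, rate in enumerate(self.arrival_rates):
            arrivals = sum(random.random() < rate for _ in range(elapsed))
            queues[i] = min(self.max_queue, queues[i] + arrivals)
        reward = queues[action]             # packets picked up from the visited node
        queues[action] = 0
        self.state = tuple(queues)
        return self.state, reward
```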
MDP
• If a reinforcement learning task has the Markov property, it is basically a Markov Decision Process (MDP).
• If the state and action sets are finite, it is a finite MDP.
• To define a finite MDP, you need to give:
• state and action sets
• one-step "dynamics" defined by transition probabilities: P(s' | s, a) = Pr{ s_{t+1} = s' | s_t = s, a_t = a }
• reward expectation: R(s, a, s') = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }
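For concreteness, the ingredients of a finite MDP can be written down as plain tables. The two-state, two-action example below is hypothetical, and rewards are given as R(s, a), i.e. already averaged over next states.

```python
# Hypothetical 2-state, 2-action finite MDP written as explicit tables.
states = ["low_traffic", "high_traffic"]
actions = ["fly_A", "fly_B"]

# P[s][a] maps next-state -> transition probability; each row sums to 1.
P = {
    "low_traffic":  {"fly_A": {"low_traffic": 0.8, "high_traffic": 0.2},
                     "fly_B": {"low_traffic": 0.5, "high_traffic": 0.5}},
    "high_traffic": {"fly_A": {"low_traffic": 0.6, "high_traffic": 0.4},
                     "fly_B": {"low_traffic": 0.3, "high_traffic": 0.7}},
}

# R[s][a] is the expected one-step reward E{ r_{t+1} | s_t = s, a_t = a }.
R = {
    "low_traffic":  {"fly_A": 1.0, "fly_B": 0.5},
    "high_traffic": {"fly_A": 2.0, "fly_B": 4.0},
}
```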
RL approach to solving MDPs
• Policy: mapping from the set of states to the set of actions, π : S → A
• Sum of rewards (:= return) from this time onwards: R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ...
• Value function (of a state): expected return when starting in s and following policy π. For an MDP,
V^π(s) = E_π{ R_t | s_t = s } = E_π{ Σ_{k≥0} γ^k r_{t+k+1} | s_t = s }
Bellman Equation for Policy π
• Evaluating E{·} for a deterministic policy π gives:
V^π(s) = Σ_{s'} P(s' | s, π(s)) [ R(s, π(s), s') + γ V^π(s') ]
• Action-value function: value of taking action a in state s. For an MDP,
Q^π(s, a) = Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V^π(s') ]
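A minimal iterative policy-evaluation sketch based on the Bellman equation above, with rewards written as R(s, a) for brevity; it can reuse the hypothetical P and R tables from the earlier MDP example, and the discount factor and tolerance are assumptions.

```python
def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-6):
    """Iteratively apply the Bellman equation for a fixed deterministic policy:
    V(s) <- sum_{s'} P(s' | s, policy(s)) * (R(s, policy(s)) + gamma * V(s'))."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]
            v_new = sum(prob * (R[s][a] + gamma * V[s2]) for s2, prob in P[s][a].items())
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

# Usage with the hypothetical tables above:
# V = policy_evaluation(P, R, policy={"low_traffic": "fly_A", "high_traffic": "fly_B"})
```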
Optimality
• Since V and Q are real-valued, they induce a partial ordering on policies: π ≥ π' iff V^π(s) ≥ V^π'(s) for all states s
• Concept of V* and Q*: V*(s) = max_π V^π(s), Q*(s, a) = max_π Q^π(s, a)
• Concept of π*: the policy π that maximizes Q^π(s, a) for all states s
Reinforcement Learning - Methods
• To find π*, all methods try to evaluate the V/Q value functions
• Different approaches:
  • Dynamic programming: policy evaluation, improvement, iteration
  • Monte Carlo methods: decisions are based on averaging sample returns
  • Temporal difference methods (!!); see the sketch below
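Since the talk highlights temporal-difference methods, here is a generic tabular Q-learning sketch (one common TD method), not the authors' implementation; the environment interface matches the toy UAVEnv above, and all hyperparameters are assumptions.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, steps=200, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning, a temporal-difference method:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.state
        for _ in range(steps):
            # Epsilon-greedy action selection (trial & error search).
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r = env.step(a)
            # TD update toward the one-step bootstrapped target.
            td_target = r + gamma * max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (td_target - Q[(s, a)])
            s = s_next
    return Q

# Usage with the toy environment sketched earlier:
# Q = q_learning(UAVEnv(), actions=[0, 1])
```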