Approximation Algorithms for Orienteering and Discounted-Reward TSP

Blum, Chawla, Karger, Lane, Meyerson, Minkoff. CS 599: Sequential Decision Making in Robotics, University of Southern California, Spring 2011

TSP: Traveling Salesperson Problem • Graph G = (V, E) • Find a tour of shortest length that visits each vertex in V exactly once • Corresponding decision problem • Given a length L, decide whether a tour of length at most L exists • The decision problem is NP-complete • Unless P = NP, no polynomial-time exact algorithm exists; all known exact algorithms take time exponential in |V| in the worst case

Robot Navigation • Can’t go everywhere, limits on resources • Many practical tasks don’t require completeness but do require immediacy or at least some notion of timeliness/urgency (e.g. some vertices are short-lived and need to get to them quickly)

Prizes, Quotas and Penalties • Prize Collecting Traveling Salesperson Problem (PCTSP) • A known prize (reward) is available at each vertex • Quota: the total prize to be collected on the tour (given) • Not visiting a vertex incurs a known penalty • Minimize the total travel distance plus the total penalty, while starting from a given vertex and collecting the pre-specified quota • Best algorithm is a 2-approximation • Quota TSP • All penalties are set to zero • Special case is k-TSP, in which all prizes are 1 (k is the quota) • k-TSP is strongly tied to the problem of finding a tree of minimum cost spanning any k vertices in a graph, called the k-MST problem • Penalty TSP: no required quota, only penalties • All these admit a budget version, where a budget is given as input and the goal is to find the largest k-TSP (or other) solution whose cost is no more than the budget
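The PCTSP objective above can be made concrete with a small evaluator. This is only an illustrative sketch: the function name `pctsp_cost`, the nested-dictionary distance table, and the convention that a tour is a vertex list that starts and ends at the root are all assumptions made here, not notation from the slides.

```python
def pctsp_cost(tour, dist, prizes, penalties, quota):
    """Evaluate a candidate PCTSP tour (hypothetical helper).

    tour      -- list of vertices, starting and ending at the root
    dist      -- dist[u][v] is the travel distance between u and v
    prizes    -- prize available at each vertex
    penalties -- penalty incurred for each vertex NOT visited
    quota     -- total prize the tour is required to collect

    Returns (objective value, whether the quota is met).
    """
    visited = set(tour)
    # Total travel distance along consecutive tour edges.
    travel = sum(dist[u][v] for u, v in zip(tour, tour[1:]))
    # Penalties are paid only for vertices the tour skips.
    penalty = sum(p for v, p in penalties.items() if v not in visited)
    # Each prize counts once, however often the vertex is visited.
    collected = sum(prizes.get(v, 0) for v in visited)
    return travel + penalty, collected >= quota


dist = {
    "r": {"a": 1, "b": 2},
    "a": {"r": 1, "b": 2},
    "b": {"r": 2, "a": 2},
}
prizes = {"a": 2, "b": 4}
penalties = {"a": 5, "b": 3}
cost, ok = pctsp_cost(["r", "a", "r"], dist, prizes, penalties, quota=2)
```

Setting every penalty to zero turns the same evaluator into a Quota TSP objective, and setting every prize to 1 with `quota=k` gives k-TSP.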

Orienteering • Orienteering: Tour with maximum possible reward whose length is at most a pre-specified budget B • Dictionary definition: orienteering |ˌôriənˈti(ə)riNG| noun: a competitive sport in which participants find their way to various checkpoints across rough country with the aid of a map and compass, the winner being the one with the lowest elapsed time. ORIGIN 1940s: from Swedish orientering.

Approximating Orienteering • Any algorithm for PC-TSP extends to unrooted Orienteering • Thus the best solution for unrooted Orienteering is at worst a 2-approximation • No previous algorithm gave a constant-factor approximation for rooted Orienteering

Discounted-Reward TSP • Undirected weighted graph • Edge weights represent transit time over the edge • Prize (reward) $\pi_v$ on vertex $v$ • Find a path visiting each vertex $v$ at time $t_v$ that maximizes $\sum_v \pi_v \gamma^{t_v}$, where $\gamma \in (0, 1)$ is the discount factor
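As a rough illustration of the objective, the sketch below scores one fixed path, taking the discounted reward to be the sum over visited vertices of prize times discount factor raised to the arrival time. The function name `discounted_reward` and the edge-keyed weight dictionary are assumptions for this example only.

```python
def discounted_reward(path, weights, prizes, gamma):
    """Discounted prize collected along `path` (hypothetical helper).

    path    -- list of vertices in visiting order
    weights -- weights[(u, v)] is the transit time of undirected edge {u, v}
    prizes  -- prize on each vertex (missing vertices count as 0)
    gamma   -- discount factor, 0 < gamma < 1
    """
    total = 0.0
    elapsed = 0.0
    seen = set()
    for i, v in enumerate(path):
        if i > 0:
            u = path[i - 1]
            # Undirected edge: accept either orientation of the key.
            elapsed += weights[(u, v)] if (u, v) in weights else weights[(v, u)]
        if v not in seen:
            # The prize is available only on the first visit.
            seen.add(v)
            total += prizes.get(v, 0.0) * gamma ** elapsed
    return total


weights = {("s", "a"): 1, ("a", "b"): 2}
prizes = {"a": 4, "b": 8}
value = discounted_reward(["s", "a", "b"], weights, prizes, gamma=0.5)
```

Here vertex a is reached at time 1 and vertex b at time 3, so the path is worth 4(0.5) + 8(0.125) = 3.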

Discounting and MDPs • Encourages early reward collection, important if conditions might change suddenly • Optimal strategy is a policy (a mapping from states to actions) • Markov decision process • Goal is to maximize the expected total discounted reward (can be solved in polynomial time) in a stochastic action setting • Can visit states multiple times • Discounted-Reward TSP • Visit a state only once (reward available only on first visit) • Deterministic actions

Overall Strategy • Approximate the optimum difference between the length of a prize-collecting path and the length of the shortest path between its endpoints • Paper gives • An algorithm that provably gives a constant factor approximation for this difference • A formula for the approximation • The results mean that constant factor approximations exist (and can be computed) for Orienteering and Discounted-Reward TSP

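Path Excess • Excess of a path $P$ from $s$ to $t$: $\epsilon(P) = d^P(s,t) - d(s,t)$, where $d^P(s,t)$ is the length of $P$ and $d(s,t)$ is the shortest-path distance from $s$ to $t$ • Minimum excess path of total prize $\Pi$ is also the minimum cost path of total prize $\Pi$ • An $(s,t)$ path approximating optimal excess $\epsilon$ by factor $\alpha \ge 1$ will have length at most $d(s,t) + \alpha\epsilon \le \alpha\,(d(s,t) + \epsilon)$ (by definition) • Thus a path that approximates min excess by $\alpha$ will also approximate minimum cost path by $\alpha$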
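A small sketch of the excess computation, assuming the excess of a path is its own length minus the shortest-path distance between its endpoints. The helper names and the adjacency-dictionary graph representation are illustrative choices, and Dijkstra's algorithm is used here only as a convenient way to get the shortest distance.

```python
import heapq


def shortest_distances(adj, source):
    """Dijkstra's algorithm; adj[u][v] is the non-negative weight of edge (u, v)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def path_length(adj, path):
    """Sum of edge weights along consecutive vertices of `path`."""
    return sum(adj[u][v] for u, v in zip(path, path[1:]))


def excess(adj, path):
    """Length of `path` minus the shortest distance between its endpoints."""
    s, t = path[0], path[-1]
    return path_length(adj, path) - shortest_distances(adj, s)[t]


adj = {
    "s": {"a": 1, "t": 2},
    "a": {"s": 1, "t": 2},
    "t": {"s": 2, "a": 2},
}
detour_excess = excess(adj, ["s", "a", "t"])  # length 3, shortest distance 2
```

A shortest path has excess 0, which is why a constant-factor guarantee on excess is a stronger statement than a constant-factor guarantee on total length.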

Results First letter is objective (cost, prize, excess, or discounted prize) and second is the structure (path, cycle, or tree)

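Min Excess Algorithm • Let $P^*$ be the shortest path from $s$ to $t$ with $\Pi(P^*) \ge k$ • Let $\epsilon(P^*) = d(P^*) - d(s,t)$ • Min-excess algorithm returns a path $P$ of length $d(P) = d(s,t) + \alpha_{EP}\,\epsilon(P^*)$ with $\Pi(P) \ge k$, where $\alpha_{EP} = \frac{3}{2}\alpha_{CP} - \frac{1}{2}$ and $\alpha_{CP}$ is the approximation factor for the min-cost $k$-path problem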

Orienteering Algorithm • Compute maximum-prize path of length at most D starting at vertex s • Perform a binary search over (prize) values k • For each vertex v, compute min-excess path from s to v collecting prize k • Find the maximum k such that there exists a v where the min-excess path returned has length at most D; return this value of k (the prize) and the corresponding path
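The binary search above can be sketched as follows. To keep the example runnable, the min-excess subroutine is replaced by an exhaustive search (`min_length_path`) that is exact but exponential, so this is a stand-in for the paper's approximation algorithm and not an implementation of it. Integer prizes, a complete metric distance table, and all function names are assumptions of this sketch.

```python
from itertools import permutations


def path_length(dist, path):
    return sum(dist[u][v] for u, v in zip(path, path[1:]))


def min_length_path(dist, prizes, s, t, k):
    """Exhaustive stand-in for the min-excess oracle.

    Returns (length, path) for the shortest simple s-t path collecting
    prize at least k, or None if no such path exists. For a fixed pair
    (s, t), minimizing length and minimizing excess are the same thing.
    """
    if t == s:
        return (0, [s]) if prizes.get(s, 0) >= k else None
    others = [v for v in dist if v not in (s, t)]
    best = None
    for r in range(len(others) + 1):
        for mid in permutations(others, r):
            path = [s, *mid, t]
            if sum(prizes.get(v, 0) for v in path) < k:
                continue
            length = path_length(dist, path)
            if best is None or length < best[0]:
                best = (length, path)
    return best


def orienteering(dist, prizes, s, budget):
    """Binary search over prize values k; returns (prize, path)."""
    lo, hi = 0, sum(prizes.values())
    best = (prizes.get(s, 0), [s])
    while lo <= hi:
        k = (lo + hi) // 2
        found = None
        for v in dist:  # try every possible endpoint
            res = min_length_path(dist, prizes, s, v, k)
            if res is not None and res[0] <= budget:
                found = res[1]
                break
        if found is not None:
            best = (sum(prizes.get(v, 0) for v in found), found)
            lo = k + 1  # prize k is achievable, try for more
        else:
            hi = k - 1
    return best


# Four points on a line: s at 0, a at 1, b at 2, c at -3.
pos = {"s": 0, "a": 1, "b": 2, "c": -3}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos if v != u} for u in pos}
prizes = {"s": 0, "a": 1, "b": 1, "c": 3}
result_small = orienteering(dist, prizes, "s", budget=2)
result_large = orienteering(dist, prizes, "s", budget=3)
```

With budget 2 only a and b are reachable, so the search settles on prize 2; with budget 3 the single far vertex c becomes reachable and is worth more. The binary search is valid here because feasibility is monotone in k when the oracle is exact.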

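Discounted-Reward TSP Algorithm • Re-scale all edge lengths so that $\gamma = 1/2$ • Replace the prize of each node by the prize discounted by the shortest path to that node: $\pi'_v = \gamma^{d_v}\pi_v$, where $d_v$ is the shortest-path distance from the start to $v$ • Call this modified graph $G'$ • Guess $t$: the last node on optimal path $P^*$ with excess less than $\epsilon$ • Guess $k$: the value of $\Pi'(P^*_t)$, the scaled prize collected by $P^*$ up to $t$ • Apply min-excess approximation algorithm to find a path $P$ collecting scaled prize $k$ with small excess • Return this path as solution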
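The first two preprocessing steps can be sketched as below. The sketch assumes the rescaling target is a discount factor of 1/2, which means multiplying every edge length by log2(1/gamma), and it assumes each prize is discounted by the shortest-path distance from the start vertex. The function names and graph representation are illustrative, and the guessing of t and k and the min-excess call are not shown.

```python
import heapq
import math


def shortest_distances(adj, source):
    """Dijkstra's algorithm; adj[u][v] is the non-negative weight of edge (u, v)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rescale_and_discount(adj, prizes, s, gamma):
    """Build the modified instance: rescaled edges and pre-discounted prizes.

    After multiplying lengths by log2(1/gamma), discounting by 1/2 per unit
    of rescaled length equals discounting by gamma per unit of original length.
    """
    scale = math.log2(1.0 / gamma)
    scaled = {u: {v: w * scale for v, w in nbrs.items()} for u, nbrs in adj.items()}
    d = shortest_distances(scaled, s)
    # Prize of v discounted by the shortest rescaled distance from s to v.
    new_prizes = {v: prizes.get(v, 0.0) * 0.5 ** d[v] for v in d}
    return scaled, new_prizes


adj = {
    "s": {"a": 1, "b": 3},
    "a": {"s": 1, "b": 1},
    "b": {"s": 3, "a": 1},
}
prizes = {"a": 4.0, "b": 16.0}
scaled, new_prizes = rescale_and_discount(adj, prizes, "s", gamma=0.25)
```

In this toy instance gamma = 1/4 doubles every edge length, and both a and b end up with scaled prize 1: vertex a is at original distance 1 (4 times 1/4) and vertex b is at original distance 2 via a (16 times 1/16).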

