Case Studies: Bin Packing & The Traveling Salesman Problem

Case Studies: Bin Packing & The Traveling Salesman Problem, Part II
David S. Johnson, AT&T Labs – Research

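The Traveling Salesman Problem
Given: Set of cities $\{c_1, c_2, \ldots, c_N\}$. For each pair of cities $\{c_i, c_j\}$, a distance $d(c_i, c_j)$.
Find: Permutation $\pi$ of $\{1, \ldots, N\}$ that minimizes $\sum_{i=1}^{N-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + d(c_{\pi(N)}, c_{\pi(1)})$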
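As a minimal sketch of the objective, assume the cities are points in the plane and d is the Euclidean distance (the problem statement allows any distance function). The names `dist` and `tour_length` are illustrative and do not come from the slides.

```python
import math

def dist(a, b):
    """Euclidean distance between two points (one possible choice of d)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(cities, perm):
    """Sum of d over consecutive cities in perm, plus the closing edge."""
    n = len(perm)
    return sum(dist(cities[perm[i]], cities[perm[(i + 1) % n]]) for i in range(n))

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]   # corners of a unit square
print(tour_length(cities, [0, 1, 2, 3]))    # visits the corners in perimeter order
print(tour_length(cities, [0, 2, 1, 3]))    # uses both diagonals, so it is longer
```

The modulo index handles the edge from the last city back to the first, so the permutation does not need to repeat its starting city.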

N = 10

Other Types of Instances
• X-ray crystallography
  • Cities: orientations of a crystal
  • Distances: time for motors to rotate the crystal from one orientation to the other
• High-definition video compression
  • Cities: binary vectors of length 64 identifying the summands for a particular function
  • Distances: Hamming distance (the number of terms that need to be added/subtracted to get the next sum)

No-Wait Flowshop Scheduling
• Cities: Length-4 vectors $\langle c_1, c_2, c_3, c_4 \rangle$ of integer task lengths for a given job. Each job consists of 4 tasks that must run on 4 processors in order, where the task on processor $i+1$ must start as soon as the task on processor $i$ is done.
• Distances: $d(c,c')$ = increase in the finish time of the 4th processor if $c'$ is run immediately after $c$.
• Note: Not necessarily symmetric: may have $d(c,c') \ne d(c',c)$.
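The asymmetry can be seen in a small sketch. It assumes one standard reading of the definition: job c starts at time 0, each job's tasks run back to back, and the next job starts as early as possible without any processor being asked to run two tasks at once. The function name `nowait_distance` is hypothetical.

```python
from itertools import accumulate

def nowait_distance(c, c_next):
    """Increase in the last processor's finish time if c_next follows c."""
    finish_c = list(accumulate(c))                      # when c frees each processor
    offset_next = [0] + list(accumulate(c_next))[:-1]   # when c_next needs each processor,
                                                        # relative to its own start
    # Earliest start of c_next such that every processor is already free.
    start = max(f - o for f, o in zip(finish_c, offset_next))
    return start + sum(c_next) - sum(c)

a, b = (1, 2, 3, 4), (4, 3, 2, 1)
print(nowait_distance(a, b), nowait_distance(b, a))
```

For these two jobs the two directions give different values, so the resulting TSP instance is not symmetric.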

How Hard? • NP-Hard for all the above applications and many more • [Karp, 1972] • [Papadimitriou & Steiglitz, 1976] • [Garey, Graham, & Johnson, 1976] • [Papadimitriou & Kanellakis, 1978] • …

How Hard?
Number of possible tours: $N! = 1 \cdot 2 \cdot 3 \cdots (N-1) \cdot N = 2^{\Theta(N \log N)}$
$10! = 3{,}628{,}800$
$20! \approx 2.43 \times 10^{18}$ (2.43 quintillion)
Dynamic Programming Solution: $O(N^2 2^N) = o(2^{N \log N})$

Dynamic Programming Algorithm
• For each subset $C'$ of the cities containing $c_1$, and each city $c \in C'$, let $f(C',c)$ = length of the shortest path that visits each city of $C'$ exactly once, starting at $c_1$ and ending at $c$.
• $f(\{c_1\}, c_1) = 0$
• For $x \notin C'$: $f(C' \cup \{x\}, x) = \min_{c \in C'} [f(C',c) + d(c,x)]$.
• Optimal tour length = $\min_{c \in C} [f(C,c) + d(c,c_1)]$.
• Running time: about $(N-1)2^{N-1}$ items to be computed, at time $N$ for each = $O(N^2 2^N)$
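The recurrence can be sketched as follows, assuming the instance is given as a distance matrix `d` with city 0 in the role of c1 and subsets encoded as bitmasks. The function name `held_karp` and the example matrix are illustrative choices, not part of the slides.

```python
from itertools import combinations

def held_karp(d):
    """Optimal tour length for distance matrix d (city 0 plays the role of c1)."""
    n = len(d)
    if n == 1:
        return 0
    # f[(S, c)]: shortest path that starts at city 0, visits exactly the
    # cities in bitmask S (S always contains city 0), and ends at c.
    f = {(1, 0): 0}
    for size in range(2, n + 1):
        for rest in combinations(range(1, n), size - 1):
            S = 1 | sum(1 << x for x in rest)
            for x in rest:
                prev = S & ~(1 << x)          # the subset before x was appended
                f[(S, x)] = min(
                    f[(prev, c)] + d[c][x]
                    for c in range(n)
                    if prev & (1 << c) and (prev, c) in f
                )
    full = (1 << n) - 1
    return min(f[(full, c)] + d[c][0] for c in range(1, n))

# Four cities on the corners of a unit square, rectilinear distances.
d = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
print(held_karp(d))
```

The table holds one entry per (subset, end city) pair, which is where the exponential memory and the N^2 2^N time bound come from.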

How Hard?
Number of possible tours: $N! = 1 \cdot 2 \cdot 3 \cdots (N-1) \cdot N = 2^{\Theta(N \log N)}$
$10! = 3{,}628{,}800$
$20! \approx 2.43 \times 10^{18}$ (2.43 quintillion)
Dynamic Programming Solution: $O(N^2 2^N)$
$10^2 \cdot 2^{10} = 102{,}400$
$20^2 \cdot 2^{20} = 419{,}430{,}400$
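The two growth rates can be compared directly. This is only an arithmetic check of the counts on the slide, using Python's exact integers.

```python
import math

for n in (10, 20):
    tours = math.factorial(n)   # N! orderings of the cities
    dp_steps = n**2 * 2**n      # N^2 * 2^N, the dynamic programming bound
    print(f"N={n}: N! = {tours:,}   N^2 * 2^N = {dp_steps:,}")
```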

Example instances: N = 10, N = 100, N = 1000, N = 10000

Planar Euclidean Application #1 • Cities: holes to be drilled in printed circuit boards

N = 2392

Planar Euclidean Application #2 • Cities: wires to be cut in a “Laser Logic” programmable circuit

N = 7397

N = 33,810

N = 85,900

Standard Approach to Coping with NP-Hardness: • Approximation Algorithms • Run quickly (polynomial-time for theory, low-order polynomial time for practice) • Obtain solutions that are guaranteed to be close to optimal • For the latter and the TSP, we need the triangle inequality to hold: d(a,c) ≤ d(a,b) + d(b,c)

No Δ-Inequality Danger
• Theorem [Karp, 1972]: Given a graph $G = (V,E)$, it is NP-hard to determine whether $G$ contains a Hamiltonian circuit (a collection of edges that makes up a tour).
• Given a graph, construct a TSP instance in which $d(c,c') = 1$ if $\{c,c'\} \in E$ and $d(c,c') = N 2^N$ if $\{c,c'\} \notin E$.
• If a Hamiltonian circuit exists, $\mathrm{OPT} = N$; if not, $\mathrm{OPT} > N 2^N$.
• A polynomial-time approximation algorithm that guaranteed a tour of length no more than $2^N \cdot \mathrm{OPT}$ would imply P = NP.
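The construction is easy to write down. The sketch below assumes vertices are numbered 0..N-1 and the graph is given as a list of undirected edges; the name `tsp_instance_from_graph` is hypothetical.

```python
def tsp_instance_from_graph(n, edges):
    """Distance matrix: 1 for edges of G, N * 2^N for non-edges."""
    big = n * 2**n
    edge_set = {frozenset(e) for e in edges}
    return [
        [0 if u == v else (1 if frozenset((u, v)) in edge_set else big)
         for v in range(n)]
        for u in range(n)
    ]

# The 4-cycle 0-1-2-3-0 is itself a Hamiltonian circuit, so OPT = N = 4.
d = tsp_instance_from_graph(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
cycle_len = sum(d[i][(i + 1) % 4] for i in range(4))
print(cycle_len, d[0][2])   # tour along graph edges vs. one non-edge distance
```

Any tour that uses even one non-edge is longer than N * 2^N, which is why a 2^N-approximation would decide Hamiltonicity.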

Nearest Neighbor (NN): Start with some city. Repeatedly go next to the nearest unvisited neighbor of the last city added. When all cities have been added, go from the last back to the first.
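A minimal sketch of NN, assuming a distance matrix `d` and breaking ties by lowest city index (the slide does not specify a tie-breaking rule). The function name and the example are illustrative.

```python
def nearest_neighbor_tour(d, start=0):
    """Return the NN tour as a list of city indices for distance matrix d."""
    n = len(d)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # Nearest unvisited city to the last city added; ties go to the lowest index.
        nxt = min(unvisited, key=lambda c: (d[last][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour   # the edge from tour[-1] back to tour[0] closes the tour

# Four cities on a line at positions 0, 1, 3, 6.
pos = [0, 1, 3, 6]
d = [[abs(a - b) for b in pos] for a in pos]
print(nearest_neighbor_tour(d, start=1))
```

Starting from the city at position 1, NN first steps left to position 0 and then has to travel back across the line, which illustrates how early greedy choices can force long edges later.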

Note: By the Δ-inequality, an optimal tour need not contain any crossed edges.
[Figure: cities A, B, C, D, with edges {A,C} and {B,D} crossing at point E]
d(A,D) ≤ d(A,E) + d(E,D)
d(B,C) ≤ d(B,E) + d(E,C)
d(A,D) + d(B,C) ≤ (d(A,E) + d(E,D)) + (d(B,E) + d(E,C)) = (d(A,E) + d(E,C)) + (d(B,E) + d(E,D)) = d(A,C) + d(B,D)
For the Euclidean metric, the inequalities are strict (unless all relevant cities are collinear).

Theorem [Rosenkrantz, Stearns, & Lewis, 1977]:
• There exists a constant $c$ such that, if instance $I$ obeys the triangle inequality, then $\mathrm{NN}(I) \le c \log(N) \cdot \mathrm{OPT}(I)$.
• There exists a constant $c'$ such that, for all $N > 3$, there are N-city instances $I$ obeying the triangle inequality for which $\mathrm{NN}(I) > c' \log(N) \cdot \mathrm{OPT}(I)$.
• For any algorithm $A$, let $R_N(A) = \max\{A(I)/\mathrm{OPT}(I) : I \text{ is an N-city instance}\}$.
• Then $R_N(\mathrm{NN}) = \Theta(\log N)$ (worst-case ratio).

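Lower Bound Examples
[Figure: $F_1$ (NN starts at left, ends in middle), with edge labels 1, $1+\epsilon$, 1; $F_{i+1}$, built from two copies of $F_i$, with two edges labeled $L_i/2 + 1$ and two edges labeled $1+\epsilon$]
Let $L_i$ be the number of edges encountered if we travel step-by-step from the leftmost vertex in $F_i$ to the rightmost.
$L_1 = 2$, $L_{i+1} = 2L_i + 2$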

Set the distance from the rightmost vertex of $F_k$ to the leftmost to $1+\epsilon$, and set all non-specified distances to their Δ-inequality minimum.
$\mathrm{OPT}(F_k)/(1+\epsilon) \le N = L_k + 1 = (2L_{k-1} + 2) + 1 = (2(2L_{k-2} + 2) + 2) + 1 = (2(2(2L_{k-3} + 2) + 2) + 2) + 1 = \cdots = 2^k + 2^k - 1 < 2^{k+1}$
$\log N < k+1$
$\mathrm{NN}(F_k) > 2^k + (k-1)2^{k-1} > k 2^{k-1} > (k/4)\,\mathrm{OPT}(F_k)/(1+\epsilon) = \Omega(\log N)\,\mathrm{OPT}(F_k)$
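The count of cities can be checked numerically. The sketch below assumes only the recurrence L_1 = 2, L_{i+1} = 2 L_i + 2 stated for the lower-bound examples; the closed form 2^(k+1) - 2 is what that recurrence unrolls to, and the function name `path_edges` is illustrative.

```python
def path_edges(k):
    """L_k from the recurrence L_1 = 2, L_{i+1} = 2 * L_i + 2."""
    L = 2
    for _ in range(k - 1):
        L = 2 * L + 2
    return L

for k in range(1, 11):
    n_cities = path_edges(k) + 1                 # N = L_k + 1
    assert path_edges(k) == 2 ** (k + 1) - 2     # closed form of the recurrence
    assert n_cities == 2**k + 2**k - 1           # matches the chain on the slide
    assert n_cities < 2 ** (k + 1)               # hence log N < k + 1
print([path_edges(k) for k in range(1, 6)])
```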

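Upper Bound Proof (Sketch)
• Observation 1: For symmetric instances with the Δ-inequality, $d(c,c') \le \mathrm{OPT}(I)/2$ for all pairs of cities $\{c,c'\}$.
[Figure: an optimal tour split by $c$ and $c'$ into two paths, $P_1$ and $P_2$]
Δ-inequality ⇒ $d(c,c') \le \mathrm{length}(P_1)$ and $d(c,c') \le \mathrm{length}(P_2)$, so $2d(c,c') \le \mathrm{length}(P_1) + \mathrm{length}(P_2) = \mathrm{OPT}(I)$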

Observation 2: For each city $c$, let $L(c)$ be the length of the edge added when $c$ was the last city in the tour. For all pairs $\{c,c'\}$, $d(c,c') \ge \min(L(c), L(c'))$.
• Proof: Suppose without loss of generality that $c$ is the first of $c, c'$ to occur in the NN tour. When $c$ was the last city, the next city chosen, say $c^*$, satisfied $d(c,c^*) \le d(c,c'')$ for all unvisited cities $c''$. Since $c'$ was as yet unvisited, this implies $d(c,c') \ge d(c,c^*) = L(c)$.
• Overall Proof Idea: We will partition $C$ into $O(\log N)$ disjoint sets $C_i$ such that $\sum_{c \in C_i} L(c) \le \mathrm{OPT}(I)$, implying that $\mathrm{NN}(I) = \sum_{c \in C} L(c) = O(\log N)\,\mathrm{OPT}(I)$.

Set $X_0 = C$, $i = 0$ and repeat while $|X_i| > 3$.
• Let $T_i$ be the set of edges in an optimal tour for $X_i$ (note that $|T_i| = |X_i|$). By the Δ-inequality, the total length of these edges is at most $\mathrm{OPT}(I)$.
• Let $T_i'$ be the set of edges in $T_i$ with length greater than $2\,\mathrm{OPT}(I)/|T_i|$ (note that $|T_i'| < |T_i|/2$).
• For each edge $e = \{c,c'\}$ in $T_i - T_i'$, let $f(e)$ be a city $c'' \in \{c,c'\}$ such that $L(c'') \le d(c,c')$.
• Set $C_i = \{f(e) : e \in T_i - T_i'\}$ and note that
  • $\sum_{c \in C_i} L(c) \le \sum_{e \in T_i} \mathrm{length}(e) \le \mathrm{OPT}(I)$.
  • $|C_i| \ge |T_i - T_i'|/2 \ge |T_i|/4 = |X_i|/4$.
• Set $X_{i+1} = X_i - C_i$ (note that $|X_{i+1}| \le (3/4)|X_i|$).

Given that $|X_{i+1}| \le (3/4)|X_i|$ for $i \ge 0$, this process can only continue while $(3/4)^i N > 3$, i.e., while $i \log(3/4) + \log N > \log 3$, or equivalently while $\log N - \log 3 > -i \log(3/4) = i \log(4/3)$.
• Hence the process halts at iteration $i^*$, where $i^* < (\log N - \log 3)/\log(4/3) = O(\log N)$.
• At this point there are 3 or fewer cities in $X_{i^*+1}$. Put each of them in a separate set $C_{i^*+j}$, $1 \le j \le |X_{i^*+1}|$, for which, by Observation 1, we will have $L(c) \le \mathrm{OPT}(I)/2$.
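Putting the pieces together gives the upper bound. The following is a sketch of the final summation, assuming the sets are indexed from 0 through i* as in the construction above, each with total L-value at most OPT(I), plus at most three leftover cities each contributing at most OPT(I)/2.

```latex
\mathrm{NN}(I) \;=\; \sum_{c \in C} L(c)
  \;=\; \sum_{i=0}^{i^*} \sum_{c \in C_i} L(c) \;+\; \sum_{c \in X_{i^*+1}} L(c)
  \;\le\; (i^* + 1)\,\mathrm{OPT}(I) \;+\; 3 \cdot \frac{\mathrm{OPT}(I)}{2}
  \;=\; O(\log N)\,\mathrm{OPT}(I)
```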

Greedy (Multi-Fragment) (GR): Sort the edges, shortest first, and treat them in that order. Start with an empty graph. While the current graph is not a TSP tour, attempt to add the next shortest edge. If it yields a vertex degree exceeding 2 or a cycle on fewer than N cities, delete it.
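One way to sketch GR is with a union-find structure that tracks which path fragment each city belongs to. This assumes a symmetric distance matrix `d` with at least 3 cities; the union-find bookkeeping and the name `greedy_tour_edges` are implementation choices, not something the slide prescribes.

```python
def greedy_tour_edges(d):
    """Return the N edges chosen by the greedy (multi-fragment) heuristic."""
    n = len(d)
    parent = list(range(n))          # union-find over path fragments
    degree = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = sorted((d[u][v], u, v) for u in range(n) for v in range(u + 1, n))
    chosen = []
    for _, u, v in edges:
        if degree[u] >= 2 or degree[v] >= 2:
            continue                 # would give a vertex degree exceeding 2
        ru, rv = find(u), find(v)
        if ru == rv and len(chosen) < n - 1:
            continue                 # would close a cycle on fewer than N cities
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
        chosen.append((u, v))
        if len(chosen) == n:
            break
    return chosen

# Four cities on a line at positions 0, 1, 3, 6.
pos = [0, 1, 3, 6]
d = [[abs(a - b) for b in pos] for a in pos]
print(greedy_tour_edges(d))
```

Once N-1 edges have been accepted the fragments form a single Hamiltonian path, so the only edge that can still pass the degree test is the one joining its two endpoints.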

Theorem [Ong & Moore, 1984]: For all instances $I$ obeying the Δ-inequality, $\mathrm{GR}(I) = O(\log N)\,\mathrm{OPT}(I)$.
• Theorem [Frieze, 1979]: There are N-city instances $I_N$ for arbitrarily large $N$ that obey the Δ-inequality and have $\mathrm{GR}(I_N) = \Omega(\log N/\log\log N)\,\mathrm{OPT}(I_N)$.

Nearest Insertion (NI): Start with a 2-city tour consisting of some city and its nearest neighbor. Repeatedly insert the non-tour city that is closest to a tour city (in the location that yields the smallest increase in tour length).
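A minimal sketch of NI, assuming a symmetric distance matrix `d` with at least 2 cities and arbitrary tie-breaking by lowest index or first position. The function name is illustrative.

```python
def nearest_insertion_tour(d, start=0):
    """Return the NI tour (cyclic list of city indices) for a symmetric matrix d."""
    n = len(d)
    others = [c for c in range(n) if c != start]
    first = min(others, key=lambda c: d[start][c])
    tour = [start, first]                      # initial 2-city tour
    remaining = set(others) - {first}
    while remaining:
        # Non-tour city that is closest to some tour city.
        c = min(sorted(remaining), key=lambda x: min(d[x][t] for t in tour))

        def increase(i):
            """Extra length if c is inserted between tour[i] and its successor."""
            a, b = tour[i], tour[(i + 1) % len(tour)]
            return d[a][c] + d[c][b] - d[a][b]

        i = min(range(len(tour)), key=increase)
        tour.insert(i + 1, c)
        remaining.remove(c)
    return tour

# Four cities on the corners of a unit square, rectilinear distances.
d = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
print(nearest_insertion_tour(d))
```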

Theorem [Rosenkrantz, Stearns, & Lewis, 1977]: If instance I obeys the triangle inequality, then R(NI) = 2.
• Key Ideas of Upper Bound Proof:
  • The minimum spanning tree (MST) for I is no longer than the optimal tour, since deleting an edge from the tour yields a spanning tree.
  • The length of the NI tour is at most twice the length of an MST.
[Figure: cities A, B, and C]
d(B,C) ≤ d(B,A) + d(A,C) by the Δ-inequality, so d(A,C) + d(B,C) – d(B,A) ≤ 2d(A,C)
(A,C) would be the next edge added by Prim’s minimum spanning tree algorithm

Lower Bound Examples
[Figure: example instance with edges labeled 2]
NI(I) = 2N-2
OPT(I) = N+1

Another Approach
• Observation 1: Any connected graph in which every vertex has even degree contains an “Euler Tour”: a cycle that traverses each edge exactly once, which can be found in linear time. [Problem 2: Prove this!]
• Observation 2: If the Δ-inequality holds, then traversing an Euler tour but skipping past previously-visited vertices yields a Traveling Salesman tour of no greater length.

Obtaining the Initial Graph
• Double MST algorithm (DMST):
  • Combine two copies of an MST.
  • Theorem [Folklore]: DMST(I) ≤ 2Opt(I).
• Christofides algorithm (CH):
  • Combine one copy of an MST with a minimum-length matching on its odd-degree vertices (there must be an even number of them since the total sum of degrees for any graph is even).
  • Theorem [Christofides, 1976]: CH(I) ≤ 1.5Opt(I).
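The Double MST algorithm can be sketched compactly, assuming a symmetric distance matrix `d`. It relies on the fact that one Euler tour of the doubled MST is a depth-first walk of the tree, so shortcutting past visited cities yields the cities in DFS preorder. Prim's algorithm is used for the MST here as one possible choice; the function name is hypothetical. Christofides is not sketched because it additionally needs a minimum-length matching routine.

```python
def double_mst_tour(d):
    """Tour obtained by doubling an MST and shortcutting its Euler tour."""
    n = len(d)
    # Prim's algorithm, growing the tree from city 0.
    in_tree = [False] * n
    best = [float("inf")] * n      # cheapest known connection to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0
    for _ in range(n):
        u = min((c for c in range(n) if not in_tree[c]), key=lambda c: best[c])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u
    # Depth-first walk of the tree; recording each city on first visit is
    # the same as shortcutting the Euler tour of the doubled tree.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

# Four cities on a line at positions 0, 1, 3, 6 (the MST has length 6).
pos = [0, 1, 3, 6]
d = [[abs(a - b) for b in pos] for a in pos]
print(double_mst_tour(d))
```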

Optimal Tour on Odd-Degree Vertices (No longer than overall Optimal Tour)
Matching M1 + Matching M2 = Optimal Tour on the odd-degree vertices
Hence Optimal Matching ≤ min(M1,M2) ≤ OPT(I)/2

Can we do better?
• No polynomial-time algorithm is known that has a worst-case ratio less than 3/2 for arbitrary instances satisfying the Δ-inequality.
• Assuming P ≠ NP, no polynomial-time algorithm can do better than 220/219 = 1.004566… [Papadimitriou & Vempala, 2006].
• For Euclidean and related metrics, there exists a polynomial-time approximation scheme (PTAS): For each $\epsilon > 0$, a polynomial-time algorithm with worst-case ratio $\le 1+\epsilon$. [Arora, 1998] [Mitchell, 1999].

PTAS Running Times
• [Arora, STOC 1996]: $O(N^{100/\epsilon})$
• [Arora, JACM 1998]: $O(N (\log N)^{O(1/\epsilon)})$
• [Rao & Smith, STOC 1998]: $O(2^{(1/\epsilon)^{O(1)}} N + N \log N)$
• If $\epsilon = 1/2$ and $O(1) = 10$, then $2^{(1/\epsilon)^{O(1)}} > 2^{1000}$.
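The last bullet is a one-line calculation, spelled out here under the slide's own assumption that the hidden exponent equals 10:

```latex
\epsilon = \tfrac{1}{2}
\;\Rightarrow\;
\left(\tfrac{1}{\epsilon}\right)^{10} = 2^{10} = 1024
\;\Rightarrow\;
2^{(1/\epsilon)^{10}} = 2^{1024} > 2^{1000}
```

The constant factor alone therefore dwarfs any realistic instance size, even though the dependence on N is nearly linear.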

Performance “In Practice” • Testbed: Random Euclidean Instances (Results appear to be reasonably well-correlated with those for our real-world instances.)

Nearest Neighbor

Greedy
