Computational Intro: Conservation and Biodiversity Wildlife Corridor Design

Computational Intro: Conservation and Biodiversity
Wildlife Corridor Design
Carla P. Gomes
Joint work with Jon Conrad, Bistra Dilkina, Willem van Hoeve, Ashish Sabharwal, and Jordan Suter
Topics in Computational Sustainability, Spring 2010

Outline
• Wildlife corridor design problem
• Problem Definition
• How hard is it to solve it?
• Concepts of Problem Complexity
• How to model it?
• Mixed Integer Programming formulation and other issues
• How to solve it?
• How to scale up solutions?
• Experimental Results
• Research Questions

Problem Definition

Conservation and Biodiversity: Wildlife Corridors
Wildlife corridors:
• Preserve wildlife against land fragmentation
• Link core biological areas, allowing animal movement between areas
• Limited budget; must maximize environmental benefits/utility
New York Times (Science), 2006

Conservation and Biodiversity: Grizzly Bear Wildlife Corridors
Wildlife corridors link core biological areas, allowing animal movement between areas.
Typically: low budgets to implement corridors.
Example:
• Goal: preserve grizzly bear populations in the U.S. Northern Rockies by creating wildlife corridors connecting 3 reserves:
  • Yellowstone National Park;
  • Glacier Park; and
  • Salmon-Selway Ecosystem

Grizzly Bear Corridor in Northern Rockies
Real world instance: corridor for grizzly bears in the Northern Rockies, connecting Yellowstone, the Salmon-Selway Ecosystem, and Glacier Park.
Study area ~ 320,000 sq km
[Figure: maps of habitat suitability and cost over the study area]
Habitat suitability can be a challenging Machine Learning problem.

Wildlife Corridor Design: Problem Definition (Informal English Definition)
• Instance:
  • A set of parcels and their neighborhood relationships
  • A set of reserves or terminals (subset of the parcels)
  • The cost and the utility (habitat suitability) per parcel
• Question:
  • What is the set of connected parcels, containing the reserves, maximizing the utility, such that the total cost does not exceed a given budget C?
[Figure: grid of land parcels with the reserves marked; cost and utility info omitted]

Example
[Figure: parcel grids annotated with utility and cost, showing the best corridor for two budgets]
• Budget 10: Cost = 10; Utility = 9
• Budget 11: Cost = 11; Utility = 10

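Example
[Figure: parcel grids annotated with utility and cost, showing the min cost solution and the best corridor for two budgets]
• Min cost solution: Cost = 7; Utility = 5
• Budget 10: Cost = 10; Utility = 9
• Budget 11: Cost = 11; Utility = 10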

Wildlife Corridor Design: Graph Representation
Undirected graph representation G = (V, E)
• Input:
  • A set of parcels and their neighborhood relationship
  • A set of reserves or terminals (subset of the parcels)
  • The cost and the utility (habitat suitability) per parcel
• Output:
  • A set of connected parcels, containing the reserves, maximizing the utility, such that the total cost does not exceed a given budget C
[Figure: land parcels and reserves drawn as a graph; cost and utility info omitted in the pictures]

The Connection Subgraph Problem (Optimization Version)
Instance
• An undirected graph G = (V, E)
• Terminal vertices T ⊆ V
• Vertex cost function: c(v); utility function: u(v)
• Cost bound / budget C
Question
What is the subgraph H of G with maximum utility such that
• H is connected and contains T
• cost(H) ≤ C?
Utility optimization version: given C, maximize utility
Cost optimization version: given U, minimize cost
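The utility optimization version can be made concrete with a brute-force sketch that tries every vertex subset containing the terminals. This is only an illustration for tiny graphs (it enumerates exponentially many subsets) and is not the method used in the lecture; the function names and the dictionary-based graph representation are assumptions made for this example.

```python
from itertools import combinations

def connected(adj, H):
    """Return True if the vertex set H induces a connected subgraph."""
    H = set(H)
    if not H:
        return True
    stack = [next(iter(H))]
    seen = set(stack)
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in H and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == H

def best_corridor(adj, terminals, cost, utility, C):
    """Enumerate all supersets of the terminals; return (utility, set) or None."""
    T = set(terminals)
    others = [v for v in adj if v not in T]
    best = None
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            H = T | set(extra)
            if sum(cost[v] for v in H) > C or not connected(adj, H):
                continue  # over budget or not connected
            value = sum(utility[v] for v in H)
            if best is None or value > best[0]:
                best = (value, H)
    return best

# Hypothetical 5-parcel instance with terminals 0 and 1.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
cost = {0: 1, 1: 1, 2: 2, 3: 4, 4: 1}
utility = {0: 0, 1: 0, 2: 6, 3: 4, 4: 9}
print(best_corridor(adj, {0, 1}, cost, utility, C=8))
```

With n non-terminal vertices the loop visits 2^n subsets, which is why this approach is hopeless beyond toy instances.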

The Connection Subgraph Problem (Decision Version)
Instance
• An undirected graph G = (V, E)
• Terminal vertices T ⊆ V
• Vertex cost function: c(v); utility function: u(v)
• Cost bound / budget C; desired utility U
Question
Is there a subgraph H of G such that
• H is connected and contains T
• cost(H) ≤ C; utility(H) ≥ U?
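Checking a proposed answer to the decision version is straightforward, as the following sketch suggests: given a candidate vertex set H, it tests the conditions of the question with one graph traversal and two sums. The function name and the adjacency-dictionary representation are assumptions for illustration.

```python
from collections import deque

def is_valid_corridor(adj, terminals, cost, utility, H, C, U):
    """Check the decision-version conditions for a candidate vertex set H."""
    H = set(H)
    if not H or not set(terminals) <= H:
        return False
    # Breadth-first search restricted to H to test connectivity.
    start = next(iter(H))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in H and w not in seen:
                seen.add(w)
                queue.append(w)
    if seen != H:
        return False
    return sum(cost[v] for v in H) <= C and sum(utility[v] for v in H) >= U

# Hypothetical instance: a 4-cycle 0-1-2-3 with terminals 0 and 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
c = {0: 1, 1: 5, 2: 1, 3: 2}
u = {0: 1, 1: 4, 2: 1, 3: 1}
print(is_valid_corridor(cycle, {0, 2}, c, u, {0, 3, 2}, C=4, U=3))
```

The check runs in time linear in the size of the graph; the difficulty of the problem lies in finding a good H, not in verifying one.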

Connection Subgraph: other possible applications
Social networks
• What characterizes the connection between two individuals? The shortest path? Size of the connected component? A “good” connected subgraph?
• If a person is infected with a disease, who else is likely to be?
• Which people have unexpected ties to any members of a list of other individuals?
• Vertices in graph: people; edges: know each other or not
[Faloutsos, McCurley, Tompkins ’04]
Project: Find other applications of the connection subgraph problem and its variants, and apply/extend the ideas presented in this lecture.

Concepts of Problem Complexity: Easy vs. hard problems

How hard (complex) is it to solve the connection sub-graph problem? Before answering this question…

How do computer scientists differentiate between good (efficient) and bad (inefficient) algorithms?
The yardstick: any algorithm that runs in no more than polynomial time is an efficient algorithm; everything else is not.

Ordered functions by their growth rates
Order  Function       Name
1      c              constant
2      lg n           logarithmic
3      lg^c n         polylogarithmic
4      n^r, 0<r<1     sublinear
5      n              linear
6      n^r, 1<r<2     subquadratic
7      n^2            quadratic
8      n^3            cubic
9      n^c, c≥1       polynomial
10     r^n, r>1       exponential
Orders 1-9: efficient algorithms. Order 10: not efficient algorithms.
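To get a feel for the ordering, the short sketch below evaluates a few of these functions at increasing instance sizes. The chosen sizes and the base 2 for the exponential are arbitrary assumptions for illustration.

```python
import math

def growth_table(sizes=(10, 20, 40, 80)):
    """Evaluate representative growth functions at each instance size."""
    funcs = [
        ("lg n", lambda n: math.log2(n)),
        ("n", lambda n: n),
        ("n^2", lambda n: n ** 2),
        ("n^3", lambda n: n ** 3),
        ("2^n", lambda n: 2 ** n),
    ]
    return {name: [f(n) for n in sizes] for name, f in funcs}

table = growth_table()
for name, values in table.items():
    print(f"{name:>5}: " + "  ".join(f"{v:.3g}" for v in values))
```

Doubling n multiplies n^3 by 8, whereas it squares 2^n, which is why the exponential row quickly dwarfs every polynomial row.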

Roughly Speaking…
[Figure: cost (run time) versus size of instance N, with curves for constant, logarithmic, linear, quadratic, and exponential growth]

Polynomial vs. exponential growth (Harel 2000)
[Figure: polynomial versus exponential growth curves]
• Exponential: binary B&B algorithm
• Polynomial: LP’s interior point, min. cost flow algorithms, transportation algorithm, assignment algorithm, Dijkstra’s algorithm (N^2)

How can we show a problem is efficiently solvable?
• We can show it constructively: we provide an algorithm and show that it solves the problem efficiently. E.g.:
  • Shortest path problem: Dijkstra’s algorithm runs in polynomial time. Therefore the shortest path problem can be solved efficiently.
  • Linear Programming: the Interior Point method has polynomial worst-case complexity. Therefore Linear Programming can be solved efficiently. (*)
(*) The simplex method has exponential worst-case complexity. However, in practice the simplex algorithm seems to scale as m^3, where m is the number of functional constraints.
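As an illustration of the constructive argument for shortest paths, here is a minimal sketch of Dijkstra's algorithm using a binary heap. It assumes non-negative edge weights and an adjacency-list dictionary; the example graph is hypothetical.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from source; adj[v] lists (neighbor, weight >= 0)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for w, weight in adj[v]:
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

# Hypothetical directed graph.
roads = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}
print(dijkstra(roads, "a"))
```

Each edge is relaxed at most once per settled vertex and every heap operation is logarithmic, so the running time is polynomial in the size of the graph.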

How can we show a problem is not efficiently solvable? • How do you prove a negative? Much harder!!! • This is the aim of complexity theory.

Easy (efficiently solvable) problems vs. Hard Problems
• Easy problems: we consider a problem X to be “easy” or efficiently solvable if there is a polynomial time algorithm A for solving X. We denote by P the class of problems solvable in polynomial time.
• Hard problems: everything else. Any problem for which there is no polynomial time algorithm is an intractable problem.

Hard Computational Problems: NP-Complete and NP-Hard Problems
Hard computational problems scale exponentially in the worst case (explosive combinatorics, exponential-time algorithms).
Tackling practical size instances requires powerful computational and mathematical tools!
[Figure: exponential function versus polynomial function]
Application areas:
• Satisfiability, e.g. (A or B) (D or E or not A)
• Experiment Design
• Fiber optics routing
• Planning and Scheduling and Supply Chain Management
• Data Analysis & Data Mining
• Protein Folding and Medical Applications
• Capital Budgeting and Financial Applications
• Combinatorial Auctions
• Information Retrieval
• Software & Hardware Verification
• Many more applications!!!

How hard (complex) is the connection subgraph problem?
The connection subgraph problem is NP-Hard.
Unfortunately that means we don’t know of good, efficient (polynomial time) algorithms to solve this problem.
We believe the connection subgraph problem is intractable: computer scientists only know of exponential time algorithms to solve it (and computer scientists strongly believe that no polynomial time algorithm will ever be found, but there is no proof either way).
Connections in networks: Hardness of feasibility versus optimality. Conrad, J., C. Gomes, W.-J. van Hoeve, A. Sabharwal, and J. Suter. Proc. CPAIOR 07, 2007, pages 16–28.

Should we give up on finding good solutions?
The connection subgraph problem is NP-Hard! But this is a worst-case result! Real-world problems are not necessarily worst case, and they possess hidden sub-structure that can be exploited, allowing solutions to scale up.
Connections in networks: Hardness of feasibility versus optimality. Conrad, J., C. Gomes, W.-J. van Hoeve, A. Sabharwal, and J. Suter. Proc. CPAIOR 07, 2007, pages 16–28.

Encoding the connection subgraph problem as a Mixed Integer Programming Problem

Single Commodity Flow Encoding
• Variables:
  • x_i, binary variable for each vertex i (1 if included in corridor; 0 otherwise)
  • y_ij, continuous variable for the flow on each edge ij
• Cost constraint: Σ_i c_i x_i ≤ C
• Utility optimization function: maximize Σ_i u_i x_i
• Connectedness: use a single commodity flow encoding
[Figure: example flow sent from the root (r) through the selected vertices; max flow = 9]

Single Commodity Flow: MIP
• Max utility
• Budget constraint
• Reserves
• Total flow
• Flow balance
• Incoming edges allowed only if selected
This is what makes the problem hard.
Note: E’ is the set of directed edges, obtained by replacing each undirected edge of E with two directed edges.
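The constraint labels on this slide can be matched with one standard way of writing a single commodity flow model. The formulation below is a sketch under stated assumptions and not necessarily the exact model from the lecture: it assumes one terminal r ∈ T is chosen as the root and acts as the only flow source, that n = |V|, and that every selected non-root vertex absorbs exactly one unit of flow.

```latex
\begin{align*}
\max \quad & \sum_{i \in V} u_i x_i
  && \text{(max utility)} \\
\text{s.t.} \quad & \sum_{i \in V} c_i x_i \le C
  && \text{(budget constraint)} \\
& x_t = 1 \quad \forall t \in T
  && \text{(reserves)} \\
& \sum_{j : (r,j) \in E'} y_{rj} - \sum_{j : (j,r) \in E'} y_{jr}
  = \sum_{i \in V \setminus \{r\}} x_i
  && \text{(total flow)} \\
& \sum_{j : (j,i) \in E'} y_{ji} - \sum_{j : (i,j) \in E'} y_{ij} = x_i
  \quad \forall i \in V \setminus \{r\}
  && \text{(flow balance)} \\
& 0 \le y_{ji} \le (n-1)\, x_i \quad \forall (j,i) \in E'
  && \text{(incoming edges allowed only if selected)} \\
& x_i \in \{0,1\} \quad \forall i \in V
\end{align*}
```

Under these assumptions flow can only enter selected vertices, and each selected vertex must receive one net unit that originates at r, so every selected vertex is joined to r by a path of selected vertices.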

Solving the Mixed Integer Programming Encoding
• Cplex: state of the art MIP solver
  • Branch and Bound
  • LP relaxation
  • Cut generation
[Figure: connection subgraph instance → MIP model → CPLEX (feasibility + optimization) → solution]

Experimental Results

Synthetic Instances for Evaluation
Problem evaluated on semi-structured graphs
• m x m lattice / grid graph with k terminals
  • Inspired by the conservation corridors problem
• Place a terminal each on top-left and bottom-right
  • Maximizes grid use
• Place remaining terminals randomly
• Assign uniform random costs and utilities from {0, 1, …, 10}
[Figure: example lattice with m = 4, k = 4]
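A generator for instances of this kind might look like the sketch below. It assumes a 4-neighbor lattice, m ≥ 2, and k ≤ m·m; the function name, the `seed` parameter, and the return format are hypothetical choices for illustration, not the authors' generator.

```python
import random

def make_lattice_instance(m, k, seed=None):
    """Build an m x m grid graph with k terminals and random costs/utilities."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(m) for c in range(m)]
    adj = {cell: [] for cell in cells}
    for r, c in cells:
        for dr, dc in ((1, 0), (0, 1)):  # link to the cell below and to the right
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append(nb)
                adj[nb].append((r, c))
    # Terminals: top-left and bottom-right corners first, the rest at random.
    terminals = [(0, 0), (m - 1, m - 1)][:k]
    others = [cell for cell in cells if cell not in terminals]
    terminals += rng.sample(others, max(0, k - len(terminals)))
    # Costs and utilities drawn uniformly from {0, 1, ..., 10}.
    cost = {cell: rng.randint(0, 10) for cell in cells}
    utility = {cell: rng.randint(0, 10) for cell in cells}
    return adj, set(terminals), cost, utility

adj, terminals, cost, utility = make_lattice_instance(4, 4, seed=0)
print(sorted(terminals))
```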

Standard MIP Results: without terminals
• No terminals → “find the connected component that maximizes the utility within the given budget”
• Pure optimization problem; always feasible
• Still NP-hard
[Figure: runtime (log scale, 0.01 to 10000) versus budget fraction (0 to 0.8) for 6 x 6, 8 x 8, and 10 x 10 grids]
A clear easy-hard-easy pattern with uniform random costs & utilities
Note 1: plot in log-scale for better viewing of the sharp transitions
Note 2: each data point is the median over 100+ random instances

Standard MIP: 3 terminals (feasibility vs. optimization)
Split instances into feasible and infeasible; plot median runtime
• For feasible ones: computation involves proving optimality
• For infeasible ones: computation involves proving infeasibility
Infeasible instances take much longer than the feasible ones!

Results: with terminals
[Figure: connection subgraph instance → MIP model → CPLEX (feasibility + optimization) → solution]
Problem?
• MIP+Cplex really weak at feasibility testing
• Poor scaling: couldn’t even get close to handling real data
Can we do better?
Ashish Sabharwal, CP-AI-OR '08

A Related Problem (ignoring utilities): Minimum Cost Solution - The Steiner Tree Problem
Input
• An undirected graph G = (V, E)
• Terminal vertices T ⊆ V
• Edge cost function: c(e)
Question
What is the subgraph H of G with minimum cost such that
• H is connected and contains T?
If the edge costs are all positive, then the resulting subgraph is obviously a tree.

The Steiner Tree Problem: Min cost tree connecting the terminals
Also NP-Hard, but:
• When we only have two terminals → shortest path (e.g., Dijkstra’s algorithm or an algorithm based on dynamic programming)
• Bounded number of terminals
  • Fixed parameter tractable algorithm

The Steiner Tree Problem: Min cost tree connecting the terminals
Three terminals (as in the case of our grizzly bear problem)
• Algorithm: in order to connect the three terminals, find where to place the root of the tree → compute all-pairs shortest paths (an easy algorithm based on dynamic programming, or even Dijkstra’s)
• The algorithm is also used as the starting point of a greedy solution: start with the minimum cost corridor and extend it greedily, picking nodes in order of decreasing util/cost ratio to use the remaining budget
• The algorithm is also used for pruning: nodes that are so far away that connecting them to the terminals is beyond the budget can be pruned
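The root-placement idea can be sketched as follows for the vertex-cost setting of the corridor problem. The sketch runs a shortest-path search from each terminal, then tries every vertex as the meeting point and keeps the cheapest union of paths. It assumes non-negative vertex costs, and it is intended for at most three terminals (with more terminals the result is only an upper bound). The function names and the example graph are hypothetical.

```python
import heapq

def node_dijkstra(adj, cost, source):
    """Cheapest path cost from source to each vertex, counting both endpoints."""
    dist = {source: cost[source]}
    prev = {source: None}
    heap = [(cost[source], source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w in adj[v]:
            nd = d + cost[w]
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    return dist, prev

def min_cost_corridor(adj, cost, terminals):
    """Try every vertex as the root where shortest paths to the terminals meet."""
    runs = [node_dijkstra(adj, cost, t) for t in terminals]
    best = None
    for v in adj:
        if any(v not in dist for dist, _ in runs):
            continue  # v cannot reach every terminal
        nodes = set()
        for _, prev in runs:
            u = v
            while u is not None:  # walk back from v to the terminal
                nodes.add(u)
                u = prev[u]
        total = sum(cost[u] for u in nodes)
        if best is None or total < best[0]:
            best = (total, nodes)
    return best

# Hypothetical graph: terminals a, b, c; cheap hub m; expensive detours x, y.
g = {"a": ["m", "x"], "b": ["m", "x", "y"], "c": ["m", "y"],
     "m": ["a", "b", "c"], "x": ["a", "b"], "y": ["b", "c"]}
w = {"a": 1, "b": 1, "c": 1, "m": 2, "x": 5, "y": 5}
print(min_cost_corridor(g, w, ["a", "b", "c"]))
```

The work is one shortest-path computation per terminal plus one pass over the vertices, so it stays polynomial for a fixed number of terminals.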

Solving the connection subgraph problem: Two Phase Approach
• 1st Phase: runs the minimum Steiner tree based algorithm and produces a greedy solution. This phase runs in polynomial time for a constant number of terminal nodes.
• 2nd Phase: refines the greedy solution to produce an optimal solution with Cplex

Solving the connection subgraph problem: Phase I
• 1st Phase: runs the minimum Steiner tree based algorithm
  • Produces the minimum cost solution
  • Produces shortest path information used for pruning the search space: the all-pairs-shortest-paths matrix
  • Produces a greedy (and often sub-optimal) solution for feasible instances (highest util/cost ratio parcels are selected to use the remaining budget)
This phase runs in polynomial time for a constant number of terminal nodes.
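The greedy step of Phase I can be sketched as below. One detail is an assumption of this sketch rather than something stated on the slide: only parcels adjacent to the current corridor are considered, so that the corridor stays connected. The function name and the example instance are hypothetical.

```python
def greedy_extend(adj, cost, utility, corridor, budget):
    """Grow a connected corridor by best utility/cost ratio within the budget."""
    def ratio(w):
        # Zero-cost parcels are free, so rank them first.
        return float("inf") if cost[w] == 0 else utility[w] / cost[w]

    chosen = set(corridor)
    spent = sum(cost[v] for v in chosen)
    while True:
        frontier = {w for v in chosen for w in adj[v]} - chosen
        # Sorted for a deterministic tie-break among equal ratios.
        affordable = sorted(w for w in frontier if spent + cost[w] <= budget)
        if not affordable:
            return chosen, spent
        best = max(affordable, key=ratio)
        chosen.add(best)
        spent += cost[best]

# Hypothetical instance; the min-cost corridor is assumed to be {0, 1}.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
cost = {0: 1, 1: 1, 2: 2, 3: 4, 4: 1}
utility = {0: 0, 1: 0, 2: 6, 3: 4, 4: 9}
chosen, spent = greedy_extend(adj, cost, utility, {0, 1}, budget=8)
print(chosen, spent, sum(utility[v] for v in chosen))
```

On this instance the greedy corridor {0, 1, 2, 3} reaches utility 10, while {0, 1, 3, 4} costs 7 and reaches utility 13, which illustrates why the greedy solution is often sub-optimal and is only used as a starting point.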

Solving the connection subgraph problem: Phase II
• Refines the greedy solution to produce an optimal solution with Cplex
• The greedy solution is passed to Cplex as the starting solution (Cplex can change it).
• The all-pairs-shortest-paths matrix computed in Phase I is also passed on to Phase II. It is used to statically (i.e., at the beginning) prune away all nodes that are easily deduced to be too far away to be part of a solution (e.g., if the cost of the minimum Steiner tree containing that node and all of the terminal vertices already exceeds the budget). This significantly reduces the search space size, often in the range of 40-60%.
• Computes an optimal solution (or the optimal extended-mincost solution) to the utility-maximization version of the connection subgraph problem.
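A simplified version of the static pruning test can be sketched as follows. Instead of the Steiner tree bound mentioned on the slide, this sketch uses a weaker but still valid bound: any corridor containing a node and all terminals must contain a path from that node to each terminal, so the costliest of those shortest paths is a lower bound on the corridor cost. Vertex costs are assumed non-negative, and the function names are hypothetical.

```python
import heapq

def path_costs(adj, cost, source):
    """Cheapest path cost from source to each vertex, counting both endpoints."""
    dist = {source: cost[source]}
    heap = [(cost[source], source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w in adj[v]:
            nd = d + cost[w]
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def prune(adj, cost, terminals, budget):
    """Keep only vertices that could lie in some budget-feasible corridor."""
    dists = [path_costs(adj, cost, t) for t in terminals]
    keep = set()
    for v in adj:
        # Lower bound: the most expensive shortest path from v to a terminal.
        bound = max((d.get(v, float("inf")) for d in dists), default=cost[v])
        if bound <= budget:
            keep.add(v)
    return keep

# Hypothetical path graph 0-1-2-3 with terminals 0 and 1.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
line_cost = {0: 1, 1: 1, 2: 3, 3: 10}
print(prune(line, line_cost, [0, 1], budget=6))
```

Because the bound is weaker than the Steiner tree bound, this sketch never removes a node that the stronger test would keep; it may simply prune fewer nodes.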

Solving the Connection Sub-Graph Problem: Exploiting Structure (A Hybrid MIP/CP Approach)
[Figure: solution pipeline, with the APSP matrix illustrated by a 5 x 5 example]
• Feasibility: connection subgraph instance → ignore utilities → compute min-cost Steiner tree → min-cost solution and APSP matrix
• Greedily extend the min-cost solution to fill the budget (“like” knapsack: max u/c) → higher utility feasible solution, used as the starting solution
• APSP matrix → dynamic pruning (40-60% pruned)
• Optimization: MIP model → CPLEX → solution
Conrad, G., van Hoeve, Sabharwal, Suter 2008

10x10 random lattices, 3 reserves
• Infeasible instances solved instantaneously!
• ~20x improvement in runtime on feasible instances

10x10 random lattices, 3 reserves
• Gap between optimal and extended-optimal solutions
• Peak of hardness still strongly correlated with budget slack

Experimental Results: Yellowstone case

Grizzly Bear Corridor in Northern Rockies
Real world instance: corridor for grizzly bears in the Northern Rockies, connecting Yellowstone, the Salmon-Selway Ecosystem, and Glacier Park.
Study area ~ 320,000 sq km
[Figure: maps of habitat suitability and cost over the study area]
Habitat suitability can be a challenging Machine Learning problem.

Min Cost Solution for Different Granularities

Real Data, 50x50km Parcels
Gap between optimal and extended-optimal solutions peaks in a critical region right after min-cost

Real Data, 40x40km Parcels
Gap between optimal and extended-optimal solutions peaks in a critical region right after min-cost
