CS6604 PROJECT
EXAMINATION OF GRAPH PARTITIONING STORAGE METHODS FOR ROAD NETWORKS
Presenter: Andrew Connors
Professor: Prof. Chang-Tien Lu
Abstract
• Looking at networks with moving objects, in particular road networks
• Use a graph partitioning scheme, but instead of basing it on connectivity, base the partitioning on road usage
• Results in partitions that group nodes and edges based on the most probable routes
• Use these partitions to cluster data, with a storage scheme similar to CCAM
• Look at improvements in algorithms that track the user – as the system follows the user, the nodes it needs should already be in loaded memory pages, which reduces paging
Approach
• Take 2005 data from MnDOT, which publishes Average Annual Daily Traffic (AADT) for its entire network
• Has 239,166 nodes and 193,096 edges
• Use a multilevel k-way partitioning scheme
• Based on the METIS scheme of George Karypis
• Adapt the partitioning objective to be based on local probabilities
• Use clustering-based storage
• Based on CCAM – but change from connectivity to probability
• Instead of the WCRR measure, use percentage of page misses – a more direct measure
Graph Partitioning
• Is NP-complete – so heuristics are needed
• Multilevel partitioning involves these steps:
• Graph coarsening phase – reduces the number of nodes by collecting adjacent nodes and replacing them with an aggregate node – uses edge matching – find a maximal set of matching edges and collapse the vertices incident on each edge into one vertex
• Initial partitioning phase – this is now done on a significantly smaller graph – uses recursive bisection, splitting into two balanced halves each time – this is where the real work is done – based on a heuristic and cost model
• Un-coarsening phase – recover the original graph while keeping the partitioning intact
Coarsening Phase
• Coarsening is a well-studied mechanism
• Must be careful that it does not result in a sub-optimal partition
• Solution is to ensure:
• No loss of information that is necessary for the partitioning algorithm
• Noise is not added on un-coarsening
• Do not reduce the graph size too much – it costs too much in performance and could result in too much information loss
• Use an edge-matching algorithm
Edge Matching
• Take an original graph G0 = (V0, E0) with vertices V0 and edges E0
• On each step i, construct a smaller graph Gi = (Vi, Ei) with vertices Vi and edges Ei, where |Vi| > |Vi+1|
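One coarsening step can be sketched as follows. This is a minimal Python sketch (the project itself was written in Java), not the presenter's implementation: the function name `coarsen_once`, the edge-dictionary layout, and the choice to visit the heaviest edges first are all assumptions made for illustration.

```python
def coarsen_once(nodes, edges):
    """Collapse a maximal matching of edges into aggregate vertices.

    nodes: list of vertex ids in G_i
    edges: dict {(u, v): weight} of undirected edges in G_i
    Returns (coarse_nodes, coarse_edges, mapping), where mapping[v] is the
    aggregate vertex of G_{i+1} that v was collapsed into.
    """
    matched = {}
    # Assumption: visit the heaviest edges first, so heavily used roads end
    # up inside an aggregate vertex instead of on a later cut.
    for (u, v), _w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if u not in matched and v not in matched:
            matched[u] = v
            matched[v] = u

    # Number the aggregate vertices; a matched pair shares one id.
    mapping = {}
    next_id = 0
    for v in nodes:
        if v in mapping:
            continue
        mapping[v] = next_id
        if v in matched:
            mapping[matched[v]] = next_id
        next_id += 1

    # Rebuild the edge set, dropping collapsed edges and merging parallels.
    coarse_edges = {}
    for (u, v), w in edges.items():
        cu, cv = mapping[u], mapping[v]
        if cu == cv:
            continue  # edge is now internal to an aggregate vertex
        key = (min(cu, cv), max(cu, cv))
        coarse_edges[key] = coarse_edges.get(key, 0) + w
    return list(range(next_id)), coarse_edges, mapping


# Hypothetical 4-node road segment a-b-c-d with made-up weights.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 4}
coarse_nodes, coarse_edges, mapping = coarsen_once(nodes, edges)
print(coarse_nodes, coarse_edges)  # [0, 1] {(0, 1): 1}
```

Here |V0| = 4 shrinks to |V1| = 2, and only the lightly used edge b-c survives as a candidate for the cut.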
Initial Graph Partitioning
• Partitioning is NP-complete, so a good heuristic is needed for performance
• The heuristic needs to achieve:
• Balancing and constraining the partition sizes
• A cost function that, when minimized (or maximized), results in the required partitions
• Try to incorporate ratio-cut – which puts the partition sizes into the cost – so balanced partitions can be achieved naturally without using constraints
• Use recursive bisection – taking repeated bisections
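To illustrate how ratio-cut folds the partition sizes into the cost, here is a small Python sketch. It assumes one common form of the ratio-cut cost for a bisection, cut(A, B) / (|A| · |B|); the slides do not say which form the project used, and the edge weights below are made-up stand-ins for AADT-derived values.

```python
def cut_weight(edges, part):
    """Total weight of the edges whose endpoints lie in different halves."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def ratio_cut(edges, part):
    """Ratio-cut cost of a bisection: cut(A, B) / (|A| * |B|).

    part maps each vertex to 0 or 1. An empty half gets infinite cost.
    """
    size_a = sum(1 for p in part.values() if p == 0)
    size_b = len(part) - size_a
    if size_a == 0 or size_b == 0:
        return float("inf")
    return cut_weight(edges, part) / (size_a * size_b)


# Hypothetical 6-node path a-b-c-d-e-f.
edges = {("a", "b"): 2, ("b", "c"): 5, ("c", "d"): 3,
         ("d", "e"): 5, ("e", "f"): 2}
balanced = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lopsided = {"a": 0, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}

# Plain cut weight prefers the lopsided split (2 < 3) ...
print(cut_weight(edges, lopsided), cut_weight(edges, balanced))
# ... while ratio-cut prefers the balanced one (3/9 < 2/5).
print(ratio_cut(edges, balanced) < ratio_cut(edges, lopsided))  # True
```

No explicit size constraint appears anywhere in the cost, yet the balanced split wins because the |A| · |B| denominator is largest when the halves are equal.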
Un-coarsening Phase
• Run back through the graphs from coarsest to finest: Gm-1, Gm-2, …, G0
• Assign nodes that were collapsed together in the coarser graph to the same partition in the finer graph
• However, the partition may not be at a local minimum at each step, due to the increase in degrees of freedom
• Therefore, a refinement algorithm is needed on each step
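The projection and refinement steps might look like the Python sketch below. The slides do not name the refinement algorithm, so the single greedy pass here (move a vertex across the cut when that lowers the cut weight and keeps the halves balanced) is a swapped-in stand-in in the spirit of Kernighan-Lin style refinement, not the project's method. The mapping, weights, and function names are hypothetical.

```python
def project_partition(mapping, coarse_part):
    """Give each fine vertex the partition of its aggregate vertex."""
    return {v: coarse_part[c] for v, c in mapping.items()}


def cut_weight(edges, part):
    """Total weight of the edges whose endpoints lie in different halves."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def refine_bisection(edges, part, max_imbalance=1):
    """One greedy pass over the vertices of a 0/1 bisection.

    A vertex moves to the other half if its edge weight to that half
    exceeds its edge weight to its own half, and the resulting size
    difference stays within max_imbalance.
    """
    part = dict(part)
    sizes = [0, 0]
    for p in part.values():
        sizes[p] += 1
    for v in list(part):
        side = part[v]
        external = internal = 0
        for (a, b), w in edges.items():
            if v == a or v == b:
                other = b if v == a else a
                if part[other] == side:
                    internal += w
                else:
                    external += w
        new_diff = abs((sizes[side] - 1) - (sizes[1 - side] + 1))
        if external > internal and new_diff <= max_imbalance:
            part[v] = 1 - side
            sizes[side] -= 1
            sizes[1 - side] += 1
    return part


# Hypothetical fine graph; c and d were collapsed together when coarsening.
edges = {("a", "b"): 5, ("a", "c"): 4, ("b", "c"): 4, ("c", "d"): 1,
         ("d", "e"): 5, ("d", "f"): 4, ("e", "f"): 4}
mapping = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
coarse_part = {0: 0, 1: 1, 2: 1}

projected = project_partition(mapping, coarse_part)
refined = refine_bisection(edges, projected)
print(cut_weight(edges, projected), cut_weight(edges, refined))  # 8 1
```

In the coarse graph c and d could only move together; once they are separate vertices again, c can join a and b, which is the extra degree of freedom the refinement step exploits.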
Clustered Storage CCAM
• Use the same mechanism as in the CCAM paper
• Data clusters – in this case partitions – are stored in the same page
Project Contribution
• The important part is the graph partitioning scheme
• Previous work:
• Minimizes the cost of the cut edges
• Leads to maximizing the Weighted Connectivity Residue Ratio (WCRR)
• Indirectly tries to achieve the same goal of pages containing the most likely accessed data
• This project derives pages of data where the data is related by being part of the most probable path of an object moving on, and constrained to, a road network – i.e. it uses the actual statistics directly
Experiments - Goal
• Take 100 paths of 100 steps each from random starting points
• Each path follows a walk along high-AADT edges to find the most probable route
• Use this to evaluate partitions by measuring the page miss ratio
• Also look at paths formed from random walks
• Calculate WCRR and CRR from the graph and partitions
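For reference, CRR and WCRR can be computed from a graph and a partition as in the Python sketch below. It assumes the usual definitions from the CCAM literature (CRR is the fraction of edges whose endpoints share a page; WCRR is the same fraction weighted by edge weight); the slides do not restate them, and the weights here are made up.

```python
def crr(edges, part):
    """Connectivity Residue Ratio: unsplit edges / all edges."""
    unsplit = sum(1 for (u, v) in edges if part[u] == part[v])
    return unsplit / len(edges)


def wcrr(edges, part):
    """Weighted CRR: weight of unsplit edges / weight of all edges."""
    total = sum(edges.values())
    unsplit = sum(w for (u, v), w in edges.items() if part[u] == part[v])
    return unsplit / total


# Hypothetical path a-b-c-d; partition ids double as page ids.
edges = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 4}
part = {"a": 0, "b": 0, "c": 1, "d": 1}
print(crr(edges, part))   # 2 of 3 edges are unsplit
print(wcrr(edges, part))  # 9 of 10 units of weight are unsplit
```

Splitting only the lightly used edge b-c leaves CRR at 2/3 but WCRR at 0.9, which is why the weighted form tracks access-weighted locality more closely.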
Measuring Performance
• Do not need "actual" performance measures, such as milliseconds
• Just need the ratio of page hits to page misses, p
• Introduce this metric into the code to take measurements directly
• Related to WCRR:
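A direct measurement of this kind might be sketched in Python as below. Several details are assumptions not stated on the slides: the walk greedily follows the highest-weight edge to an unvisited neighbour, the buffer is a small LRU cache of pages, and the reported figure is the fraction of node accesses that miss (the percentage-page-miss form), from which a hit-to-miss ratio can be derived.

```python
from collections import OrderedDict


def greedy_walk(adj, start, steps):
    """Follow the heaviest edge to an unvisited neighbour at each step."""
    path = [start]
    visited = {start}
    node = start
    for _ in range(steps):
        options = [(w, n) for n, w in adj[node].items() if n not in visited]
        if not options:
            break  # dead end: stop the walk early
        _, node = max(options, key=lambda o: o[0])
        path.append(node)
        visited.add(node)
    return path


def page_miss_ratio(path, page_of, buffer_pages=1):
    """Fraction of node accesses whose page is not in an LRU buffer."""
    buffer = OrderedDict()
    misses = 0
    for node in path:
        page = page_of[node]
        if page in buffer:
            buffer.move_to_end(page)  # mark as most recently used
        else:
            misses += 1
            buffer[page] = True
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)  # evict least recently used
    return misses / len(path)


# Hypothetical road a-b-c-d, adjacency as {node: {neighbour: weight}}.
adj = {"a": {"b": 5}, "b": {"a": 5, "c": 1},
       "c": {"b": 1, "d": 4}, "d": {"c": 4}}
path = greedy_walk(adj, "a", 3)

clustered = {"a": 0, "b": 0, "c": 1, "d": 1}  # pages follow the route
scattered = {"a": 0, "b": 1, "c": 0, "d": 1}  # pages ignore the route
print(page_miss_ratio(path, clustered))  # 0.5
print(page_miss_ratio(path, scattered))  # 1.0
```

Averaging `page_miss_ratio` over many start nodes would give the kind of figure used to compare partitionings, without timing anything in milliseconds.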
Experiments - Data
• Took two subsets of data from the MnDOT road network, focused on busy areas (i.e. near a large city) and differing in size by an order of magnitude to measure scalability
• Used ESRI ArcGIS software
• MnDOT uses the ESRI format
• Students get a free license from VT
• However, it is not easy to export data in the required format – this involved a steep learning curve
Complete MnDOT Data
Minnesota only: 193,096 edges, 239,166 nodes
Subsets of MnDOT Data
Took two subsets of data from MnDOT
Medium: 57,004 edges, 85,565 nodes
Experiments - Data
• Extracted to XML
• Using the ESRI GDBExchange schema
• Wrote Java code using Jakarta XMLBeans to import the data and create Java objects
• Wrote out files in a more usable and smaller CSV format
• ESRI Network Graph exports do not retain edge-node mappings – only spatial references
• Wrote more code to match edge start and end locations to node locations
• Produced a node-edge mapping format
• Used that to import into a Java-based graph storage format – an adjacency list
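The endpoint-matching step can be illustrated with the short Python sketch below (the project did this in Java; this is not that code). It assumes nodes and edge endpoints carry plain (x, y) coordinates and that rounding to a fixed number of decimals is enough to make coincident points compare equal; the names and the `precision` parameter are hypothetical.

```python
def build_adjacency(nodes, edges, precision=6):
    """Recover edge-node mappings from coordinates alone.

    nodes: {node_id: (x, y)}
    edges: {edge_id: ((x1, y1), (x2, y2))}  start and end locations
    Returns (adjacency, unmatched), where adjacency[node_id] is a list of
    (neighbour_id, edge_id) pairs and unmatched lists edges with an
    endpoint that matched no node.
    """
    def key(pt):
        # Assumption: coincident points agree after rounding.
        return (round(pt[0], precision), round(pt[1], precision))

    node_at = {key(xy): nid for nid, xy in nodes.items()}
    adjacency = {nid: [] for nid in nodes}
    unmatched = []
    for eid, (start, end) in edges.items():
        u = node_at.get(key(start))
        v = node_at.get(key(end))
        if u is None or v is None:
            unmatched.append(eid)
            continue
        adjacency[u].append((v, eid))
        adjacency[v].append((u, eid))
    return adjacency, unmatched


# Made-up coordinates; edge e3 ends where no node exists.
nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0)}
edges = {"e1": ((0.0, 0.0), (1.0, 0.0)),
         "e2": ((1.0, 0.0), (1.0, 1.0)),
         "e3": ((1.0, 1.0), (5.0, 5.0))}
adjacency, unmatched = build_adjacency(nodes, edges)
print(adjacency[2], unmatched)  # [(1, 'e1'), (3, 'e2')] ['e3']
```

Reporting unmatched edges rather than dropping them silently makes it easier to spot export problems, such as endpoints that fall just outside the rounding tolerance.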
Conclusions
• Using AADT when generating partitions improved performance
• But not for random walks
• WCRR was an indicator of performance
• Experimental and calculated values of WCRR correlated
Q + A Thank You!!