A predictive model for frequently viewed tiles in a Web map

A predictive model for frequently viewed tiles in a Web map • Sterling Quinn, MGIS Candidate, ESRI ArcGIS Server Product Engineer • Mark Gahegan, Faculty Advisor

Introduction • This project presents a model for predicting high-traffic areas of a Web map • Model output indicates where server-side cache of map tiles should be created

Project objectives • Describe server-side caching of map tiles • Describe the need for selective caching • Present a predictive model for popular areas of the map • Describe ways the model could be used and evaluated

Web map optimization and the advent of server-side caching

Organizing large maps in manageable “tiles” is not new • Large paper map series are indexed in organized grids • CGIS, a pioneering GIS, used “frames” to organize data (right) From Tomlinson, Calkins, & Marble, 1976, p. 56.

Other techniques for organizing maps in tiles or grid systems • Pyramid technique successively generalizes rasters in groups of four cells (right) • Quadtree structures index datasets in a hierarchy of quadrants From De Cola & Montagne, 1993, p. 1394.
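
The pyramid idea on this slide can be illustrated with a minimal sketch. It assumes the raster is a list of rows of numbers and that each group of four cells is generalized by taking its mean; the averaging rule and the function name `build_pyramid` are illustrative assumptions, not necessarily the method used in the cited PYRAMID system.

```python
def build_pyramid(raster):
    """Return a list of rasters, each built by merging 2 x 2 groups of cells.

    Level 0 is the input raster. Every later level has half as many rows and
    columns. Averaging is an assumed generalization rule for illustration.
    """
    levels = [raster]
    current = raster
    while len(current) > 1 and len(current[0]) > 1:
        rows, cols = len(current) // 2, len(current[0]) // 2
        coarser = [
            [
                (current[2 * r][2 * c] + current[2 * r][2 * c + 1]
                 + current[2 * r + 1][2 * c] + current[2 * r + 1][2 * c + 1]) / 4
                for c in range(cols)
            ]
            for r in range(rows)
        ]
        levels.append(coarser)
        current = coarser
    return levels


# Example 4 x 4 raster made of four uniform quadrants.
example = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
pyramid = build_pyramid(example)
```

Each level here plays the same role as one scale of a tile cache: a coarser picture of the same area that is cheaper to store and deliver.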

The modern map tile • JPG or PNG image • Standard square dimensions (256 x 256 or 512 x 512) • Stored in large “caches” on the server at multiple scales

Server-side caching of map tiles is new • Traditional map servers (ArcIMS, WMS) draw the image on the fly • Can take a while if the map is complex • Cached map tiles give extremely fast performance • Tiled maps allow users to retrieve just the needed pieces of the map

Advent of tiled maps and server-side caching • Microsoft Terra Server an early deployment of massive amounts of cached imagery tiles • Google Maps serves cached map tiles with AJAX techniques to create a “seamless” Web mapping experience

Tiles in Google Maps are quickly retrieved as you navigate (figure panels 1 and 2) • From Google Maps: http://maps.google.com

Many sites have followed Google’s pattern • Yahoo Maps: http://maps.yahoo.com • MapQuest: http://www.mapquest.com • Microsoft Virtual Earth: http://maps.live.com

Caching options

Current caching options • Current GIS software allows analysts to create tile caches for their own maps • ESRI’s ArcGIS Server • Mapnik • Microsoft MapCruncher

Caching can require enormous resources on the server • Caches covering big areas at large scales can include millions of tiles • Many gigabytes, or even terabytes of storage • Days, weeks, or sometimes months to generate • Many GIS shops lack resources to maintain large caches
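
A rough back-of-the-envelope sketch shows how tile counts reach the millions. It assumes the tiling scheme popularized by Google Maps (a Web Mercator grid that doubles in each direction at every zoom level), an approximate bounding box for California, and a hypothetical average tile size; none of these values come from the slides.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Column and row of the tile containing a point (assumed Web Mercator grid)."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still fall in the last tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(west, south, east, north, zoom):
    """Number of tiles touching a longitude/latitude bounding box at one zoom level."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def estimate_cache(bbox, zooms, avg_tile_kb):
    """Return (tile count, approximate gigabytes) for a full cache of the box."""
    tiles = sum(tiles_in_bbox(*bbox, z) for z in zooms)
    return tiles, tiles * avg_tile_kb / 1024 ** 2


# Approximate bounding box (west, south, east, north); an assumption.
CALIFORNIA_BBOX = (-124.5, 32.5, -114.1, 42.0)

# 15 KB per tile is a hypothetical average, not a measured value.
tile_count, gigabytes = estimate_cache(CALIFORNIA_BBOX, range(0, 17), 15)
```

Because every additional zoom level roughly quadruples the tile count, the largest scales dominate both storage and generation time.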

Selective caching as a strategy for saving resources • Administrator can cache only the areas anticipated to be most visited • Remaining areas can be: • Added to the cache “on-demand” when first user navigates there • Filled with a “Data not available” tile
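
The two fallback behaviors can be sketched as a small lookup class. This is a minimal in-memory illustration under stated assumptions: the class name `SelectiveTileCache`, the `render_tile` callback, and the placeholder bytes are all hypothetical and do not represent the API of ArcGIS Server or any other product.

```python
class SelectiveTileCache:
    """Serve pre-cached tiles and handle uncached areas in one of two ways."""

    def __init__(self, render_tile, on_demand=True,
                 placeholder=b"DATA NOT AVAILABLE"):
        self._tiles = {}             # tile key -> tile content
        self._render = render_tile   # hypothetical callback that draws one tile
        self._on_demand = on_demand
        self._placeholder = placeholder

    def precache(self, keys):
        """Draw and store tiles for the areas expected to be popular."""
        for key in keys:
            self._tiles[key] = self._render(key)

    def get(self, key):
        """Return a tile, drawing it on first request or returning a placeholder."""
        if key in self._tiles:
            return self._tiles[key]
        if self._on_demand:
            # The first visitor waits for the draw; later visitors get the stored tile.
            tile = self._render(key)
            self._tiles[key] = tile
            return tile
        return self._placeholder
```

With `on_demand=True` the first user to visit an uncached area pays the drawing cost; with `on_demand=False` the server never draws outside the pre-cached area.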

Benefits of selective caching • Wise because some tiles (ocean, desert) will rarely, if ever, be accessed • Saves time • Saves disk space

Implications of selective caching • Requires an admission that some areas are more important than others • Poses challenge of predicting popular areas before the map is released

The need for a predictive model

Project presents a predictive model for where to pre-cache tiles • “Which places are most interesting?” • Inputs are datasets readily available to GIS analyst • Output vector features serve as a template for where to pre-cache tiles

Purpose of the model • Help majority of users see a fast Web map while minimizing cache creation time and storage space

Not a descriptive model • Descriptive model shows where users have already viewed • Microsoft Hotmap good example of a descriptive tool (right) • Descriptive models useful for deriving and validating predictive models From Microsoft Hotmap http://hotmap.msresearch.us

Advantages of a predictive model • Doesn’t require the map to be deployed already • Can include fixed and varying geographic phenomena • Has applications far beyond map caching

Proposed methods

Study area and conditions • Model predicts frequently viewed places for a general base map • May create models for thematic maps if time allows • Study area of California

Input datasets • Populated / developed areas • Road networks • Coastlines • Points of interest

Populated / developed areas • Human Influence Index grid by the Socioeconomic Data and Applications Center (SEDAC) at Columbia University • Model selects all grid cells over a certain value
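
The threshold selection can be sketched on a toy grid. The grid values and the cutoff of 30 below are made up for illustration; the slides do not state the actual threshold applied to the Human Influence Index.

```python
def select_cells(grid, threshold):
    """Return the (row, col) positions of all cells whose value exceeds the threshold."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > threshold
    }


# Toy index values; higher numbers stand for stronger human influence.
toy_index = [
    [5, 12, 40],
    [8, 35, 60],
    [2, 10, 31],
]
developed = select_cells(toy_index, 30)  # 30 is a hypothetical cutoff
```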

Road networks • Major roads buffered by a given distance • All roads within national parks, monuments, historical sites, and recreation areas, buffered by a given distance

Coastlines • All coastlines buffered by a given distance (wider buffer on inland side)
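
An asymmetric buffer can be sketched on a grid of cells rather than on vector geometry. This swaps in a simple raster approximation for the vector buffering a GIS would perform: distances are counted in cells (Chebyshev distance), and the function name and distance values are assumptions for illustration.

```python
def buffer_coast(coast_cells, land_cells, inland_dist, seaward_dist):
    """Select cells near the coast, reaching farther over land than over water.

    coast_cells and land_cells are sets of (row, col) positions. Any cell not
    in land_cells is treated as water.
    """
    selected = set()
    reach = max(inland_dist, seaward_dist)
    for r, c in coast_cells:
        for dr in range(-reach, reach + 1):
            for dc in range(-reach, reach + 1):
                cell = (r + dr, c + dc)
                dist = max(abs(dr), abs(dc))  # distance in cells
                limit = inland_dist if cell in land_cells else seaward_dist
                if dist <= limit:
                    selected.add(cell)
    return selected


# Toy layout: a straight coast along column 5, with land at column 5 and beyond.
coast = {(r, 5) for r in range(3)}
land = {(r, c) for r in range(-5, 8) for c in range(5, 12)}
coastal_zone = buffer_coast(coast, land, inland_dist=3, seaward_dist=1)
```

The same dilation with a single distance would serve as a grid stand-in for the road buffers described on the previous slide.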

Points of interest • Set of 60 interesting points chosen by model author • Mountain peaks • Theme parks • Sports arenas • Etc. • Represents a flexible layer that could be tailored to local needs

Deriving the output • Merge all layers together • Clip to California outline (with small buffer) • Remove small holes and polygons • Dissolve into one multipart feature • Simplify to remove unneeded vertices
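
The derivation steps can be sketched with sets of grid cells standing in for polygons. This is a simplified raster analogue of the vector workflow, not the tools the author used: merging becomes a set union, clipping an intersection, and small holes or patches are found as small connected groups of cells. A cell set has no internal boundaries, so the dissolve step is implicit, and vertex simplification has no equivalent here and is omitted. The names and the `min_cells` parameter are illustrative assumptions.

```python
from collections import deque


def components(cells):
    """Split a set of (row, col) cells into groups connected edge to edge."""
    remaining = set(cells)
    groups = []
    while remaining:
        start = remaining.pop()
        group = {start}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    group.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


def derive_output(layers, study_area, min_cells):
    """Merge layers, clip to the study area, then clean small holes and patches."""
    merged = set().union(*layers)   # merge all layers together
    clipped = merged & study_area   # clip to the study area outline
    # Fill small unselected patches inside the study area (a simplified
    # stand-in for hole removal).
    for hole in components(study_area - clipped):
        if len(hole) < min_cells:
            clipped |= hole
    # Drop small isolated patches of selected cells.
    kept = set()
    for patch in components(clipped):
        if len(patch) >= min_cells:
            kept |= patch
    return kept


# Toy inputs: a ring with a one-cell hole, a stray cell, and a cell outside the area.
study_area = {(r, c) for r in range(6) for c in range(6)}
ring = {(r, c) for r in range(1, 4) for c in range(1, 4)} - {(2, 2)}
strays = {(5, 5), (9, 9)}
template = derive_output([ring, strays], study_area, min_cells=2)
```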

Using the model output • Output is a vector dataset that can be used as a template for creating cached tiles • Compare model output area with total area to understand percent coverage • Compare model output with actual usage over time • Refine if necessary
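
The two comparisons can be expressed as simple ratios. The sketch below assumes areas are given in the same units and that usage is available as a list of requested tile keys; the hit-rate measure and both function names are illustrative assumptions about how "compare with actual usage" might be quantified.

```python
def percent_coverage(model_area, total_area):
    """Share of the study area covered by the model output, in percent."""
    return 100.0 * model_area / total_area


def hit_rate(requested_tiles, cached_tiles):
    """Fraction of tile requests that fell on pre-cached tiles."""
    if not requested_tiles:
        return 0.0
    hits = sum(1 for tile in requested_tiles if tile in cached_tiles)
    return hits / len(requested_tiles)


# Made-up numbers for illustration only.
coverage = percent_coverage(model_area=25.0, total_area=200.0)
rate = hit_rate(
    requested_tiles=[(10, 1, 1), (10, 1, 1), (10, 1, 2), (10, 7, 7)],
    cached_tiles={(10, 1, 1), (10, 1, 2)},
)
```

A model that covers a small percentage of the area while still serving most requests from the cache would be doing what the project sets out to do.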

Limitations • Models of world scope should account for Internet connectivity • Input datasets have varying collection dates • Input datasets vary in resolution and precision • Maps with many scales might require multiple iterations and variations of the model

Questions?

References • De Cola, L., & Montagne, N. (1993). The PYRAMID system for multiscale raster analysis. Computers & Geosciences, 19(10), 1393-1404. • Tomlinson, R. F., Calkins, H. W., & Marble, D. F. (1976). Computer Handling of Geographical Data. Paris: Unesco.
