This project introduces a model to predict high-traffic areas of a web map for efficient server-side caching. Learn about selective caching benefits and methods, optimizing map tiles, and the modern approach to map organization. Explore the implications and benefits of selective caching and predictive models for enhancing web mapping experiences.
A predictive model for frequently viewed tiles in a Web map • Sterling Quinn, MGIS Candidate, ESRI ArcGIS Server Product Engineer • Mark Gahegan, Faculty Advisor
Introduction • This project presents a model for predicting high-traffic areas of a Web map • Model output indicates where server-side cache of map tiles should be created
Project objectives • Describe server-side caching of map tiles • Describe the need for selective caching • Present a predictive model for popular areas of the map • Describe ways the model could be used and evaluated
Organizing large maps in manageable “tiles” is not new • Large paper map series are indexed in organized grids • CGIS, a pioneering GIS, used “frames” to organize data (right) From Tomlinson, Calkins, & Marble, 1976, p. 56.
Other techniques for organizing maps in tiles or grid systems • Pyramid technique successively generalizes rasters in groups of four cells (right) • Quadtree structures index datasets in a hierarchy of quadrants From De Cola & Montagne, 1993, p. 1394.
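A minimal sketch of the pyramid idea, assuming a square raster whose side is a power of two and assuming that each group of four cells is generalized by averaging (the aggregate used by any particular system may differ). The names `generalize` and `build_pyramid` are hypothetical.

```python
def generalize(grid):
    """Build the next pyramid level by averaging each 2x2 group of cells."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_pyramid(grid):
    """Return all levels, from the full-resolution grid down to a single cell."""
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(generalize(levels[-1]))
    return levels


raster = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
pyramid = build_pyramid(raster)
```

Each level holds one quarter as many cells as the level below it, which is the same relationship that links the scale levels of a tile cache.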
The modern map tile • JPG or PNG image • Standard square dimensions (256 x 256 or 512 x 512) • Stored in large “caches” on the server at multiple scales
Server-side caching of map tiles is new • Traditional map servers (ArcIMS, WMS) draw the image on the fly • Can take a while if the map is complex • Cached map tiles give extremely fast performance • Tiled maps allow users to retrieve just the needed pieces of the map
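To illustrate how a client can request only the tiles it needs, here is a sketch that lists the tiles covering a map view. It assumes the spherical Mercator "XYZ" tiling scheme common to many Web maps, which the slides do not specify; `lonlat_to_tile` and `tiles_in_view` are hypothetical names.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile index containing a point (assumed XYZ scheme)."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge still fall inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_view(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that cover a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# A view over part of California at zoom level 8 (coordinates are illustrative).
needed = tiles_in_view(-123.0, 37.0, -121.0, 38.5, 8)
```

A view that spans only a few tiles means only a few small images travel over the network, regardless of how large the full map is.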
Advent of tiled maps and server-side caching • Microsoft Terra Server an early deployment of massive amounts of cached imagery tiles • Google Maps serves cached map tiles with AJAX techniques to create a “seamless” Web mapping experience
Tiles in Google Maps quickly retrieved as you navigate [Figure panels 1 and 2] From Google Maps: http://maps.google.com
Many sites have followed Google’s pattern • Yahoo Maps: http://maps.yahoo.com • MapQuest: http://www.mapquest.com • Microsoft Virtual Earth: http://maps.live.com
Current caching options • Current GIS software allows analysts to create tile caches for their own maps • ESRI’s ArcGIS Server • Mapnik • Microsoft MapCruncher
Caching can require enormous resources on the server • Caches covering big areas at large scales can include millions of tiles • Many gigabytes, or even terabytes of storage • Days, weeks, or sometimes months to generate • Many GIS shops lack resources to maintain large caches
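A rough back-of-the-envelope sketch shows why tile counts grow so quickly. It assumes a quadtree-style scheme in which every level has four times as many tiles as the one before, and a hypothetical average tile size of 15 KB; real caches vary with tiling scheme, extent, and image format.

```python
def tiles_at_level(level):
    """Tiles needed to cover the full extent at one level (assumed 4x growth)."""
    return 4 ** level


def cumulative_tiles(max_level):
    """Total tiles from level 0 through max_level."""
    return sum(tiles_at_level(level) for level in range(max_level + 1))


def storage_gb(tile_count, avg_tile_kb=15):
    """Estimated storage in gigabytes; avg_tile_kb is an assumed figure."""
    return tile_count * avg_tile_kb / 1024 ** 2


for level in (5, 10, 15):
    total = cumulative_tiles(level)
    print(f"levels 0-{level}: {total:,} tiles, about {storage_gb(total):,.1f} GB")
```

Under these assumptions, each additional level roughly quadruples both the tile count and the storage required.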
Selective caching as a strategy for saving resources • Administrator can cache only the areas anticipated to be most visited • Remaining areas can be: • Added to the cache “on-demand” when first user navigates there • Filled with a “Data not available” tile
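The options above can be sketched as a small lookup routine. This is an illustration only, not how any particular map server implements it; `SelectiveTileCache`, `PLACEHOLDER`, and the `render` callback are hypothetical names.

```python
PLACEHOLDER = b"DATA NOT AVAILABLE"  # stands in for a "Data not available" tile image


class SelectiveTileCache:
    """Serve pre-cached tiles; handle the rest on demand or with a placeholder."""

    def __init__(self, precached, render=None):
        self.tiles = dict(precached)  # tiles created ahead of time
        self.render = render          # optional function that draws a missing tile

    def get(self, key):
        if key in self.tiles:
            return self.tiles[key]    # fast path: tile already in the cache
        if self.render is not None:
            # On-demand: the first visitor pays the drawing cost, later ones do not.
            self.tiles[key] = self.render(key)
            return self.tiles[key]
        return PLACEHOLDER            # no on-demand rendering configured


cache = SelectiveTileCache({(0, 0, 0): b"world"}, render=lambda key: b"drawn")
first = cache.get((3, 1, 2))   # rendered and stored
second = cache.get((3, 1, 2))  # served from the cache
```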
Benefits of selective caching • Wise because some tiles (ocean, desert) will rarely, if ever, be accessed • Saves time • Saves disk space
Implications of selective caching • Requires an admission that some areas are more important than others • Poses challenge of predicting popular areas before the map is released
Project presents a predictive model for where to pre-cache tiles • “Which places are most interesting?” • Inputs are datasets readily available to GIS analyst • Output vector features serve as a template for where to pre-cache tiles
Purpose of the model • Help majority of users see a fast Web map while minimizing cache creation time and storage space
Not a descriptive model • Descriptive model shows where users have already viewed • Microsoft Hotmap good example of a descriptive tool (right) • Descriptive models useful for deriving and validating predictive models From Microsoft Hotmap http://hotmap.msresearch.us
Advantages of a predictive model • Doesn’t require the map to be deployed already • Can include fixed and varying geographic phenomena • Has applications far beyond map caching
Study area and conditions • Model predicts frequently viewed places for a general base map • May create models for thematic maps if time allows • Study area of California
Input datasets • Populated / developed areas • Road networks • Coastlines • Points of interest
Populated / developed areas • Human Influence Index grid by the Socioeconomic Data and Applications Center (SEDAC) at Columbia University • Model selects all grid cells over a certain value
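The selection step can be sketched as a simple threshold on a grid. The values and the cutoff below are made up for illustration; the slides do not state the threshold the model uses, and `select_cells` is a hypothetical name.

```python
def select_cells(grid, threshold):
    """Return the (row, col) positions of cells whose value exceeds the threshold."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > threshold
    }


# Illustrative index values only, not real Human Influence Index data.
sample_grid = [
    [2, 5, 40],
    [8, 35, 60],
    [1, 3, 12],
]
developed = select_cells(sample_grid, threshold=30)
```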
Road networks • Major roads buffered by a given distance • All roads within national parks, monuments, historical sites, and recreation areas, buffered by a given distance
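A buffer can be thought of as "every location within a given distance of the feature". The sketch below tests whether a point falls inside the buffer of a road polyline, assuming planar (projected) coordinates; a GIS would normally build the buffer polygon itself. The function names and sample coordinates are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_road_buffer(p, road, distance):
    """True if p lies within `distance` of any segment of the road polyline."""
    return any(
        distance_to_segment(p, a, b) <= distance
        for a, b in zip(road, road[1:])
    )


road = [(0, 0), (10, 0), (10, 10)]  # illustrative polyline in map units
near = in_road_buffer((5, 3), road, distance=3)
far = in_road_buffer((5, 5), road, distance=3)
```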
Coastlines • All coastlines buffered by a given distance (wider buffer on inland side)
Points of interest • Set of 60 interesting points chosen by model author • Mountain peaks • Theme parks • Sports arenas • Etc. • Represents a flexible layer that could be tailored to local needs
Deriving the output • Merge all layers together • Clip to California outline (with small buffer) • Remove small holes and polygons • Dissolve into one multipart feature • Simplify to remove unneeded vertices
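The first steps above can be sketched on a grid of cells rather than on vector polygons. This is a raster stand-in chosen to keep the example self-contained, not the vector workflow the slides describe; hole removal, dissolving into a multipart feature, and vertex simplification are omitted. All names are hypothetical.

```python
def connected_components(cells):
    """Yield groups of cells that touch along an edge."""
    remaining = set(cells)
    while remaining:
        start = remaining.pop()
        component, stack = {start}, [start]
        while stack:
            r, c = stack.pop()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    stack.append(neighbor)
        yield component


def derive_output(layers, study_area, min_cells):
    """Merge layers, clip to the study area, and drop small isolated patches."""
    merged = set().union(*layers)    # merge all layers together
    clipped = merged & study_area    # clip to the study area outline
    kept = [
        patch for patch in connected_components(clipped)
        if len(patch) >= min_cells   # remove small polygons
    ]
    return set().union(*kept)


layers = [{(0, 0), (0, 1)}, {(0, 1), (0, 2)}, {(5, 5)}, {(9, 9), (9, 10)}]
study_area = {(r, c) for r in range(8) for c in range(8)}
template = derive_output(layers, study_area, min_cells=2)
```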
Using the model output • Output a vector dataset that can be used as a template for creating cached tiles • Compare model output area with total area to understand percent coverage • Compare model output with actual usage over time • Refine if necessary
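Two of the comparisons above reduce to simple ratios. The sketch below assumes areas in consistent units and a log of requested tile keys; the function names and sample figures are hypothetical, not results from the project.

```python
def percent_coverage(model_area, total_area):
    """Share of the study area covered by the model output, in percent."""
    return 100.0 * model_area / total_area


def hit_rate(requested_tiles, precached):
    """Fraction of tile requests that landed on a pre-cached tile."""
    if not requested_tiles:
        return 0.0
    hits = sum(1 for tile in requested_tiles if tile in precached)
    return hits / len(requested_tiles)


# Illustrative numbers only.
coverage = percent_coverage(model_area=25.0, total_area=100.0)
requests = [(8, 40, 98), (8, 40, 98), (8, 41, 98), (8, 50, 90)]
rate = hit_rate(requests, precached={(8, 40, 98)})
```

A small coverage percentage paired with a high hit rate would suggest the model is selecting the right places; a low hit rate would point to areas where the inputs need refining.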
Limitations • Models of world scope should account for Internet connectivity • Input datasets have varying collection dates • Input datasets vary in resolution and precision • Maps with many scales might require multiple iterations and variations of the model
References • De Cola, L., & Montagne, N. (1993). The PYRAMID system for multiscale raster analysis. Computers & Geosciences, 19(10), 1393–1404. • Tomlinson, R. F., Calkins, H. W., & Marble, D. F. (1976). Computer Handling of Geographical Data. Paris: Unesco.