
Library of Congress Workshop

Preservation Issues Related to Digital Geospatial Data. Steven P. Morris, Head of Digital Library Initiatives, North Carolina State University Libraries. Library of Congress Workshop, April 21, 2008. Revisiting Key Geospatial Data Types. Risks to Digital Geospatial Data.


Presentation Transcript


  1. Preservation Issues Related to Digital Geospatial Data. Steven P. Morris, Head of Digital Library Initiatives, North Carolina State University Libraries. Library of Congress Workshop, April 21, 2008

  2. Overview of the Problem Area: Outline • Revisiting Key Geospatial Data Types • Risks to Digital Geospatial Data • Value in Temporal/Historical Data • Archiving Challenges Note: Percentages based on the actual number of respondents to each question

  3. Brief (Very) Overview of the Geospatial Domain

  4. Data Types – Digital Orthophotography • All 100 NC counties with orthos • 1-5 flight years per county • 30-300 GB per flight

  5. Geospatial Data Types – Vector GIS • County, municipal, state • Detailed, accurate, current • Frequently updated • Cadastral (tax parcels) • Street centerlines • Zoning • Topographic contours • School, sheriff, fire • Voting precincts • More …

  6. Data Types – Spatial Databases • Vector and raster data • Relationships • Behaviors • Annotation • Data Models

  7. Geospatial Data Types – Cartographic • GIS Software • Software project file (.mxd, .apr, …) • Data layer file (.avl, .lyr, …) • PDF map exports • Web Services-based representations

  8. Other Geospatial Data Types – Place-based Data • Mobile, LBS, and social networking applications • Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function • Examples: Oblique Imagery, Street View Images, Tax Dept. Photos, Road Videologs

  9. Geospatial Data: Compelling Issues • Dynamic content • Constantly updated information • Data versioning • Digital object complexity • Spatially enabled databases • Complicated, multi-component formats • Proprietary formats

  10. Risks to Geospatial Data

  11. How would you describe your current geospatial archive? • Bob’s hard drive • Last week’s set of nightly tape backups • Several boxes of CDs and DVDs • The data back-end for our internet mapping application • A collection of files in our “GIS Folder” • A stand-alone spatial database • An enterprise GIS

  12. Digital Preservation Points of Failure • Data is not saved, or … • can’t be found, or … • media is obsolete, or … • media is corrupt, or … • format is obsolete, or … • file is corrupt, or … • meaning is lost Solutions: Migration, Emulation, Encapsulation, XML
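
The "file is corrupt" and "can't be found" failure points are commonly detected with a fixity check: a digest recorded at ingest is recomputed later and compared. The sketch below is illustrative only and does not come from the presentation; the function names `sha256_of` and `check_fixity` are hypothetical, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, recorded_digest: str) -> str:
    """Classify a file as 'missing', 'corrupt', or 'ok' against a stored digest."""
    if not path.is_file():
        return "missing"  # corresponds to "can't be found"
    if sha256_of(path) != recorded_digest:
        return "corrupt"  # corresponds to "file is corrupt"
    return "ok"
```

A check like this only covers the bit-level failure points; obsolete formats and lost meaning need the migration, emulation, and metadata approaches named on the slide.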

  13. Risks to Geospatial Data • Producer focus on current data • Data overwrite as common practice • Future support of data formats in question • No open, supported format for vector data • Shift to web services-based access • Data becoming more ephemeral • Inadequate or nonexistent metadata • Impedes discovery and use • Increasing use of spatial databases for data management • The whole is greater than the sum of the parts

  14. Value in Older Geospatial Data

  15. Value in Older Data: Cultural Heritage Future uses of data are difficult to anticipate (as with Sanborn Maps)

  16. Value in Older Data: Solving Business Problems • Land use change analysis • Site location analysis • Real estate trends analysis • Disaster response • Resolution of legal challenges • Impervious surface maps Figure: Suburban Development 1993/2002, near Mecklenburg-Cabarrus County border

  17. Problem: Flood and Hurricane Preparedness

  18. Application: Impervious Surface Change Mapping. Figure panels (A-D): 2004 Aerial Photography; 2002 Impervious; 2004 Impervious Update; 2004 Impervious using 2002 Mask

  19. Problem: Beach Erosion and Shoreline Change

  20. Application: Shoreline Change Mapping

  21. Problem: Tracking Land Use Change

  22. Application: Land Use Change Mapping. Figure labels: Developing Areas; Output GIS Data; Input Data. Using Mecklenburg County 2002 true color orthorectified aerial photography

  23. Preservation Challenges

  24. Challenge: Vector Data Formats • No widely-supported, open vector formats for geospatial data • Spatial Data Transfer Standard (SDTS) not widely supported • Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access” • Spatial Databases • The whole is more than the sum of the parts, and the whole is very difficult to preserve • Can export individual data layers for curation, but relationships and context are lost • Some thinking of using the spatial database as the primary archival platform

  25. Challenge: Preserving Geodatabases • Spatial databases in general vs. ESRI Geodatabase “format” • Not just data layers and attributes, but also topology, annotation, relationships, behaviors • ESRI Geodatabase archival issues • XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication • Some looking to Geodatabase as archival platform (in addition to feature class export)

  26. Challenge: Cartographic Representation Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.

  27. Challenge: Geospatial Web Services • How to capture records from decision-making processes?

  28. Challenge: Preservation Metadata Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities

  29. Challenge: Data Capture 2006 Frequency of Capture Survey targeting North Carolina counties and municipalities Response: yes = 65.3%, no = 34.7%* (out of 57.6% response rate)
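
The percentages above are computed over respondents rather than over everyone surveyed, so it can help to recover the underlying counts. The sketch below is a reconstruction, not data from the presentation: it assumes the survey population is the 125 jurisdictions named on slide 28 (100 counties plus the 25 largest municipalities), which the slide itself does not state.

```python
# Assumption: the survey went to the 125 jurisdictions named on slide 28.
surveyed = 100 + 25
response_rate = 0.576  # from the slide
yes_share = 0.653      # from the slide, as a share of respondents

respondents = round(surveyed * response_rate)
yes_count = round(respondents * yes_share)
no_count = respondents - yes_count

# Shares of respondents (what the slide reports) vs. share of all surveyed.
yes_pct_of_respondents = round(100 * yes_count / respondents, 1)
no_pct_of_respondents = round(100 * no_count / respondents, 1)
yes_pct_of_surveyed = round(100 * yes_count / surveyed, 1)

print(respondents, yes_count, no_count)  # 72 47 25
print(yes_pct_of_respondents, no_pct_of_respondents, yes_pct_of_surveyed)  # 65.3 34.7 37.6
```

Under that assumption the reported 65.3% and 34.7% correspond to 47 and 25 of 72 respondents, and "yes" answers represent only about 37.6% of all jurisdictions surveyed.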

  30. Challenge: Digital Object Complexity

  31. Building Data Bundles: The Zip Codes Example

  32. Where is the Dataset?

  33. Here’s One! • Files • Multi-file dataset • Georeferencing • Metadata file • Symbolization file • Additional documentation • License • Disclaimer • More • Metadata • FGDC • Acquisition metadata • Transfer metadata • Ingest metadata • Archive rights • Archive processes • Collection metadata • Series metadata
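
One way to make the "Files" half of this bundle concrete is to group a dataset's files by the role they play. The sketch below is a hypothetical illustration, not the project's actual tooling: the suffix-to-role table assumes a Shapefile-style multi-file dataset, and the names `ROLE_BY_SUFFIX`, `group_bundle`, and `missing_roles` are invented for this example.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file suffix to bundle role (Shapefile-style layout assumed).
ROLE_BY_SUFFIX = {
    ".shp": "data", ".shx": "data", ".dbf": "data",
    ".prj": "georeferencing",
    ".xml": "metadata",
    ".avl": "symbolization", ".lyr": "symbolization",
    ".pdf": "documentation", ".txt": "documentation",
}


def group_bundle(filenames):
    """Group file names by bundle role; unknown suffixes go to 'unclassified'."""
    bundle = defaultdict(list)
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        bundle[ROLE_BY_SUFFIX.get(suffix, "unclassified")].append(name)
    return dict(bundle)


def missing_roles(bundle, required=("data", "georeferencing", "metadata")):
    """List the required roles that have no file in the bundle."""
    return [role for role in required if role not in bundle]


files = ["zipcodes.shp", "zipcodes.shx", "zipcodes.dbf",
         "zipcodes.prj", "zipcodes.shp.xml", "zipcodes.lyr", "license.txt"]
bundle = group_bundle(files)
print(missing_roles(bundle))  # [] because every required role has at least one file
```

A report of missing roles is a cheap ingest-time signal that georeferencing or metadata was left behind when the dataset was copied.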

  34. Other Challenges • Rights management • Data versioning • Semantic issues • Large scale content transfer • Integrating older analog data • More …

  35. Looking for Solutions: Outline • Approaches to Archiving and Preservation • Current and Recent Geoarchiving Projects • Content Identification • Content Selection • Content Exchange • Digital Repository Development • Engaging Spatial Data Infrastructure • Archives Processes

  36. Different Ways to Approach Preservation • Technical solutions: How do we preserve acquired content over the long term? • Cultural/Organizational solutions: How do we make the data more preservable, and more prone to be preserved, from point of production? Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata

  37. Different Ways to Approach Preservation • Technical solutions: How do we archive acquired content over the long term? • Build data repositories: not just as an end in itself but also as a catalyst for discussion within the data community • Develop repository ingest workflows: create technical points of engagement with other NDIIPP preservation projects and build on collective learning experience

  38. Different Ways to Approach Preservation • Cultural/Organizational solutions: How do we make the data more preservable, and more prone to be archived, from point of production? • Engage data producer community and spatial data infrastructure through outreach and engagement; influence practice • Sell the problem to software vendors and standards development • Find overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc. • Start a discussion about roles at the local, state, and federal level

  39. Current or Recent Geospatial Data Archiving Projects

  40. Selected Geospatial Data Archive Projects

  41. NC Geospatial Data Archiving Project • Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP) • One of 8 initial NDIIPP collection building partnerships • Focus on state and local geospatial content in North Carolina (state demonstration) • Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories • Objective: engage existing state/federal geospatial data infrastructures in preservation • Serve as catalyst for discussion within industry

  42. NCGDAP Goals • Repository Goal • Capture at-risk data • Explore technical and organizational challenges • Project End Goal • Data Producers: Improved temporal data management practices • Archives: More efficient means of acquiring and preserving data; Progress towards best practices • Temporal data management vs. long-term preservation

  43. Content Identification

  44. Formal Inventory Processes • Alleviate “contact fatigue” on part of local agencies • 20 different NC state agencies contact local agencies for data … also, federal/regional agencies • Geospatial data is complex, requiring lengthy inventory process • Must capture descriptive, technical, and administrative information related to the data • Make the inventory available as a sharable data store

  45. What do Inventories Offer to Archives? • Data Availability Information • Detailed information by data layer • Contact Information • Minimal Metadata • Descriptive, technical, administrative • Rights Information • Document Technical Environment • Software used, formats, transfer methods • Future Data Development Plans

  46. Detailed Information About Data. Source: NC OneMap Data Inventory 2004

  47. Inventories as Source of Metadata. Example: Surface Water

  48. Content Selection

  49. Selection Issues • Most content is already at some level of risk • Early-Middle-Late Stage issues • Middle stage is usually the “sweet spot”, e.g. TIFF orthophotos vs. raw images or compressed images • Also added-value products: digital maps, cartographic representation • Digital maps: “record” or not? • Frequency of capture
