
Beyond Metadata: Towards User-Centric Description of Data Quality

Michael F. Goodchild, University of California, Santa Barbara





Presentation Transcript


  1. Beyond Metadata: Towards User-Centric Description of Data Quality Michael F. Goodchild University of California Santa Barbara

  2. Metadata • Data about data • handling instructions • catalog entry • fitness for use • What is known about data quality • a measure of the success of spatial data quality research • much progress has been made • FGDC CSDGM 1994 • ISO 19115 2003 • DDI • EML

  3. Two tests of success • Geobrowsers • Google Earth • geotagging • Wikimapia • Where 2.0

  4. www.wikimapia.org

  5. CSDGM, ISO 19115 • Do they match the state of research? • early 1990s • SDTS discussions of 1980s • the five-fold way • positional accuracy • attribute accuracy • logical consistency • completeness • lineage • Do they represent a user perspective? • committees staffed by data producers • production control mechanisms?

  6. Producer or user? • Producer-centric • details of the production process: the measurement and compilation systems used • tests of data quality conducted under carefully controlled conditions • formal specifications of data set contents • User-centric • effects of uncertainties on specific uses of the data, from simple queries to complex analyses • simple descriptions of quality that are readily understood by non-expert users • tools to enable the user to determine the effects of quality on results

  7. Increasing complexity • Self-documentation • notes to oneself • A colleague • brief description • Another discipline, language, culture • ideal metadata/data ratio?

  8. [Figure: complexity of metadata plotted against social distance]

  9. Seven issues • Areas in which research has moved beyond the standards • Accuracy of Spatial Databases 1989 • Measurements from Maps 1989 • 15 books • 1000 journal articles

  10. 1. Decoupling the representative fraction • Ratio of distance on the map to distance on the ground • no flat map of a curved surface can have a constant RF • RF as a surrogate • positional accuracy • spatial resolution • map content • RF undefined for digital data • inherited from source maps • extended by convention • aerial photographs (RF of the photographic plate) • digital orthoimagery (positional accuracy)
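
The point that no flat map can have a constant RF can be illustrated with a projection chosen here for convenience (the slides do not name one). On a spherical Mercator map the local scale factor is the secant of the latitude, so a nominal RF holds only at the equator. A minimal sketch; the function names and the 1:1,000,000 nominal RF are illustrative assumptions.

```python
import math

def mercator_scale_factor(lat_deg: float) -> float:
    """Local scale factor of a spherical Mercator map at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

def local_rf_denominator(nominal_denominator: float, lat_deg: float) -> float:
    """Denominator of the RF that actually holds at this latitude.

    A scale factor k > 1 means the map is locally enlarged, so the
    effective denominator shrinks from N to N / k.
    """
    return nominal_denominator / mercator_scale_factor(lat_deg)

if __name__ == "__main__":
    # Hypothetical map with a nominal RF of 1:1,000,000 at the equator.
    for lat in (0, 30, 60):
        print(lat, round(local_rf_denominator(1_000_000, lat)))
```

At 60 degrees the scale factor is 2, so the same sheet behaves locally like a 1:500,000 map.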

  11. 2. Accuracy or uncertainty? • Accuracy • a true value z exists • a measured value z* • error z*-z • RMSE • theory of measurement error • error propagation • Uncertainty • vagueness in definitions • no truth • perhaps a consensus? • lack of replicability • Change of paradigm around 1992
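
Under the accuracy paradigm, RMSE summarizes the errors z* - z over a set of check points. A minimal sketch; the function name and sample values are illustrative. Note that it requires the true values as input, which is exactly what the uncertainty paradigm says may not exist.

```python
import math

def rmse(measured, true_values):
    """Root mean square error of measured values z* against true values z."""
    if len(measured) != len(true_values) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(z_star - z) ** 2 for z_star, z in zip(measured, true_values)]
    return math.sqrt(sum(squared) / len(squared))

if __name__ == "__main__":
    # Hypothetical elevations in metres: errors of +1, -1, +2, -2.
    measured = [101.0, 99.0, 102.0, 98.0]
    truth = [100.0, 100.0, 100.0, 100.0]
    print(rmse(measured, truth))  # sqrt(2.5)
```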

  12. 3. Objects and fields • A fundamental distinction • 1992 • appears nowhere in the standards • Discrete object conceptualization • an empty table top • occupied by discrete, countable objects • points, lines, areas, volumes • Continuous field conceptualization • a mapping from location x to value z • a single-valued function of location

  13. z'(x) = z(x) + δz(x)

  14. Separability • Phenomenon conceptualized as a field • impossible to separate positional and attribute accuracy • interval/ratio (elevation) • nominal (land cover class)
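
One way to see why the two kinds of accuracy cannot be separated for a field: if the field is registered slightly off, every query returns the value from a nearby location, and the discrepancy looks like an attribute error. The sketch below assumes a hypothetical elevation field with a uniform 10% slope; the field and the function names are not from the slides.

```python
def elevation(x: float) -> float:
    """Hypothetical elevation field in metres: a uniform 10% slope."""
    return 100.0 + 0.10 * x

def apparent_attribute_error(field, x: float, shift: float) -> float:
    """Error in z observed at x when the field is registered `shift` metres off.

    The displaced field reports field(x + shift) where field(x) is true, so
    the same discrepancy can be read as positional or as attribute error.
    """
    return field(x + shift) - field(x)

if __name__ == "__main__":
    # A 5 m positional shift on a 10% slope appears as about 0.5 m of
    # elevation error.
    print(apparent_attribute_error(elevation, 500.0, 5.0))
```

For a nominal field such as land cover class the same shift would show up as misclassification near boundaries rather than as a numeric difference.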

  15. 4. Granularity • Metadata definable at any level • individual vertex • point, line, area • layer • geodatabase • Metadata as a form of generalization • economies of scale • Spatial non-stationarity • Multiple lineages
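
As a rough illustration of metadata defined at several levels, the sketch below stores a quality statement once at the coarsest level where it holds and overrides it lower down where it does not. The level names, keys, and values are all hypothetical; no standard cited in the slides prescribes this structure.

```python
from typing import Optional

# Hypothetical hierarchy, most specific level first.
LEVELS = ("vertex", "feature", "layer", "geodatabase")

def resolve(metadata: dict, key: str) -> Optional[tuple]:
    """Return (level, value) from the most specific level that defines `key`."""
    for level in LEVELS:
        value = metadata.get(level, {}).get(key)
        if value is not None:
            return level, value
    return None

# Hypothetical metadata for one vertex and the objects that contain it.
example = {
    "geodatabase": {"lineage": "agency survey", "positional_rmse_m": 12.0},
    "layer": {"positional_rmse_m": 5.0},
    "feature": {},
    "vertex": {"positional_rmse_m": 0.5},
}

if __name__ == "__main__":
    print(resolve(example, "positional_rmse_m"))  # most specific value wins
    print(resolve(example, "lineage"))            # inherited from the top level
```

Storing lineage once at the geodatabase level is the economy of scale; the vertex-level override is what spatial non-stationarity demands.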

  16. 5. Collection-level metadata • Describing the properties of entire collections • The Geospatial One-Stop • www.geodata.gov • There will always be more than one one-stop • how to know where to look?

  17. GOS coverage, 1/06

  18. 6. Spatial dependence • Tobler’s First Law • nearby things are more similar than distant things • applies to errors • relative accuracy almost always better than absolute accuracy • covariances as important as variances
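
The claim that relative accuracy is usually better than absolute accuracy follows from the variance of a difference: Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). A minimal sketch with hypothetical numbers; the function name is an assumption.

```python
import math

def relative_error_sd(sd_a: float, sd_b: float, correlation: float) -> float:
    """Standard deviation of the error in (a - b), given each point's error
    standard deviation and the correlation between the two errors."""
    variance = sd_a**2 + sd_b**2 - 2.0 * correlation * sd_a * sd_b
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

if __name__ == "__main__":
    # Two nearby points, each with a hypothetical 10 m absolute error.
    for rho in (0.0, 0.9, 1.0):
        print(rho, relative_error_sd(10.0, 10.0, rho))
```

With uncorrelated errors the difference is worse than either point alone; as the correlation approaches 1 the relative error shrinks towards zero even though each absolute error is unchanged.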

  19. Marginal or joint properties? • Visualization of marginal properties • Analytic functions respond to joint properties • slope • area • Joint properties must be described at a higher level • relative errors of vertex positions • described at level of vertex collection
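
The dependence of area on joint rather than marginal properties can be shown by simulation. In the sketch below every vertex has the same marginal error in both scenarios; only the correlation between vertices differs. The square, the 1-unit standard deviation, the Gaussian error model, and the function names are illustrative assumptions.

```python
import random
import statistics

def shoelace_area(vertices):
    """Unsigned polygon area from an ordered list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def area_sd(vertices, sd, shared, trials=2000, seed=0):
    """Monte Carlo standard deviation of area under Gaussian vertex errors.

    shared=True applies one error vector to every vertex (perfectly
    correlated errors); shared=False draws an independent error per vertex.
    """
    rng = random.Random(seed)
    areas = []
    for _ in range(trials):
        if shared:
            dx, dy = rng.gauss(0, sd), rng.gauss(0, sd)
            noisy = [(x + dx, y + dy) for x, y in vertices]
        else:
            noisy = [(x + rng.gauss(0, sd), y + rng.gauss(0, sd))
                     for x, y in vertices]
        areas.append(shoelace_area(noisy))
    return statistics.pstdev(areas)

# Hypothetical 100 x 100 parcel.
square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]

if __name__ == "__main__":
    print("correlated:", area_sd(square, 1.0, shared=True))
    print("independent:", area_sd(square, 1.0, shared=False))
```

A rigid shift leaves the area essentially unchanged, while independent errors of the same size perturb it substantially, so vertex-level error statistics alone cannot predict the error in area.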

  20. Cross-correlation • How are errors on Layer 1 related to errors on Layer 2? • Error as an issue in interoperability • what happens if I superimpose these layers? • Two layers will almost always not fit • depends on lineage of each • how bad is the misfit? • will it affect my analysis? • Binary metadata • the ability of a pair of data sets to interoperate • not available from either’s unary metadata • If GIS is about overlay • then binary metadata are essential
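
The slides do not say how binary metadata would be computed. One hypothetical measure is the RMS distance between features that are known to correspond in the two layers, sketched below; the function name and coordinates are assumptions.

```python
import math

def pairwise_misfit(points_a, points_b):
    """RMS distance between matched (x, y) points in two layers.

    The result describes the pair of layers; neither layer's own
    metadata contains it.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty, equally long lists of matched points")
    squared = [(xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))

if __name__ == "__main__":
    # Hypothetical road intersections as they appear in two layers.
    layer_1 = [(0.0, 0.0), (10.0, 0.0)]
    layer_2 = [(3.0, 4.0), (13.0, 4.0)]
    print(pairwise_misfit(layer_1, layer_2))  # each pair is 5 units apart
```

A value like this answers "how bad is the misfit?" directly, whereas the positional accuracy statements of the two layers taken separately only bound it loosely.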

  21. The way forward • Reopen the metadata debate • an unpopular move • it’s hard enough to persuade people to provide metadata • a standard before its time • standards should emerge only after research is complete • It’s our responsibility • the research task does not end with journal publication • metadata standards express the state of our research • Many other issues not related to data quality • possible allies
