Geographic data validation

  1. Geographic data validation

  2. Index • Basic concepts • Why do we need validation? • How to assess geographic data • Initial checks • Intermediate checks • Advanced checks • Some final considerations

  4. Basic concepts • Quality • Faithful representation of a feature • The quality of the data determines the quality of the output • GIGO (garbage in, garbage out) principle • Data may be used in ways that were unforeseen when they were collected • The value of the data is directly related to their fitness for a variety of uses

  5. Basic concepts • Fitness-for-use • The suitability of a set of data for a specific purpose • Also known as usability • Should not be confused with quality • Quality: abstract • Usability: specific • A low-quality dataset may still have high usability for a given purpose

  6. Basic concepts • Precision • Closeness of repeated measurements to one another, whether or not they are close to the true value • Accuracy • Closeness of a measurement to the true value

  7. Precision vs Accuracy

  8. Basic concepts • Precision is intrinsic to the measurements • Accuracy depends on knowing the true value of the variable • Data validation: assessing the accuracy • Compare against a reference value
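The distinction can be made concrete with a minimal sketch. It assumes precision is summarised by the sample standard deviation of repeated measurements and accuracy by the distance between their mean and a reference value. Both summaries, the function name and the sample readings are illustrative choices, not something the slides prescribe.

```python
import statistics


def precision_and_accuracy(measurements, reference_value):
    """Return (spread, bias) for repeated measurements of one quantity.

    spread: sample standard deviation; small means precise.
    bias:   |mean - reference_value|; small means accurate.
    """
    spread = statistics.stdev(measurements)
    bias = abs(statistics.mean(measurements) - reference_value)
    return spread, bias


# Hypothetical repeated latitude readings for a point whose
# reference latitude is 40.0 degrees.
tight_but_wrong = [41.01, 41.00, 40.99, 41.00]  # precise, not accurate
loose_but_right = [39.5, 40.6, 40.1, 39.8]      # accurate, not precise

print(precision_and_accuracy(tight_but_wrong, 40.0))
print(precision_and_accuracy(loose_but_right, 40.0))
```

Note that the spread can be computed from the data alone, while the bias cannot be computed without a reference value, which is why validation needs something to compare against.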

  10. Why do we need validation?

  12. Why do we need validation? • This was a striking example, but more subtle issues can (and actually do) happen • We need to develop techniques and methodologies to explore the data • In other words, we need to validate the data • Validating gives a sense of the reliability of the records, and clues on how to improve it

  14. How to assess? • Different techniques apply, depending on the aim of the assessment • Remember that high-quality datasets are more likely to show high fitness-for-use • Ideally, check for quality • If we know the purpose, check for fitness for that purpose

  15. How to assess? • Work with geographic information structured as in Darwin Core • Work with individual records as well as collections of data • Start with the most basic pieces of information • Look for coherence with other pieces of information • If they are not coherent, why? • Try modifying the information to see whether it then fits • At more advanced levels, make use of available taxonomic or temporal information

  16. How to assess? • Tools • Spreadsheets: Microsoft Excel, LibreOffice Calc… • Well-known environment • Visually easy • OpenRefine • Spreadsheet-like, but with some enhanced features • Scripts • Database scripts: work directly at the source • Other programming languages: enhanced capabilities • GIS software • Often linked with other tools, such as spreadsheets or scripts

  17. Visualizations • Visual exploration of the record set • Useful for a first-level assessment • Primary visualization for geographic data: maps • The next picture shows several issues that can be detected using a map…

  18. Coordinate transposition • This happens when latitude is stored in the longitude field and vice versa • Usually difficult to detect on a record-by-record basis • But when looking at the whole picture…
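Without a map, one subset of transpositions can still be caught automatically: pairs that are out of range as stored but become valid once swapped. The sketch below assumes records are dictionaries keyed by the Darwin Core terms `decimalLatitude` and `decimalLongitude`. The function names and the `id` key are hypothetical. Transposed pairs where both values stay within ±90 pass this check unnoticed and still need a map or a country check.

```python
def is_valid_pair(lat, lon):
    """True if lat/lon are inside the legal ranges for decimal degrees."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def find_possible_transpositions(records):
    """Return records that are invalid as stored but valid when swapped."""
    suspects = []
    for rec in records:
        lat = rec["decimalLatitude"]
        lon = rec["decimalLongitude"]
        if not is_valid_pair(lat, lon) and is_valid_pair(lon, lat):
            suspects.append(rec)
    return suspects


# Hypothetical sample records
records = [
    {"id": 1, "decimalLatitude": 40.4, "decimalLongitude": -3.7},    # fine
    {"id": 2, "decimalLatitude": -122.3, "decimalLongitude": 47.6},  # swapped
    {"id": 3, "decimalLatitude": 95.0, "decimalLongitude": 200.0},   # just wrong
]
print([rec["id"] for rec in find_possible_transpositions(records)])
```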

  19. Zero vs Null • One of the most common issues • Storing 0 (zero) instead of leaving the field empty • This happens with some data management systems • Latitude 0 and longitude 0 are stored to mean “unknown coordinates” • But a data user cannot know that, and it is not what the standard says: 0, 0 is a valid location
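A minimal sketch of a flagging pass for this issue follows. It assumes records are dictionaries with Darwin Core `decimalLatitude` and `decimalLongitude` values. The `coordinateFlag` key and its three values are hypothetical names introduced only for this illustration. Records at exactly 0, 0 are marked as suspect rather than deleted, because the point may be genuine.

```python
def flag_zero_coordinates(records):
    """Return copies of the records with a 'coordinateFlag' entry added.

    'missing'      : latitude or longitude is empty
    'suspect_zero' : both are exactly 0, possibly a placeholder for unknown
    'ok'           : anything else
    """
    flagged = []
    for rec in records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat in (None, "") or lon in (None, ""):
            status = "missing"
        elif float(lat) == 0.0 and float(lon) == 0.0:
            status = "suspect_zero"
        else:
            status = "ok"
        # Build a new dict so the input record is left untouched
        flagged.append({**rec, "coordinateFlag": status})
    return flagged


# Hypothetical sample records
sample = [
    {"decimalLatitude": 0, "decimalLongitude": 0},
    {"decimalLatitude": None, "decimalLongitude": None},
    {"decimalLatitude": 57.3, "decimalLongitude": 11.9},
    {"decimalLatitude": 0.0, "decimalLongitude": 11.9},
]
print([rec["coordinateFlag"] for rec in flag_zero_coordinates(sample)])
```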

  20. Negation • Forgetting or altering the sign of the coordinates • Usually forgetting the minus sign • The most common source: converting from DMS (degrees, minutes, seconds) to DD (decimal degrees) without taking “W” or “S” into account
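As an illustration of where the sign comes from, here is a minimal sketch of a DMS to DD conversion that takes the hemisphere letter into account. The function name and signature are hypothetical. The key step is the last line, where “S” and “W” produce a negative value.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to decimal degrees."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"Unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west are negative in decimal degrees
    return -value if hemisphere in {"S", "W"} else value


# 78° 47' 52" S; 35° 50' 31" E
lat = dms_to_dd(78, 47, 52, "S")
lon = dms_to_dd(35, 50, 31, "E")
print(round(lat, 6), round(lon, 6))
```

Dropping the hemisphere argument from such a routine is exactly the mistake described above: the latitude would come out as a positive value, in the wrong hemisphere.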

  21. Check against country • The easiest way of checking these issues is to check whether the coordinates fall inside the specified country… • Of course, only if we have a country value to check against • Two ways • Use GIS software • Use web services like GeoNames (we will see this in the OpenRefine session)
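The idea can be sketched in a few lines. Note the swapped-in technique: instead of GIS software or a web service, this sketch uses a deliberately rough, hypothetical bounding box per country code. When a point falls outside the box, it tries the transposition and sign changes described on the previous slides and reports which of them would bring the point into the country. All names here (`COUNTRY_BOXES`, `in_country`, `suggest_fixes`) and the box values are assumptions made for illustration.

```python
from itertools import product

# Hypothetical, deliberately rough bounding box (south, north, west, east).
# A real check would use country polygons from GIS data or a web service.
COUNTRY_BOXES = {"SE": (55.0, 69.5, 10.5, 24.5)}


def in_country(lat, lon, country_code):
    """True if the point falls inside the rough box for the country."""
    south, north, west, east = COUNTRY_BOXES[country_code]
    return south <= lat <= north and west <= lon <= east


def suggest_fixes(lat, lon, country_code):
    """List (label, lat, lon) corrections that move the point into the country.

    Returns an empty list when the point is already inside, or when no
    swap or sign change helps.
    """
    if in_country(lat, lon, country_code):
        return []
    fixes = []
    for swap, lat_sign, lon_sign in product((False, True), (1, -1), (1, -1)):
        if not swap and lat_sign == 1 and lon_sign == 1:
            continue  # this combination is the unmodified record
        new_lat, new_lon = (lon, lat) if swap else (lat, lon)
        new_lat, new_lon = new_lat * lat_sign, new_lon * lon_sign
        if in_country(new_lat, new_lon, country_code):
            # Signs are applied after the swap, so the labels refer
            # to the resulting latitude and longitude.
            parts = [
                name
                for name, applied in (
                    ("swap", swap),
                    ("negate latitude", lat_sign == -1),
                    ("negate longitude", lon_sign == -1),
                )
                if applied
            ]
            fixes.append((" + ".join(parts), new_lat, new_lon))
    return fixes


print(suggest_fixes(13.132359, 55.932576, "SE"))   # transposed record
print(suggest_fixes(55.932576, -13.132359, "SE"))  # wrong sign on longitude
```

The suggestions are only hints for a human reviewer. A fix that happens to land inside the country is not proof that it is the correct one.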

  22. Georeferencing • Intermediate check • If we have locality information and coordinates, we can check whether they match • Georeferencing is a tough task, and prone to uncertainties, so some level of imprecision is to be expected • Make good use of the “uncertainty” fields in Darwin Core! • But still…
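One possible way to automate this comparison is sketched below. It assumes the locality string has already been georeferenced to a reference point (for example with a gazetteer), and it treats the record as consistent when the great-circle distance to that point is within the Darwin Core `coordinateUncertaintyInMeters` value. The function names, the 1000 m fallback and the sample coordinates are assumptions for illustration only.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def matches_locality(record, locality_lat, locality_lon, default_uncertainty_m=1000.0):
    """True if the record's coordinates lie within its stated uncertainty
    of the reference point for its locality.

    default_uncertainty_m is a hypothetical fallback used when the
    record carries no coordinateUncertaintyInMeters value.
    """
    uncertainty = record.get("coordinateUncertaintyInMeters") or default_uncertainty_m
    distance = haversine_m(
        record["decimalLatitude"], record["decimalLongitude"],
        locality_lat, locality_lon,
    )
    return distance <= uncertainty


# Hypothetical record and two hypothetical reference points
record = {
    "decimalLatitude": 55.932576,
    "decimalLongitude": 13.132359,
    "coordinateUncertaintyInMeters": 500,
}
print(matches_locality(record, 55.9330, 13.1330))  # nearby reference point
print(matches_locality(record, 57.3, 11.9))        # distant reference point
```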

  23. 55.932576, 13.132359 Anahuac NWR (UTC 049) Grandville POINT(-1.3223333 53.44958) Marine Nature Study Area 78º 47’ 52” S; 35º 50’ 31” E Stewart Park POINT(-1.1735004 53.358746) Backyard My Habitat 55.932576, 13.132359 Wilderness Park, north of 14th St. 28054 Delaney Conservation Area 57.3, 11.9

  24. Multi-domain checks • Using information from different sources to check quality • In particular, use taxonomic information to improve geospatial data • Most basic example: check the data against a range map • If the point falls inside the range map of the specified species, OK • Sometimes, temporal information is useful
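The range-map check can be sketched with a plain ray-casting point-in-polygon test. This is a simplification: it treats latitude and longitude as flat coordinates, so it is unsuitable for ranges that cross the antimeridian or a pole, and real range maps would normally be handled in GIS software. The species name, the polygon and the function names are hypothetical. `scientificName` is the Darwin Core term assumed for the taxon.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test. polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def check_against_range(record, range_maps):
    """Return 'inside', 'outside' or 'no_range_map' for a record."""
    polygon = range_maps.get(record["scientificName"])
    if polygon is None:
        return "no_range_map"
    hit = point_in_polygon(record["decimalLatitude"], record["decimalLongitude"], polygon)
    return "inside" if hit else "outside"


# Hypothetical range map: one rectangular polygon for an invented species
range_maps = {
    "Examplea fictitia": [(36.0, -10.0), (44.0, -10.0), (44.0, 4.0), (36.0, 4.0)],
}
record = {"scientificName": "Examplea fictitia", "decimalLatitude": 40.0, "decimalLongitude": -3.0}
print(check_against_range(record, range_maps))
```

An "outside" result is a reason to look at the record, not proof of an error: the range map itself may be incomplete or out of date.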

  26. Considerations • NEVER modify the original data • Data cleaning is a human task, and thus, it is not error-free • Information we believe is wrong may be right • Make an “improved copy” of the data • Or “flag” the records as inaccurate • Re-share the improvements • With the community: so that others don’t have to re-invent the wheel • With the original owners of the data: so that they can correct the errors at the source
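A minimal sketch of the "improved copy plus flags" approach follows. The helper name `flag_records`, the `flags` key and the flag label are hypothetical. The point is only that the checks run on a deep copy, so the original records stay exactly as they were received.

```python
import copy


def flag_records(records, check, flag_name):
    """Return a deep copy of records, adding flag_name where check(record) is true.

    The input list and its dictionaries are never modified.
    """
    improved = copy.deepcopy(records)
    for rec in improved:
        rec.setdefault("flags", [])
        if check(rec):
            rec["flags"].append(flag_name)
    return improved


# Hypothetical original data
original = [
    {"decimalLatitude": 0, "decimalLongitude": 0},
    {"decimalLatitude": 57.3, "decimalLongitude": 11.9},
]


def is_zero_zero(rec):
    return rec["decimalLatitude"] == 0 and rec["decimalLongitude"] == 0


improved = flag_records(original, is_zero_zero, "ZERO_COORDINATES")
print(improved)
print(original)  # unchanged
```

Because the flags live alongside the untouched values, a reviewer who later finds that a flagged record was right after all loses nothing, and the flagged copy can be shared back with the community and the data owners.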
