
BioGeomancer: Semi-automated Georeferencing Engine



Presentation Transcript


  1. BioGeomancer: Semi-automated Georeferencing Engine • John Wieczorek, Aaron Steele, Dave Neufeld, P. Bryan Heidorn, Robert Guralnick, Reed Beaman, Chris Frazier, Paul Flemons, Nelson Rios, Greg Hill, Youjun Guo

  2. Spatially Challenged Occurrence Data • LA PEÑITA; 5.5. KM N • Baird Mtns.; Salmon R. headwaters • CALIENTE MOUNTAIN • 10 MI SW CANAS, RIO HIGUERON • near Sedan • 4.4 MI N, 6.2 MI W SEMINOLE

  3. Spatially Enabled Occurrence Data

  4. Georeferencing Engine

  5. GeoLocate

  6. Input - Verbatim Locality Strings • LA PEÑITA; 5.5. KM N • Baird Mtns.; Salmon R. headwaters • CALIENTE MOUNTAIN • 10 MI SW CANAS, RIO HIGUERON • near Sedan • 4.4 MI N, 6.2 MI W SEMINOLE

  7. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete

  8. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete We need these to start processing.

  9. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete We need these to start processing. These are assumptions we should not hold to be true.

  10. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete • Apply rules for locality string interpretation

  11. Georeferencing Engine - Locality Interpretation Components

  12. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete • Apply rules for locality string interpretation There is more than one way to accomplish string interpretation.

  13. Locality Interpretation Methods • Regular expression analysis • GeoLocate - Tulane • Enhanced BioGeomancer Classic – Yale • Machine Learning/Natural Language Processing • U. Illinois, Urbana-Champaign • Inxight Software, Inc.

  14. Locality Types • F – feature • P – path • FO – offset from a feature, sans heading • FOH – offset from feature at a heading • FO+ – orthogonal offsets from a feature • FPOH – offset at a heading from a feature along a path • 31 other locality types known so far

  15. Five Most Common Locality Types* • 51.0% - feature • 21.4% - locality not recorded • 17.6% - offset from feature at a heading • 8.6% - path • 5.8% - undefined *Based on 500 records randomly selected from the 296k records georeferenced manually in the MaNIS Project.

  16. Clause: a subset of a locality description to which a locality type can be applied.

  17. Step 1: Define Clause Boundaries • LA PEÑITA; 5.5. KM N • Baird Mtns.; Salmon R. headwaters • CALIENTE MOUNTAIN • 10 MI SW CANAS, RIO HIGUERON • near Sedan • 4.4 MI N, 6.2 MI W SEMINOLE

  18. Step 1: Define Clause Boundaries • <LA PEÑITA; 5.5. KM N>

  19. Step 1: Define Clause Boundaries • <LA PEÑITA; 5.5. KM N> • <Baird Mtns.; >

  20. Step 1: Define Clause Boundaries • <LA PEÑITA; 5.5. KM N> • <Baird Mtns.; ><Salmon R. headwaters>

  21. Step 1: Define Clause Boundaries • <LA PEÑITA; 5.5. KM N> • <Baird Mtns.; ><Salmon R. headwaters> • <CALIENTE MOUNTAIN> • <10 MI SW CANAS, ><RIO HIGUERON> • <near Sedan> • <4.4 MI N, 6.2 MI W SEMINOLE>
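The clause boundaries above can be approximated with a few rules. The following is a minimal sketch under stated assumptions, not the engine's published method. It splits on ";" and ",". It then re-attaches any fragment that is only an offset (number, unit, heading), which reproduces the six boundaries on this slide. The names `split_clauses` and `OFFSET_ONLY` are hypothetical. Unlike the slide, the sketch does not keep a trailing delimiter at a clause boundary (the slide shows "<10 MI SW CANAS, >").

```python
import re

# Hypothetical pattern for a fragment that is nothing but an offset, such as
# "5.5. KM N" or "4.4 MI N": a number (a stray trailing period is tolerated),
# a distance unit, and a compass heading.
OFFSET_ONLY = re.compile(
    r"^\d+(?:\.\d+)?\.?\s*(?:KM|MI)\s+(?:NE|NW|SE|SW|N|S|E|W)$", re.IGNORECASE
)


def split_clauses(locality: str) -> list[str]:
    """Split a verbatim locality string into clauses at ';' and ','.

    An offset-only fragment cannot stand alone, so it is joined to the next
    non-empty fragment ("4.4 MI N, 6.2 MI W SEMINOLE") or, if it is the last
    one, to the previous clause ("LA PEÑITA; 5.5. KM N").
    """
    # The capturing group makes re.split keep the delimiters:
    # [fragment, delimiter, fragment, delimiter, ..., fragment]
    parts = re.split(r"([;,])", locality)
    fragments = [p.strip() for p in parts[0::2]]
    delimiters = parts[1::2]
    last = max((i for i, f in enumerate(fragments) if f), default=-1)

    clauses: list[str] = []
    carry = ""  # offset-only text waiting to be joined to the next fragment
    for i, fragment in enumerate(fragments):
        if not fragment:
            continue  # e.g. the empty tail after a trailing delimiter
        is_offset_only = OFFSET_ONLY.match(fragment) is not None
        if is_offset_only and i < last:
            carry += fragment + delimiters[i] + " "
            continue
        if is_offset_only and clauses and not carry:
            clauses[-1] += delimiters[i - 1] + " " + fragment
        else:
            clauses.append(carry + fragment)
        carry = ""
    return clauses
```

For example, "10 MI SW CANAS, RIO HIGUERON" yields two clauses. "4.4 MI N, 6.2 MI W SEMINOLE" stays whole because its first fragment is a bare offset.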

  22. Step 2: Determine Locality Types • <FOH>LA PEÑITA; 5.5. KM N</FOH>

  23. Step 2: Determine Locality Types • <FOH>LA PEÑITA; 5.5. KM N</FOH> • <F>Baird Mtns.; </F>

  24. Step 2: Determine Locality Types • <FOH>LA PEÑITA; 5.5. KM N</FOH> • <F>Baird Mtns.; </F><PS>Salmon R. headwaters</PS>

  25. Step 2: Determine Locality Types • <FOH>LA PEÑITA; 5.5. KM N</FOH> • <F>Baird Mtns.; </F><PS>Salmon R. headwaters</PS> • <F>CALIENTE MOUNTAIN</F> • <FOH>10 MI SW CANAS, </FOH><P>RIO HIGUERON</P> • <NF>near Sedan</NF> • <FO+>4.4 MI N, 6.2 MI W SEMINOLE</FO+>
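A rule-based classifier for Step 2 could look like the sketch below. The regular expressions and the name `locality_type` are assumptions. The sketch covers only the F, P, FO, FOH and FO+ types from slide 14. As a result, it labels "Salmon R. headwaters" as P and "near Sedan" as F, rather than the PS and NF tags shown above.

```python
import re

# Hypothetical patterns; the slides do not give the engine's actual rules.
# An offset is a number, a unit, and optionally a compass heading.
OFFSET = re.compile(
    r"\d+(?:\.\d+)?\.?\s*(?:KM|MI)\b(?:\s+(?:NE|NW|SE|SW|N|S|E|W)\b)?",
    re.IGNORECASE,
)
# True when a matched offset ends with a heading.
HEADING = re.compile(r"\s(?:NE|NW|SE|SW|N|S|E|W)$", re.IGNORECASE)
# A few words that suggest the clause names a path (river, road).
PATH_WORDS = re.compile(r"\b(?:RIO|RIVER|CREEK|ROAD|HIGHWAY)\b|\bR\.", re.IGNORECASE)


def locality_type(clause: str) -> str:
    """Assign one of F, P, FO, FOH or FO+ to a single clause."""
    offsets = [m.group(0) for m in OFFSET.finditer(clause)]
    with_heading = [o for o in offsets if HEADING.search(o)]
    if len(with_heading) >= 2:
        return "FO+"  # orthogonal offsets, e.g. "4.4 MI N, 6.2 MI W SEMINOLE"
    if len(with_heading) == 1:
        return "FOH"  # offset at a heading, e.g. "10 MI SW CANAS"
    if offsets:
        return "FO"  # offset without a heading, e.g. "5 KM CANAS"
    if PATH_WORDS.search(clause):
        return "P"  # path, e.g. "RIO HIGUERON"
    return "F"  # bare feature, e.g. "CALIENTE MOUNTAIN"
```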

  26. Step 3: Interpret Clauses • <FOH>LA PEÑITA; 5.5. KM N</FOH> Feature: LA PEÑITA Offset: 5.5 Offset Units: KM Heading: N
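Interpreting an FOH clause into the four fields on this slide can be sketched with one regular expression. The pattern, the `Interpretation` record and `interpret_foh` are hypothetical. The sketch assumes the feature is whatever text remains once the offset is removed.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: number (stray trailing period tolerated), unit, heading.
FOH_OFFSET = re.compile(
    r"(?P<offset>\d+(?:\.\d+)?)\.?\s*(?P<units>KM|MI)\b\s+"
    r"(?P<heading>NE|NW|SE|SW|N|S|E|W)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Interpretation:
    feature: str
    offset: float
    offset_units: str
    heading: str


def interpret_foh(clause: str) -> Optional[Interpretation]:
    """Split an FOH clause into feature, offset, offset units and heading."""
    m = FOH_OFFSET.search(clause)
    if m is None:
        return None  # no offset at a heading: not an FOH clause
    # Remove the offset, collapse whitespace, and trim clause punctuation.
    remainder = " ".join((clause[: m.start()] + clause[m.end():]).split())
    return Interpretation(
        feature=remainder.strip(" ;,"),
        offset=float(m.group("offset")),
        offset_units=m.group("units").upper(),
        heading=m.group("heading").upper(),
    )
```

Applied to "LA PEÑITA; 5.5. KM N", it returns the same four values shown on the slide: feature LA PEÑITA, offset 5.5, units KM, heading N.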

  27. Step 4: Find Feature Descriptions • <FOH>LA PEÑITA; 5.5. KM N</FOH> Feature: LA PEÑITA Offset: 5.5 Offset Units: KM Heading: N
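Finding feature descriptions means looking the feature name up in one or more gazetteers. The sketch below shows only the matching step. It assumes the gazetteer is available as (name, record id) pairs, and the record ids are placeholders, not real gazetteer entries. Names are compared after case folding and removing diacritics, so "LA PEÑITA" matches both "La Peñita" and "La Penita". Every match is returned, because one name may refer to more than one place.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Fold case, drop diacritics, and collapse whitespace."""
    # NFKD splits "Ñ" into "N" plus a combining tilde, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(bare.casefold().split())


def build_index(entries: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Index (name, record_id) gazetteer entries by normalized name."""
    index: dict[str, list[str]] = defaultdict(list)
    for name, record_id in entries:
        index[normalize_name(name)].append(record_id)
    return dict(index)


def find_feature(feature: str, index: dict[str, list[str]]) -> list[str]:
    """Return every candidate record for a feature name (possibly none)."""
    return index.get(normalize_name(feature), [])
```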

  28. Georeferencing Engine - Spatial Description Components

  29. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete • Apply rules for locality string interpretation • Treat spatial data references as accurate

  30. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete • Apply rules for locality string interpretation • Treat spatial data references as accurate This is another assumption we should not hold to be true.

  31. “Davis, Yolo County, California”

  32. “Davis, Yolo County, California”

  33. “Davis, Yolo County, California”

  34. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete • Apply rules for locality string interpretation • Treat spatial data references as accurate • Apply rules for spatial description building

  35. Step 5: Construct Spatial Description for Each Clause

  36. Step 5: Construct Spatial Description for Each Clause [figure label: "West of B"]
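For an FOH clause, the core of the spatial description is the point reached by moving the offset distance from the feature along the heading. The sketch below computes that point with the standard great-circle destination formula. The function name, the spherical-Earth radius, and the eight-point heading table are assumptions. It also ignores the uncertainty that a full spatial description would carry, such as the feature's extent and an imprecise distance or heading.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth
KM_PER_UNIT = {"KM": 1.0, "MI": 1.609344}
HEADING_DEGREES = {"N": 0, "NE": 45, "E": 90, "SE": 135,
                   "S": 180, "SW": 225, "W": 270, "NW": 315}


def offset_point(lat: float, lon: float, offset: float,
                 units: str, heading: str) -> tuple[float, float]:
    """Move from (lat, lon) by `offset` units along a compass heading."""
    d = offset * KM_PER_UNIT[units.upper()] / EARTH_RADIUS_KM  # angular distance
    theta = math.radians(HEADING_DEGREES[heading.upper()])
    phi1, lam1 = math.radians(lat), math.radians(lon)
    # Great-circle destination given start point, bearing and angular distance.
    phi2 = math.asin(math.sin(phi1) * math.cos(d)
                     + math.cos(phi1) * math.sin(d) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(d) * math.cos(phi1),
                             math.cos(d) - math.sin(phi1) * math.sin(phi2))
    # Normalize longitude to [-180, 180).
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0
    return math.degrees(phi2), lon2
```

Called with the coordinates of a gazetteer match for LA PEÑITA and the Step 3 values (5.5, "KM", "N"), it returns the point 5.5 km north of that feature.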

  37. Step 6: Construct Final Spatial Interpretation • 10 MI SW CANAS, RIO HIGUERON Clause 1: <FOH>10 MI SW CANAS, </FOH> Clause 2: <P>RIO HIGUERON</P>

  38. Step 6: Construct Final Spatial Interpretation • 10 MI SW CANAS, RIO HIGUERON Clause 1: <FOH>10 MI SW CANAS, </FOH> Clause 2: <P>RIO HIGUERON</P> We hold these clauses to be simultaneously true.

  39. Step 6: Construct Final Spatial Interpretation • 10 MI SW CANAS, RIO HIGUERON Clause 1: <FOH>10 MI SW CANAS, </FOH> Clause 2: <P>RIO HIGUERON</P> We hold these clauses to be simultaneously true. The final spatial description is the intersection of the spatial descriptions of all clauses.
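Because the clauses are held to be simultaneously true, the final footprint is the intersection of the clause footprints. One simple way to sketch this is to assume footprints are rasterized onto a planar grid (x, y in km) and stored as sets of grid cells, then intersect the sets. The helpers `circle_cells` and `segment_cells` are hypothetical stand-ins for an offset clause such as "10 MI SW CANAS" and a path clause such as "RIO HIGUERON". An empty intersection means the clauses contradict each other and produce no interpretation.

```python
import math
from functools import reduce

Cell = tuple[int, int]  # (column, row) index of a square grid cell


def circle_cells(cx: float, cy: float, r: float, size: float = 1.0) -> frozenset[Cell]:
    """Cells whose centers lie within distance r of (cx, cy)."""
    cells = set()
    for i in range(math.floor((cx - r) / size), math.floor((cx + r) / size) + 1):
        for j in range(math.floor((cy - r) / size), math.floor((cy + r) / size) + 1):
            x, y = (i + 0.5) * size, (j + 0.5) * size
            if math.hypot(x - cx, y - cy) <= r:
                cells.add((i, j))
    return frozenset(cells)


def segment_cells(x1: float, y1: float, x2: float, y2: float,
                  buffer: float, size: float = 1.0) -> frozenset[Cell]:
    """Cells whose centers lie within `buffer` of the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    cells = set()
    for i in range(math.floor((min(x1, x2) - buffer) / size),
                   math.floor((max(x1, x2) + buffer) / size) + 1):
        for j in range(math.floor((min(y1, y2) - buffer) / size),
                       math.floor((max(y1, y2) + buffer) / size) + 1):
            x, y = (i + 0.5) * size, (j + 0.5) * size
            # Project the cell center onto the segment, clamped to its endpoints.
            t = 0.0 if length_sq == 0 else max(
                0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
            if math.hypot(x - (x1 + t * dx), y - (y1 + t * dy)) <= buffer:
                cells.add((i, j))
    return frozenset(cells)


def intersect_footprints(footprints: list[frozenset[Cell]]) -> frozenset[Cell]:
    """Intersect the footprints of all clauses (at least one is required)."""
    return reduce(lambda a, b: a & b, footprints)
```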

  40. Legacy Locality Data Issues • Treat locality description as accurate • Treat locality description as complete • Apply rules for locality string interpretation • Treat spatial data references as accurate • Apply rules for spatial description building • Apply criteria to reject unwanted hypotheses

  41. Additional Input - Preferences • Assume terrestrial locations • Assume aquatic locations • marine only • freshwater only • Assume direct offsets • Assume offsets by road, if possible
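Preferences act as criteria for rejecting unwanted hypotheses. The sketch below filters candidate interpretations. The `Candidate` fields (`environment`, `offset_method`) and the `Preferences` record are hypothetical. "By road, if possible" is read here as: keep by-road candidates when any exist, otherwise fall back to direct offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    environment: str    # hypothetical: "terrestrial", "marine" or "freshwater"
    offset_method: str  # hypothetical: "direct" or "road"


@dataclass(frozen=True)
class Preferences:
    environments: frozenset[str] = frozenset({"terrestrial"})
    by_road: bool = False  # "assume offsets by road, if possible"


def apply_preferences(candidates: list[Candidate], prefs: Preferences) -> list[Candidate]:
    """Reject hypotheses that conflict with the stated preferences."""
    kept = [c for c in candidates if c.environment in prefs.environments]
    if prefs.by_road:
        by_road = [c for c in kept if c.offset_method == "road"]
        if by_road:
            return by_road
        # No by-road hypothesis survived, so fall back to direct offsets.
    return [c for c in kept if c.offset_method == "direct"]
```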

  42. Output • Original data • Zero, one, or more spatial interpretations - spatial footprint - point-radius description • Process metadata • preferences (e.g., GeoLocate method, assume by road) • omissions (e.g., unused information) • confidence values
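The output can be pictured as a record that keeps the original data alongside zero or more footprints and the process metadata. In this sketch the field names are assumptions, and footprints are sets of grid cells. The point-radius summary takes the mean of the cell centers and the distance to the farthest cell corner. That choice always covers the footprint, but it is not necessarily the smallest enclosing circle or the method BioGeomancer uses.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

Cell = tuple[int, int]  # (column, row) index of a square grid cell


@dataclass
class PointRadius:
    x: float
    y: float
    radius: float


@dataclass
class GeoreferenceOutput:
    original: str  # verbatim locality, kept unchanged
    interpretations: list[frozenset[Cell]] = field(default_factory=list)  # zero, one or more
    preferences: dict[str, str] = field(default_factory=dict)  # e.g. offset method used
    omissions: list[str] = field(default_factory=list)  # unused information
    confidence: Optional[float] = None


def point_radius(footprint: frozenset[Cell], size: float = 1.0) -> PointRadius:
    """Summarize a non-empty footprint as a center point and covering radius."""
    centers = [((i + 0.5) * size, (j + 0.5) * size) for i, j in footprint]
    cx = sum(x for x, _ in centers) / len(centers)
    cy = sum(y for _, y in centers) / len(centers)
    half = size / 2
    # The farthest cell corner bounds every point of every cell.
    radius = max(math.hypot(x + dx - cx, y + dy - cy)
                 for x, y in centers for dx in (-half, half) for dy in (-half, half))
    return PointRadius(cx, cy, radius)
```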

  43. Conclusion • Georeferences are hypotheses • Hypotheses require testing • Tested hypotheses should be so noted
