
Enhancing Biodiversity Data Quality: Error Detection and Cleaning Techniques

This presentation covers error detection and data cleaning for biodiversity data, focusing on two error types: taxonomic and spatial. For taxonomic errors it recommends expert checks of specimens, comparison of names against authority lists, and automated tools for detecting and extracting scientific names. For spatial errors it stresses that geographic references are invaluable for analysis but highly error-prone, and it describes a jackknife-based cleaning procedure that was tested by adding random points to real occurrence data. The goal is to improve data quality by flagging suspect records, while acknowledging that data can never be cleaned completely.


Presentation Transcript


  1. Nothing Is Perfect: Error Detection and Data Cleaning A. Townsend Peterson STOLEN SHAMELESSLY FROM Arthur Chapman …

  2. www.gbif.org/prog/digit/data_quality/URL1124374342

  3. Types of Errors in Biodiversity Data • Taxonomic data

  4. Detection of Taxonomic Errors • Sine qua non – expert checks specimens and associated data • Check names against authority lists • Check names and authorities against authority lists • N.B.: Check out new capabilities for automated detection and extraction of scientific names … http://jbi.nhm.ku.edu
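A minimal sketch of the "check names against authority lists" step, assuming the authority list is available as a plain list of accepted names. The list contents, the function name `check_name`, and the similarity cutoff are all illustrative assumptions, not part of the presentation; the fuzzy matching uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical authority list; a real one would come from a curated taxonomic source.
AUTHORITY_NAMES = ["Crax rubra", "Crax alector", "Penelope purpurascens"]


def check_name(name, authority=AUTHORITY_NAMES, cutoff=0.8):
    """Return (status, suggestion) for a scientific name.

    status is "match", "possible misspelling", or "not found".
    The cutoff value is an assumption and would need tuning on real data.
    """
    # Collapse repeated or stray whitespace before comparing.
    cleaned = " ".join(name.split())
    if cleaned in authority:
        return ("match", cleaned)
    # Look for the single closest accepted name above the similarity cutoff.
    close = difflib.get_close_matches(cleaned, authority, n=1, cutoff=cutoff)
    if close:
        return ("possible misspelling", close[0])
    return ("not found", None)
```

A name flagged as a possible misspelling is only a candidate for correction; as the slide notes, an expert check remains the sine qua non.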

  5. Spatial Error • Geographic references are invaluable in enabling analysis of biodiversity data, but are also extremely prone to problems

  6. Georeferencing Errors

  7. Georeferencing Error

  8. Collector Itineraries

  9. 100 km

  10. Using Ecological Information

  11. Data Cleaning Procedures • Assemble occurrence points for each species • Eliminate occurrence points one at a time (jackknife), and build a model without each point in turn • Identify points that are (a) predicted by the models only when they are included in the input data set, or (b) not predicted by the models even when they are included in the input data set • Flag these points as suspect for further checking
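The jackknife procedure above can be sketched as follows. The presentation does not say which modelling method was used, so this example swaps in a deliberately simple stand-in: an "envelope" model that predicts presence wherever each environmental variable lies within `k` standard deviations of the training mean. The function names, the choice of `k`, and the example values (temperature and precipitation) are all assumptions for illustration.

```python
from statistics import mean, stdev


def fit_envelope(points, k=2.0):
    """Stand-in 'model': per-variable interval of mean +/- k standard deviations."""
    envelope = []
    for values in zip(*points):
        m, s = mean(values), stdev(values)
        envelope.append((m - k * s, m + k * s))
    return envelope


def predicts(envelope, point):
    """True if the point falls inside the envelope on every variable."""
    return all(lo <= v <= hi for (lo, hi), v in zip(envelope, point))


def jackknife_flags(points, k=2.0):
    """Flag points by comparing the full model with leave-one-out models."""
    full = fit_envelope(points, k)
    flags = {}
    for i, p in enumerate(points):
        # Build the model without point i (the jackknife step).
        loo = fit_envelope(points[:i] + points[i + 1:], k)
        if not predicts(full, p):
            flags[i] = "excluded even when in input"
        elif not predicts(loo, p):
            flags[i] = "included only when in input"
    return flags


# Hypothetical environmental values (temperature, precipitation) per occurrence point.
points = [
    (24.0, 1900.0), (25.0, 2000.0), (26.0, 2100.0), (25.0, 1950.0),
    (24.5, 2050.0), (25.5, 2000.0), (24.0, 2100.0), (26.0, 1900.0),
    (25.0, 2050.0), (10.0, 300.0),  # last point is far from the rest
]
suspect = jackknife_flags(points)
```

Flagged indices are only candidates for further checking; with a different modelling method the same loop applies, replacing `fit_envelope` and `predicts`.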

  12. Data Cleaning Test • Distributional data from the Atlas of Mexican Bird Distributions for various species • Select 18 points at random from those available • Add two random points • Simulates a 10% error rate (2 of 20 points) • Use the data-cleaning procedure to see if the random points could be identified as ‘erroneous’
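A minimal sketch of how such a test set might be assembled, assuming occurrence points are (longitude, latitude) pairs. The slide does not say how the two random points were generated; drawing them uniformly from a bounding box is an assumption here, and the box coordinates, placeholder occurrence points, and function name are illustrative only.

```python
import random


def build_test_set(real_points, bbox, n_real=18, n_random=2, seed=0):
    """Return a shuffled list of (point, is_injected) pairs.

    bbox is (min_lon, min_lat, max_lon, max_lat); random points are drawn
    uniformly inside it, which is an assumption, not the source's method.
    """
    rng = random.Random(seed)
    sample = rng.sample(real_points, n_real)
    min_lon, min_lat, max_lon, max_lat = bbox
    injected = [
        (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        for _ in range(n_random)
    ]
    labelled = [(p, False) for p in sample] + [(p, True) for p in injected]
    rng.shuffle(labelled)
    return labelled


# Placeholder occurrence points; real ones would come from the atlas data.
real_points = [(-95.0 + 0.1 * i, 17.0 + 0.05 * i) for i in range(30)]
# Very rough illustrative bounding box, not an authoritative boundary.
rough_bbox = (-118.0, 14.0, -86.0, 33.0)

test_set = build_test_set(real_points, rough_bbox)
error_rate = sum(is_injected for _, is_injected in test_set) / len(test_set)
```

Keeping the `is_injected` label alongside each point makes it possible to score afterwards how many of the injected points the cleaning procedure flagged.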

  13. Example – Crax rubra Successfully identified the 2 random points included in the model

  14. Example – Rauvolfia paraensis Identified one point as outlier. Proved to be an undescribed species

  15. Error Flagging • Never possible to clean completely; what matters is the signal-to-noise ratio • No substitute for inspection and detailed study by specialists • HOWEVER, we can (a) detect records with internal inconsistencies that clearly represent error in some field, (b) detect records with a high probability of including errors owing to unusual characteristics, and (c) flag those records for later checking and correction
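As one illustration of detecting internal inconsistencies, the sketch below flags records whose coordinates are missing, out of range, exactly zero, or outside a bounding box for the stated country. The field names follow Darwin Core terms (`decimalLatitude`, `decimalLongitude`, `country`) as an assumption about the record format; the bounding box is a rough illustrative value, and `flag_record` is a hypothetical helper.

```python
# Rough illustrative boxes as (min_lon, min_lat, max_lon, max_lat); not authoritative.
COUNTRY_BOXES = {"Mexico": (-118.5, 14.5, -86.5, 32.8)}


def flag_record(rec):
    """Return a list of flags for one occurrence record (empty if none)."""
    lat = rec.get("decimalLatitude")
    lon = rec.get("decimalLongitude")
    if lat is None or lon is None:
        return ["missing coordinates"]
    flags = []
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("coordinates out of range")
    if lat == 0 and lon == 0:
        flags.append("zero coordinates")
    # Compare against the stated country only when a box is known for it.
    box = COUNTRY_BOXES.get(rec.get("country"))
    if box and not (box[0] <= lon <= box[2] and box[1] <= lat <= box[3]):
        flags.append("coordinates outside stated country")
    return flags


# A longitude with a dropped minus sign lands outside the stated country.
bad = {"decimalLatitude": 19.4, "decimalLongitude": 99.1, "country": "Mexico"}
good = {"decimalLatitude": 19.4, "decimalLongitude": -99.1, "country": "Mexico"}
```

Flags mark records for later checking and correction; they do not correct anything by themselves.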
