Ensuring Quality in Data Exchanges: A Tri-Part Approach in the French Information System on Nature and Landscapes
Learn about the importance of data validation in biodiversity and environmental conservation decisions and how the French Information System on Nature and Landscapes ensures data quality.
Presentation Transcript
Data Quality in Data Exchanges: a Tri-Part Approach in the French Information System on Nature and Landscapes
Rémy Jomier (UMS Patrinat: National Natural History Museum (MNHN), French Agency for Biodiversity (AFB), and National Center for Scientific Research (CNRS)), Nature data standardization manager
Solène Robert (UMS Patrinat: National Natural History Museum (MNHN), French Agency for Biodiversity (AFB), and National Center for Scientific Research (CNRS)), Nature data and geographical data cell coordinator
Why validate taxon data?
• "Whoever is careless with the truth in small matters cannot be trusted with important matters" - A. Einstein
• A datum is small, but it has to be validated (hence, true) before it can be trusted with the important matters: biodiversity, our environment, and the related research and political decisions.
• It gets even more important when you know that: "some outside the museum community see the quality of museum data as being generally unacceptable for use in making environmental conservation decisions" - A. Chapman
• Some of you may think "Hey, I know THAT name!"
What is the "SINP"?
• SINP: the French national Information System on Nature and Landscapes (Système d'Information Nature et Paysages)
• Encompasses any biodiversity and geological data in France. For biodiversity, it includes both taxon occurrences and habitat occurrences.
• Uses well-defined rules, often more constraining than the Darwin Core (DwC)
• The part dealing with taxon occurrences: OccTax
What is a taxon occurrence?
Observation or non-observation of a taxon, at a time, at a place, by observers.
• Example: on 10 May 2014, Patrick Haffner (MNHN) observed badger traces at the point 8 050, 67 523 (Lambert 93 projection)
• Indirect observation of a mammal
• Direct observation of a butterfly
Prerequisite to data exchange: data conformity and consistency
• Conformity: ensures that a datum can be exchanged
  • Presence of compulsory elements
  • Type of the attribute (text, number, date…)
• Consistency: ensures that there is no blatant error
  • Checking consistency requires comparing elements with each other
  • Example: end date / start date
• Both are ensured by a national protocol, common to all
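The two checks on this slide can be sketched as follows. This is a minimal illustration only: the field names in `REQUIRED_FIELDS` are hypothetical and do not come from the OccTax standard, whose actual list of compulsory attributes is not given in the slides.

```python
from datetime import date

# Hypothetical compulsory attributes and their expected types.
# The real SINP/OccTax attribute list is not shown in the presentation.
REQUIRED_FIELDS = {
    "taxon_name": str,
    "date_start": date,
    "date_end": date,
    "observer": str,
}


def check_conformity(record: dict) -> list:
    """Return conformity errors: missing compulsory fields or wrong types."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if record.get(field) is None:
            errors.append(f"missing compulsory field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return errors


def check_consistency(record: dict) -> list:
    """Return consistency errors found by comparing elements with each other."""
    errors = []
    start, end = record.get("date_start"), record.get("date_end")
    # The slide's example: the end date must not precede the start date.
    if isinstance(start, date) and isinstance(end, date) and end < start:
        errors.append("date_end is earlier than date_start")
    return errors


# Usage, based on the badger example from the previous slide.
record = {
    "taxon_name": "Meles meles",
    "date_start": date(2014, 5, 10),
    "date_end": date(2014, 5, 10),
    "observer": "Patrick Haffner",
}
print(check_conformity(record), check_consistency(record))
```

Conformity looks at each attribute in isolation, while consistency needs at least two attributes at once, which is why the two checks are kept as separate functions here.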
Scientific validation of data: what happens to a datum
• Validation processes
  • Manual: experts
  • Automated: comparison with knowledge bases
  • Combined arms: both for maximum damage! Er… sorry, validation.
• Validation levels
  • Producer validation
  • Regional validation
  • National validation
• SCOPE!
• Looks like overkill? Nu-uh! That's the bare minimum!
Scientific validation of data: the needs within SINP
• 3 levels:
  • Producer's level, with a self-evaluation
  • Regional level, coordinated at a regional level
  • National level, coordinated at a national level (it can be equivalent, at times, to the regional level; taking care not to duplicate efforts is key)
• National validation is done globally, using national expert networks and feedback from users, or knowledge databases
• Validation should NEVER slow down data movement… but validation information should also be exchanged when it exists.
Scope?
• Minimal scope: the taxon/date/location triplet. Quick and relatively easy to check.
• Enlarged scope: any other information. Not always easy to check. The process the datum has been submitted to, and which elements have been checked, have to be described.
Scientific validation of data: processes
• Automated process: compares information with reference databases (presence maps, for example). Very quick, but dependent upon existing databases.
  • 1.5 hours for 1.5 million records, covering conformity, consistency, and minimal-scope scientific validation
• Manual process: requires experts to check each and every datum. Time-consuming, but very reliable; it can work outside the bounds of the automated process and without databases.
• Combined process: combines both, with the automated process flagging the data that the experts should check.
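The combined process described above can be sketched as a triage step: an automated comparison against a reference base accepts what it recognizes and queues everything else for experts. This is a hedged illustration; `KNOWN_PRESENCE`, the `Occurrence` class, and the area codes are all hypothetical stand-ins for a real presence map, not SINP data or tooling.

```python
from dataclasses import dataclass

# Hypothetical reference knowledge base: taxon -> areas where its presence
# is already documented. The area codes are made up for illustration.
KNOWN_PRESENCE = {
    "Meles meles": {"area_A", "area_B"},
}


@dataclass
class Occurrence:
    taxon: str
    area_code: str


def automated_check(occ: Occurrence) -> str:
    """Classify an occurrence as 'plausible', 'unexpected' or 'unknown_taxon'."""
    areas = KNOWN_PRESENCE.get(occ.taxon)
    if areas is None:
        return "unknown_taxon"
    return "plausible" if occ.area_code in areas else "unexpected"


def combined_process(occurrences):
    """Split occurrences into auto-accepted ones and ones flagged for experts.

    Flagged items carry the reason, so the expert knows what to look at.
    """
    accepted, for_experts = [], []
    for occ in occurrences:
        outcome = automated_check(occ)
        if outcome == "plausible":
            accepted.append(occ)
        else:
            for_experts.append((occ, outcome))
    return accepted, for_experts


# Usage: one known location, one unexpected location, one unknown taxon.
batch = [
    Occurrence("Meles meles", "area_A"),
    Occurrence("Meles meles", "area_Z"),
    Occurrence("Unlisted taxon", "area_A"),
]
accepted, for_experts = combined_process(batch)
print(len(accepted), "accepted;", len(for_experts), "sent to experts")
```

The design choice here is that the automated step never rejects a datum on its own; anything it cannot confirm is routed to a person, which matches the slide's description of automation flagging items for experts.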
Scientific validation of data: results
• Each datum is tagged with a trust level, and all relevant information:
  • Level (producer, regional, national)
  • Scope (minimal, enlarged)
  • Validator (this ensures trust)
  • Date (in case of a further revision and update)
  • Type of validation (manual, automatic, combined)
• Reasonable fallout: validated data
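One way to picture the tag attached to each datum is a small immutable record. The sketch below is an assumption-laden illustration: the class and field names are invented, and `trust_level` is a free-text placeholder because the slides do not list the actual trust scale.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Level(Enum):
    PRODUCER = "producer"
    REGIONAL = "regional"
    NATIONAL = "national"


class Scope(Enum):
    MINIMAL = "minimal"
    ENLARGED = "enlarged"


class ValidationType(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"
    COMBINED = "combined"


@dataclass(frozen=True)
class ValidationTag:
    """Hypothetical container for the validation results listed on the slide."""
    trust_level: str  # placeholder: the real trust scale is not in the slides
    level: Level
    scope: Scope
    validator: str
    validated_on: date
    validation_type: ValidationType


def most_recent(tags):
    """Return the latest tag, since a datum may be revised and re-validated."""
    return max(tags, key=lambda tag: tag.validated_on)


# Usage: a producer self-evaluation followed by a later regional validation.
# Validator names and dates are made up for the example.
first = ValidationTag("placeholder", Level.PRODUCER, Scope.MINIMAL,
                      "producer team", date(2014, 6, 1), ValidationType.MANUAL)
second = ValidationTag("placeholder", Level.REGIONAL, Scope.ENLARGED,
                       "regional expert", date(2015, 2, 1), ValidationType.COMBINED)
print(most_recent([first, second]).level.value)
```

Freezing the dataclass reflects the idea that a validation result is a historical fact: a revision produces a new tag with a new date rather than overwriting the old one.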
How does all that affect data exchange?
• Each element needs to be attached to the datum
• Aim of a standard: exchanging information. This calls for concise information: numerical or 1-2 letter codes
• No need to carry useless information: a "not checked" status needs no accompanying data
• Levels don't require the same information (producer vs regional/national)
• Checking for duplicate data: interesting
• Data may have been validated on compulsory attributes but rejected on optional ones: keep the optional information in a separate place, and exchange the validated information
• Data flow: when and how do we update? Use a modification date on the datum
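As a rough illustration of these points, the sketch below encodes validation information with short codes, emits nothing at all for an unchecked datum, and uses a modification date to decide whether an incoming copy should replace a stored one. The code tables and key names are hypothetical and are not the codes defined by the SINP standard.

```python
from datetime import date
from typing import Optional

# Hypothetical 1-letter code tables; the real SINP codes are not in the slides.
LEVEL_CODES = {"producer": "P", "regional": "R", "national": "N"}
SCOPE_CODES = {"minimal": "1", "enlarged": "2"}
TYPE_CODES = {"manual": "M", "automatic": "A", "combined": "C"}


def encode_validation(validation: Optional[dict]) -> dict:
    """Encode validation info compactly; an unchecked datum carries nothing."""
    if validation is None:
        return {}
    return {
        "lvl": LEVEL_CODES[validation["level"]],
        "scp": SCOPE_CODES[validation["scope"]],
        "typ": TYPE_CODES[validation["type"]],
        "val": validation["validator"],
        "dat": validation["date"].isoformat(),
    }


def needs_update(stored_modified: date, incoming_modified: date) -> bool:
    """An incoming datum replaces the stored one only if it is more recent."""
    return incoming_modified > stored_modified


# Usage with made-up values.
print(encode_validation(None))
print(encode_validation({
    "level": "regional",
    "scope": "minimal",
    "type": "combined",
    "validator": "regional expert",
    "date": date(2015, 2, 1),
}))
print(needs_update(date(2014, 5, 10), date(2015, 2, 1)))
```

Returning an empty mapping for "not checked" keeps the exchanged record small, and comparing modification dates gives receivers a simple rule for when to refresh their copy.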
Tēnā koutou. Thank you for your attention
• E-mail: rjomier[at]mnhn.fr