This presentation reviews the structure of Darwin Core (DwC) Archives, the text archive format of Darwin Core, which TDWG ratified in October 2009, and recommends enhancements to it. It covers the vocabulary of terms, the XML and RDF representations, and the Text Guide, then examines data validation, handling of multiple values, and extending the core and extension structure. It recommends new field attributes for data types and delimiters to improve machine readability and compatibility, and closes with open questions for future consideration.
EOL and DwC-Archives Patrick Leary pleary@eol.org
Brief Background • Darwin Core ratified by TDWG October 2009 • Consists of a vocabulary of terms • Multiple representations in XML, RDF • Documentation includes Text Guide • Text archives called Darwin Core Archives
DwC-Archive Structure source: http://www.gbif.org/resources/2554
Meta File source: http://yuml.me/
Validating • fileType has dateFormat attribute • DD-MM-YYYY, MM-DD-YYYY • field cannot specify the data type to expect • field has vocabulary attribute • URI for a vocabulary; should be machine readable • Format of the vocabulary is uncertain • Recommendations: • Add dataType attribute to field (string, float, integer, date, boolean, uri) • Add values, optionalValues attributes to field; delimited choices
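A minimal sketch of how the proposed dataType attribute could drive validation. The dataType attribute is the recommendation above, not part of the ratified meta file schema; the sample meta file, the helper names (parses, to_strptime, invalid_cells), and the fallback date format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from urllib.parse import urlparse

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical meta file: dataType is the PROPOSED attribute, not in the ratified schema.
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" dateFormat="DD-MM-YYYY">
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/individualCount" dataType="integer"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimalLatitude" dataType="float"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/eventDate" dataType="date"/>
  </core>
</archive>"""


def parses(parser, value):
    """Return True if parser(value) succeeds without a ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def to_strptime(date_format):
    """Translate a dateFormat such as DD-MM-YYYY into a strptime pattern."""
    return date_format.replace("YYYY", "%Y").replace("MM", "%m").replace("DD", "%d")


def invalid_cells(row, meta_xml=META):
    """List (term, value) pairs in row that do not match the declared dataType."""
    core = ET.fromstring(meta_xml).find("dwc:core", NS)
    # The fallback format is an assumption used only when dateFormat is absent.
    fmt = to_strptime(core.get("dateFormat", "YYYY-MM-DD"))
    checks = {
        "string": lambda v: True,
        "integer": lambda v: parses(int, v),
        "float": lambda v: parses(float, v),
        "date": lambda v: parses(lambda s: datetime.strptime(s, fmt), v),
        "boolean": lambda v: v.lower() in ("true", "false"),
        "uri": lambda v: bool(urlparse(v).scheme),
    }
    problems = []
    for field in core.findall("dwc:field", NS):
        value = row[int(field.get("index"))]
        data_type = field.get("dataType", "string")  # untyped fields stay strings
        if not checks[data_type](value):
            problems.append((field.get("term"), value))
    return problems


GOOD_ROW = ["occ1", "3", "51.5", "24-12-2009"]
BAD_ROW = ["occ2", "three", "51.5", "12-24-2009"]  # wrong type, wrong date order
```

Calling invalid_cells(BAD_ROW) flags the individualCount and eventDate cells, while GOOD_ROW passes. The values and optionalValues recommendation could be handled the same way with one more entry in the checks table.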
Handling Multiple Values • Some DwC terms recommend multiple values • 10% of all terms suggest “A list (concatenated and separated)” • Neither DwC nor the Archive meta file specifies a delimiter • Recommendations: • multiValueDelimiter attribute to field • allowsMultiValue attribute to field
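A short sketch of how a reader might use the two proposed attributes. Both multiValueDelimiter and allowsMultiValue are the recommendations above, not existing meta file attributes; the sample field declarations, the "|" delimiter, and the split_values helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical field declarations using the PROPOSED attributes.
MULTI_FIELD = ET.fromstring(
    '<field index="4" term="http://rs.tdwg.org/dwc/terms/recordedBy" '
    'allowsMultiValue="true" multiValueDelimiter="|"/>'
)
SINGLE_FIELD = ET.fromstring(
    '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
)


def split_values(field, raw):
    """Split raw into a list only when the field declares that it is multi-valued."""
    if field.get("allowsMultiValue", "false").lower() != "true":
        return [raw]  # single-valued: never guess at a delimiter
    delimiter = field.get("multiValueDelimiter")
    if not delimiter:
        raise ValueError("allowsMultiValue is true but no multiValueDelimiter is declared")
    return [part.strip() for part in raw.split(delimiter) if part.strip()]
```

With these declarations, split_values(MULTI_FIELD, "A. Smith | B. Jones") yields two names, while a single-valued field keeps any "|" character as literal data.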
Original Meta File source: http://yuml.me/
If Recommendations Were Applied source: http://yuml.me/
DwC-Archive Structure source: http://www.gbif.org/resources/2554
EOL Partial Data Model source: http://yuml.me/
Adding Structured Data source: http://yuml.me/
Extending Extensions • core can have extensions • extensions do not have to be linked to core • index attribute of coreid is optional • extensions have no explicit id • extensions cannot be linked to each other
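The star-shaped linkage described above can be sketched with in-memory rows. The sample rows, column positions, and the attach_extension helper are made up for illustration; only the core id and extension coreid relationship comes from the archive format.

```python
from collections import defaultdict

# Made-up sample data: column 0 of the core is its id,
# column 0 of the extension is the coreid pointing back at the core.
CORE_ROWS = [["t1", "Puma concolor"], ["t2", "Lynx rufus"]]
VERNACULAR_ROWS = [["t1", "cougar", "en"], ["t1", "puma", "es"], ["t2", "bobcat", "en"]]


def attach_extension(core_rows, extension_rows, id_index=0, coreid_index=0):
    """Group extension rows under the core row whose id matches their coreid."""
    by_coreid = defaultdict(list)
    for row in extension_rows:
        by_coreid[row[coreid_index]].append(row)
    return {row[id_index]: by_coreid.get(row[id_index], []) for row in core_rows}


linked = attach_extension(CORE_ROWS, VERNACULAR_ROWS)
```

Each extension row is reachable only through its coreid. Because an extension row carries no identifier of its own, a row in a second extension has nothing to point at, which is the limitation the bullets describe.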
Possible Workarounds • Flatten and repeat data • works for non-structured extension data • don’t want to end up with JSON values • Create multiple archives • Create multiple meta files • Modify the structure of the meta file • Create alternate meta file • Modify the meta file XSD
Changing Meta File • Minimal change • Add id element to extension / fileType • Add extensionid element to fileType • With attributes rowType • Possibly some indication of hasMany • Larger change • Unify core and extension • Change coreid accordingly
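One possible reading of the minimal change, as a sketch only. The id element on an extension and the extensionid element with a rowType attribute are the proposals above, not part of the current schema; the example.org row types and the extension_links helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical meta file applying the PROPOSED minimal change.
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <id index="0"/>
  </core>
  <extension rowType="http://example.org/terms/DataObject">
    <coreid index="0"/>
    <id index="1"/>  <!-- proposed: extension rows get their own id -->
  </extension>
  <extension rowType="http://example.org/terms/Agent">
    <!-- proposed: link to another extension instead of the core -->
    <extensionid index="0" rowType="http://example.org/terms/DataObject"/>
  </extension>
</archive>"""


def extension_links(meta_xml):
    """Return (source rowType, target rowType, column index) for each extensionid."""
    root = ET.fromstring(meta_xml)
    extensions = root.findall("dwc:extension", NS)
    # Only extensions that declare an id can be the target of a link.
    identified = {e.get("rowType") for e in extensions if e.find("dwc:id", NS) is not None}
    links = []
    for ext in extensions:
        for ref in ext.findall("dwc:extensionid", NS):
            target = ref.get("rowType")
            if target not in identified:
                raise ValueError(f"{target} declares no id, so it cannot be linked to")
            links.append((ext.get("rowType"), target, int(ref.get("index"))))
    return links
```

Here the Agent extension links to DataObject rows through column 0. A hasMany indication, if adopted, could be read in the same loop as an additional attribute on extensionid.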
Original Meta File source: http://yuml.me/
Diagram of minimal change source: http://yuml.me/
Diagram of larger change source: http://yuml.me/
Summary of Recommendations • dataType attribute to field (string, float, integer, date, boolean, uri) • values, optionalValues attributes to field • multiValueDelimiter attribute to field • allowsMultiValue attribute to field
Open Questions • Are these recommendations worth pursuing? • How to proceed with extending extensions? • How to update Darwin Core Text Guide with respect to Darwin Core terms? • Should Darwin Core Text Guide be separated? • Should meta file schema be separated from Text Guide?
Thank You Patrick Leary pleary@eol.org