1 / 8

Data Type Registries #2 12 Month Status Larry Lannom, Tobias Weigel Date Location TBD?

Data Type Registries #2 12 Month Status Larry Lannom, Tobias Weigel Date Location TBD?. CC BY-SA 4.0. Summary of the Problem. Data sharing requires that data can be parsed, understood, and reused by people and applications other than those that created the data How do we do this now?

tcalvin
Télécharger la présentation

Data Type Registries #2 12 Month Status Larry Lannom, Tobias Weigel Date Location TBD?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Type Registries #212 Month StatusLarry Lannom, Tobias WeigelDate Location TBD? www.rd-alliance.org - @resdatall CC BY-SA 4.0

  2. Summary of the Problem • Data sharing requires that data can be parsed, understood, and reused by people and applications other than those that created the data • How do we do this now? • For documents – formats are enough, e.g., PDF, and then the document explains itself to humans • This doesn’t work well with data – numbers are not self-explanatory • What does the number 7 mean in cell B27? • Data producers may not have explicitly specified certain details in the data: measurement units, coordinate systems, variable names, etc. • Need a way to precisely characterize those assumptions such that they can be identified by humans and machines that were not closely involved in its creation

  3. Goal of the Effort • Evaluate and identify a few assumptions in data that can be codified and shared in order to… • Produce a functioning Registry system that can easily be evaluated by organizations before adoption • Highly configurable for changing scope of captured and shared assumptions depending on the domain or organization • Supports several Type record dissemination variations • Design for allowing federation between multiple Registry instances • The emphasis is not on • Identifying every possible assumption and data characteristic applicable for all domains • Technology

  4. Status of the Deliverable • A prototype is at: http://typeregistry.org/ • Multiple other implementations/projects, including multiple schemas • Implementation supports notions of primitives and derived types • Primitives are fundamental types that we expect humans and software to parse and understand • Derived types depend on primitives to describe something complex • Registered types are assigned unique identifiers • Initial WG output published as ICT Technical Standard • ISO Study Group in process

  5. Initial Adopters • EarthCube– Steve Richard • Vermont Monitoring Cooperative – Mike Finnegan • DKRZ – Tobias Weigel • ePIC– Ulrich Schwardmann • NIST, Common Access Platform – Wo Chang • CNRI – multiple projects • Ongoing ISO Study Group

  6. Expected Impact of the Deliverable • Best case scenario: agreed upon set of standard schemas; ISO standard • Wide use of types for data sharing and workflow automation • Significant use of federation of distributed set of type registries • Extended use of typed attribute/value pairs in PID resolution • Worst case scenario: no agreed upon set of schemas, no further standardization • General concept influences multiple communities in the direction of clearer data syntax and semantics • ICT Tech Standard remains • Existing use of typed attribute/value pairs in PID resolution

  7. Expected Impact of the Deliverable Before After • Data sets difficult to impossible to parse, understand, and re-use unless you created them, know who did, or there exists detailed pubic documentation. • Search criteria for data sets restricted to keywords and sources. • Standardization across data sets fairly arbitrary, concentrated in small groups and narrow communities. • Data sets can be typed at a fine level of granularity, those types can be registered in a public registry, and those type records can contain sufficient information to make detailed and accurate use of the data sets so typed. • Search criteria for data sets can include type information, yielding easier comparisons and mash-ups. • Greater chance of standards developing across data set construction.

  8. Feedback Desired from RDA Community • Participation and feedback has been excellent, thank you. • More use cases are always desirable • All else being equal the value of an RDA output will be proportional to the number of use cases against which it was formulated • Agreements on federation and testbeds needed going forward • Data Fabric • Test bed listing coming out of TAB

More Related