
Discussion of Data Fabric Terms & Preparation for RDA P7

Join the DFT-IG and DF-IG for a discussion on data fabric terms, vocabulary services, and plans for the upcoming RDA P7 meeting. Share candidate terms and definitions for further development.

nbell


Presentation Transcript


  1. Discussion of Data Fabric Terms & Preparation for RDA P7. Virtual meeting, Monday, January 25, 2016. Organized by Gary Berg-Cross (DFT-IG) and Peter Wittenburg (DF-IG).

  2. Agenda, Context and Recap
     • Brief update by Gary Berg-Cross on DFT-IG activities:
       • New terms
       • Use cases for vocabulary services
       • Context for the DF term discussion
       • P7 plans
     • Overview of vocabulary issues from the Data Fabric standpoint by Peter Wittenburg, covering terms and issues.
     • Discussion of how to handle vocabulary issues going forward. This meeting is an opportunity to discuss some of the troublesome DF terms (and maybe other ones) in context, to see if we can develop working draft definitions that can be firmed up over time.
     • Vocabulary issues and plans from other RDA groups.
     • Interested people can respond here with candidate terms or issues, and perhaps working definitions, and can also bring them up at the meeting as noted in the agenda.

  3. DFT-IG Status and Plans
     • Some terms, for example about repository registries, have been entered into the RDA DFT term tool based on recent DF discussions and posts as well as the RDA-WDS Data-Pub Workflows (http://smw-rda.esc.rzg.mpg.de/index.php/Special:AllPages):
       • Collection Registry
       • Repository Registry
       • Data repository entry
       • Data review
       • Data journal
     • In addition, we are working with the Vocabulary Services IG to use some of their tool-based services to improve our vocabularies:
       • Providing URLs for each term for referencing
       • Creating taxonomies from the definitions
       • Handling synonyms, etc.
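
The three services named on slide 3 (a URL per term, taxonomies, synonym handling) can be illustrated with a small data structure. This is only a sketch under stated assumptions, not the Vocabulary Services IG tooling or the RDA term tool: `BASE_URL`, the `Term` and `Vocabulary` classes, the parent term "Registry", and the synonym shown are all hypothetical and chosen for illustration. Definitions are left empty on purpose.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set
from urllib.parse import quote

# Hypothetical base URL; a real service would mint its own identifiers.
BASE_URL = "https://example.org/terms/"


@dataclass
class Term:
    label: str
    definition: str = ""
    synonyms: Set[str] = field(default_factory=set)
    broader: Optional[str] = None  # label of the parent term, if any

    @property
    def url(self) -> str:
        # One stable, citable URL per term.
        return BASE_URL + quote(self.label.replace(" ", "_"))


class Vocabulary:
    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}
        self._lookup: Dict[str, str] = {}  # case-folded name -> preferred label

    def add(self, term: Term) -> None:
        self._terms[term.label] = term
        for name in {term.label, *term.synonyms}:
            self._lookup[name.casefold()] = term.label

    def resolve(self, name: str) -> Optional[Term]:
        # Map a label or synonym (case-insensitive) to its preferred term.
        label = self._lookup.get(name.casefold())
        return self._terms[label] if label is not None else None

    def narrower(self, label: str) -> List[str]:
        # Direct children of a term in the taxonomy.
        return sorted(t.label for t in self._terms.values() if t.broader == label)


# Illustrative entries only; the parent term and synonym are assumptions.
vocab = Vocabulary()
vocab.add(Term("Registry"))
vocab.add(Term("Repository Registry",
               synonyms={"Registry of repositories"},
               broader="Registry"))
vocab.add(Term("Collection Registry", broader="Registry"))
```

Resolving either "repository registry" or "Registry of repositories" returns the same `Term`, whose `url` can then be cited in a definition or a post.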

  4. Broadening the Discussion (Stepwise or Scope-wise)
     Data management (and use) is broad, so we are building out from our starting point:
     • Digital Data Management, including unregistered data (the broader concept)
     • Digital Object Management (registered digital data)
     • Where are datasets?

  5. Integrate Concepts: Policy-based Digital Data Management Concept Graph (Reagan Moore)
     Based on practical principles, policy defines when in a workflow a PID is created, as well as other curation activities. These definitions are linked.
     [Figure: concept graph. Labels appearing in the graph: Purpose, DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM, Collection, Replication Policy, Sharing, Publication, Preservation, Checksum Policy, Digital Object, Attribute, Quota Policy, Data Type Policy, Integrity, Persistent State Information, Authenticity, Property, Policy, Procedure, Access control, GetUserACL, Periodic Assessment Criteria, Workflow, Policy Enforcement Point, SetDataType, Completeness, SetQuota, Correctness, Function, DataObjRepl, Consensus, SysChksumDataObj, Operation, Consistency, Client Action. Relation labels: Isa, Has, HasFeature, Defines, Updates, Controls, Chains, Invokes, SubType. The links between individual nodes are not recoverable from the transcript.]

  6. Based on DF discussions we developed suggested concepts with candidate terminology. Examples:
     • Data practice is the actual application or use of ideas and methods (as opposed to theories) about how data are collected, created, stored (maintained), curated, used, shared and released (disseminated).
     • Data principles are rules that provide guidance across data management and use for such things as data acquisition, data lifecycle control, data policy and ownership, metadata practices, data quality, etc.
     • Common data solutions are agreed-upon, easily available, tested and approved approaches to widely occurring problems in data management and use.
     • Data discovery is a process of query and/or search to find (research) data of interest.
     • Database cracking features incremental partial indexing and/or sorting of the data. It combines features of automatic index selection and partial indexes.
       • It reorganizes data within the query operators, integrating the reorganization effort (occasionally invoking creation or removal of indexes on tables and views based on use) into query execution.
       • It shifts the cost of index maintenance from updates to query processing.
     • Adaptive indexing is characterized by the partial creation and refinement of preliminary or fixed DB indexes as side effects to support efficient query execution (after http://www.vldb.org/pvldb/vol4/p586-idreos.pdf).

  7. Now we have a new, long list of terms to discuss
     • For example, "searchable":
       • What makes data, a publication, etc. searchable?
       • Rich metadata, use of a standard vocabulary, use of a registry, etc.
     • Some terms on our list have relevant RDA groups:
       • Metadata (e.g., rich metadata)
       • Data publishing workflow (e.g., workflow)
       • Domain repository
       • Repository Platforms for Research Data IG
       • Active Data Management Plans IG
       • BioSharing Registry: connecting data policies, standards & databases in life sciences WG
       • Practical Policy (follow-on)?
       • etc.
     • For some (general) terms we can leverage standards organizations and bodies (NIST, ISO, etc.):
       • System, architecture, actor, service, schema, protocols, layer, physical layer, re-usable
     • Some terms may have particular advocates (Research Object, self-documentation, etc.)
