Towards a Digital Library Theory: A Formal Digital Library Ontology

Towards a Digital Library Theory: A Formal Digital Library Ontology. Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox. Virginia Tech, Blacksburg, VA 24061 USA, fox@vt.edu (For ACM SIGIR Mathematical/Formal Methods in Information Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004).

Presentation Transcript


  1. Towards a Digital Library Theory: A Formal Digital Library Ontology • Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox • Virginia Tech, Blacksburg, VA 24061 USA, fox@vt.edu • (For ACM SIGIR Mathematical/Formal Methods in Information Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004)

  2. Outline • Background: The 5S Model • Motivation for this Work • Digital Library Formal Ontology • Taxonomy of DL Services • Applications of the Theory • Conclusions and Future Work

  3. Background: The 5S Model • Why 5S? • DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc. • DL construction: difficult, ad-hoc, lacking support for tailoring/customization • Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. • Lack of specific DL models, formalisms, languages

  4. Background: The 5S Model • Informally, DLs can be defined as complex information systems that: • help satisfy info needs of users (societies) • provide info services (scenarios) • organize info in usable ways (structures) • (re)present info in usable ways (spaces) • communicate info with users (streams)

  5. Background: The 5S Model

  6. Background: 5S and DL formal definitions and compositions (April 2004 TOIS)

  7. Background: The 5S Model • Summary of TOIS 2004 Formal Definitions: • A digital library is a 10-tuple (Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc) in which • Streams is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames); • Structs is a set of structures, which are tuples $(G, \rho)$, where $G = (V, E)$ is a directed graph and $\rho : (V \cup E) \to L$ is a labeling function; • Sps is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.

  8. Background: The 5S Model • $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(\{p_{1k}\}), e_{2k}(\{p_{2k}\}), \ldots, e_{d_k k}(\{p_{d_k k}\}) \rangle$ is a sequence of events that also can have a number of parameters $\{p_{ik}\}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values. • St2 is a set of functions $\Psi : V \times Streams \to (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers $(a, b)$ corresponding to a portion (span/segment) of a stream. • $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.

  9. Background: The 5S Model • $Cat = \{DMC_1, DMC_2, \ldots, DMC_f\}$ is a set of metadata catalogs for Coll where each metadata catalog $DMC_k = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hkn_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes. • A repository $Rep = \{(C_i, DMC_i)\}$ ($i = 1$ to $f$) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate the family of pairs (e.g., get, store, delete). • $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios. • $Soc = (C, R)$ where C is a set of communities and R is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services. • Being basically an electronic entity, a member $sm_k$ of SM distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subseteq sm_k$.
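
The definitions on slides 7 to 9 can be mirrored loosely in code. The sketch below is only an illustration and is not part of the 5S formalism: the class names `DigitalObject` and `Repository`, the use of a dict as the metadata catalog, and the restriction to a single (collection, catalog) pair are all assumptions made for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical stand-in for do_k = (h_k, Stm1_k, Stt2_k, Psi_k)."""
    handle: str                                   # h_k, the unique identifier
    streams: frozenset                            # subset of Streams
    structures: frozenset                         # subset of Structs
    structured_streams: frozenset = frozenset()   # subset of St2, left opaque here


class Repository:
    """One (collection, metadata catalog) pair with get/store/delete.

    The collection is kept as handle -> DigitalObject, and the catalog as
    handle -> list of metadata specifications (plain dicts, an assumption).
    """

    def __init__(self):
        self._collection = {}
        self._catalog = {}

    def store(self, obj, specs=()):
        self._collection[obj.handle] = obj
        self._catalog.setdefault(obj.handle, []).extend(specs)

    def get(self, handle):
        return self._collection[handle], list(self._catalog.get(handle, []))

    def delete(self, handle):
        self._collection.pop(handle, None)
        self._catalog.pop(handle, None)

    def handles(self):
        return set(self._collection)
```

A digital object is stored together with its metadata specifications and retrieved by handle, which keeps the collection and its catalog in step.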

  10. Background

  11. Motivation • Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts. • Complete a formal DL theory by: • Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04] • Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships • Categorizing and classifying DL services on the basis of the ontology • Research questions • How should DL services be built from the other DL components? • Which are the fundamental and elementary DL services? • How can services be built/composed from other DL services? • We will explore semantic relations and rules of the DL domain by using ontologies.

  12. Digital Library Formal Ontology • An ontology is a tuple (Ontol_Concepts, Ontol_Rels) where: • Ontol_Concepts is a family of ontological concepts, • Ontol_Rels is a family of relations. • Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intensionally specify or constrain which elements of a concept can participate in a relation. • Ontol_Rules is the family of rules of a particular ontology.

  13. Digital Library Formal Ontology • Relationships • Intra-Model • Video contains Audio (MM) • Metadata Catalog describes Collection (LIS) • Probabilistic Space is_a Measure Space • Service extends Service (reuse) • Service Manager inherits_from Service Manager (OO) • Inter-Model • Event executes Operation • Actor participates_in Scenario • Service Manager runs Service • Service employs/produces Streams, Structures, or Spaces

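  14. Digital Library Formal Ontology • Concepts: {Se, Sc, e}; Key: Se = service; Sc = scenario; e = event. • Relations: • contains $\subseteq Sc \times e$ • Symbolic Rule. $\forall x, y\ (x \text{ contains } y \Leftrightarrow Sc(x) \wedge e(y) \wedge \exists j : (j \in x.Dom \wedge y = x(j)))$ • precedes $\subseteq e \times e \times Sc$; happens_before $\subseteq e \times e \times Sc$ • Symbolic Rule 1. $\forall x, y, z\ (x \text{ precedes}_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j : (z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$ • Symbolic Rule 2. $\forall x, y, z\ (x \text{ happens\_before}_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j : (z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$ • includes $\subseteq (Se \times Se) \cup (Sc \times Sc)$; extends $\subseteq (Se \times Se) \cup (Sc \times Sc)$ • Symbolic Rule 1. $\forall x, y\ (x \text{ includes } y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z : e(z) \wedge y \text{ contains } z \Rightarrow x \text{ contains } z) \wedge (\forall p, q : e(p) \wedge e(q) \wedge p \text{ precedes}_y\ q \Rightarrow p \text{ precedes}_x\ q))$ • Symbolic Rule 2. $\forall x, y\ (x \text{ extends } y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z : e(z) \wedge y \text{ contains } z \Rightarrow x \text{ contains } z) \wedge (\forall p, q : e(p) \wedge e(q) \wedge p \text{ happens\_before}_y\ q \Rightarrow p \text{ happens\_before}_x\ q))$ • Symbolic Rule 3. $\forall x, y\ (x \text{ extends } y \Leftrightarrow Se(x) \wedge Se(y) \wedge (y \subseteq x \vee (x \neq y \wedge \exists p, q : Sc(p) \wedge Sc(q) \wedge p \in x \wedge q \in y \wedge p \text{ extends } q)))$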
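
The scenario-level rules on slide 14 can be read as executable predicates. The following is a minimal sketch, assuming a scenario is modelled as a Python tuple of event names; the event names in the example are hypothetical, and the service-level rule (Symbolic Rule 3) is deliberately left out.

```python
def contains(scenario, event):
    """x contains y: some index j of the scenario holds the event."""
    return event in scenario


def precedes(x, y, scenario):
    """x precedes_z y: x and y occupy adjacent positions i and i + 1."""
    return any(scenario[i] == x and scenario[i + 1] == y
               for i in range(len(scenario) - 1))


def happens_before(x, y, scenario):
    """x happens_before_z y: x occurs at some position i < j where y occurs."""
    return any(scenario[i] == x and scenario[j] == y
               for i in range(len(scenario))
               for j in range(i + 1, len(scenario)))


def includes(x, y):
    """x includes y: x keeps every event of y and every adjacency of y."""
    events_kept = all(contains(x, ev) for ev in y)
    adjacency_kept = all(precedes(y[i], y[i + 1], x)
                         for i in range(len(y) - 1))
    return events_kept and adjacency_kept


def extends(x, y):
    """x extends y (scenarios): x keeps every event and every ordering of y."""
    events_kept = all(contains(x, ev) for ev in y)
    order_kept = all(happens_before(y[i], y[j], x)
                     for i in range(len(y))
                     for j in range(i + 1, len(y)))
    return events_kept and order_kept


# Hypothetical scenarios for illustration only.
browse = ("select_collection", "list_items", "show_item")
browse_with_login = ("login",) + browse
browse_with_logging = ("select_collection", "log_access", "list_items", "show_item")
```

With these tuples, `browse_with_login` includes `browse` because the original events stay adjacent, whereas `browse_with_logging` only extends it: the inserted `log_access` event breaks adjacency but preserves relative order.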

  15. Digital Library Formal Ontology

  16. Digital Library Formal Ontology • Consistency Rules • Catalog-Collection • A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function). • In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function). • Scenarios-Society • A scenario x is consistent with regard to a set of service managers Y if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
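
The three consistency rules above translate into short checks. This is a sketch under stated assumptions: a catalog is a list of (handle, metadata specifications) pairs, a collection is represented by the set of its handles, a scenario is a list of per-event operation lists, and a service manager is a set of operation names. None of these encodings comes from the slides.

```python
def is_complete(catalog, collection_handles):
    """Every digital object in the collection has at least one non-empty
    set of metadata specifications in the catalog."""
    described = {handle for handle, specs in catalog if specs}
    return set(collection_handles) <= described


def is_consistent(catalog, collection_handles):
    """Every catalog entry describes a digital object that actually
    belongs to the related collection."""
    handles = set(collection_handles)
    return all(handle in handles for handle, _ in catalog)


def scenario_is_consistent(scenario_operations, managers):
    """Every operation executed by every event is defined in some manager.

    scenario_operations: list of lists, one list of operation names per event.
    managers: dict mapping a manager name to the set of operations it defines.
    """
    defined = set().union(*managers.values())
    return all(op in defined
               for event_ops in scenario_operations
               for op in event_ops)
```

A catalog can be complete without being consistent (it describes every object plus some that are not in the collection) and consistent without being complete, so the two checks are kept separate.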

  17. Digital Library Formal Ontology • Characterizing employs/produces relationships • In the table each service is characterized by • parameters (input, output) • of the initial and final events • of the scenarios that compose those services • All other previous definitions and keys apply here. • That set is complemented with the following definitions:

  18. Services Related Definitions • A query q is the representation of user interest or information need. • Hyptxt is a hypertext, wherein an anchor is a node. • A log_entry is a descriptive metadata specification about an event of a scenario. • Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct} : \{do_i\} \to 2^{Ct}$ is a function that maps a digital object to a set of categories. • A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.

  19. Applications: A Taxonomy of DL Services • Infrastructure Services: dealing with basic concepts such as collections and catalogs • Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications). • Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes. • Add_Value: either aggregate value/information to collections (digital objects) or connect objects together. • Information Satisfaction: dealing with higher-level societal requirements • Key for the next slide: • Fundamental: belongs to the minimal set of services essential to the existence of a DL • Composite DL service: takes input from some other service; otherwise the service is called elementary.

  20. Applications: A Taxonomy of DL Services

  21. Application: A Taxonomy of DL Services

  22. DL Services I/O Behavior • Regarding the prior figure, which shows: • Instantiations of the “Services Definition” model • Inputs and outputs of examples of infrastructure and information satisfaction DL services • Key: • CDL = Collection • ICDL = index for collection CDL • {doi} = digital object • Soc = Society

  23. Applications: A Taxonomy of DL Services

  24. Application: Defining Quality in Digital Libraries • Formal theory can help to define “what’s a good digital library” by: • Formally defining metrics of quality for each formal concept (and its relationships) • Helping to define and apply numerical measures for these metrics • Consider this in the Information Life Cycle

  25. Defining Quality in Digital Libraries

  26. Defining Quality in Digital Libraries • Metadata specifications and metadata format - completeness • Completeness of metadata specifications refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not. • Metric • $\mathit{Completeness}(ms_x) = 1 - \dfrac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
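
The completeness metric is simple enough to compute directly. The sketch below assumes a metadata specification is a flat dict and that an attribute counts as missing when it is absent, `None`, or an empty string; the slide only distinguishes "value assigned" from "not assigned", so that reading of "missing" is an assumption, as are the attribute names in the example.

```python
def completeness(spec, schema_attributes):
    """Completeness(ms_x) = 1 - missing / total attributes of the schema."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    missing = sum(1 for attr in schema_attributes
                  if spec.get(attr) in (None, ""))
    return 1 - missing / len(schema_attributes)


# Hypothetical four-attribute schema and a record with two values present.
schema = ["title", "creator", "date", "subject"]
record = {"title": "A thesis", "creator": "A. Author", "date": ""}
score = completeness(record, schema)  # date is empty, subject is absent
```

Here two of the four attributes are missing, so `score` is 0.5; averaging the score over all records of a catalog would give a catalog-level figure.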

  27. Defining Quality in Digital Libraries • Metadata specifications and metadata format - completeness • OCLC NDLTD Union Catalog

  28. Defining Quality in Digital Libraries • Services - Extensibility and Reusability • A service Y reuses a service X if the behavior of Y incorporates the behavior of X. • A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events. • Metrics • $\text{Macro-Reusability}(Serv) = \dfrac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$, where $reused(se_i)$ is 1 if $\exists\, se_j$ such that $se_j$ reuses $se_i$, and 0 otherwise. • $\text{Micro-Reusability}(Serv) = \dfrac{\sum LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$, where the sum in the numerator ranges over $sm_x \in SM$, $se_i \in Serv$ such that $sm_x$ runs $se_i$, and LOC corresponds to the number of lines of code of a service manager.

  29. Defining Quality in Digital Libraries • Services - Extensibility and Reusability • Macro-Reusability = 3/16 = 0.1875 • Micro-Reusability = 3630 / 11910 ≈ 0.305
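
The two reusability metrics can be sketched as follows. The encoding is an assumption: `reuses` is a set of (reusing service, reused service) pairs, `runs` is a set of (service manager, service) pairs, and `loc` maps each service manager to its lines of code. The toy data is invented for illustration and does not reproduce the per-service figures behind the values on slide 29.

```python
def macro_reusability(services, reuses):
    """Fraction of services that are reused by at least one other service."""
    reused = {target for _, target in reuses}
    return sum(1 for s in services if s in reused) / len(services)


def micro_reusability(loc, runs, reuses):
    """LOC of managers running reused services, over total LOC of all managers."""
    reused = {target for _, target in reuses}
    numerator = sum(loc[manager] for manager, service in runs
                    if service in reused)
    return numerator / sum(loc.values())


# Hypothetical DL with four services, one manager per service.
services = ["indexing", "searching", "browsing", "recommending"]
reuses = {("searching", "indexing"), ("recommending", "indexing")}
runs = {("index_mgr", "indexing"), ("search_mgr", "searching"),
        ("browse_mgr", "browsing"), ("rec_mgr", "recommending")}
loc = {"index_mgr": 300, "search_mgr": 500, "browse_mgr": 100, "rec_mgr": 100}

macro = macro_reusability(services, reuses)   # 1 reused service out of 4
micro = micro_reusability(loc, runs, reuses)  # 300 reused LOC out of 1000
```

In this toy setup only `indexing` is reused, giving a macro value of 0.25 and a micro value of 0.3; the micro metric weights each reused service by the size of the manager that runs it, so the two values can diverge.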

  30. Application: Re-engineering a DL Specification Language • 5SL: Specification Language • Reengineering • Using the relationships to redefine/reorganize the semantics and organization of the XML elements within the several sections of the DL specification

  31. Re-engineering a DL Specification Language

  32. Re-engineering a DL Specification Language

  33. 5SLGen: Automatic DL Generation

  34. Conclusions and Future Work • Presented a DL formal ontology which specifies the semantics of the relationships among the DL concepts therefore completing a theory for DLs • Applied the resulting ontology to: • Define a taxonomy of DL services • Create a Quality Model for DLs • Re-engineer a DL specification language

  35. Conclusions and Future Work • Future work includes: • Including Pre- and Post-Conditions in the Service Behavior Analysis • New Applications of the Model/Theory • New Design and Generation Tools • Quality tools • Modeling Complex Heterogeneous/Integrated Systems • Archaeology (ETANA) • Developing theorems and proofs • Writing books…
