Towards a Digital Library Theory: A Formal Digital Library Ontology. Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox, Virginia Tech, Blacksburg, VA 24061 USA, fox@vt.edu (For ACM SIGIR Mathematical/Formal Methods in Information Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004).
Outline • Background: The 5S Model • Motivation for this Work • Digital Library Formal Ontology • Taxonomy of DL Services • Applications of the Theory • Conclusions and Future Work
Background: The 5S Model • Why 5S? • DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc. • DL construction: difficult, ad-hoc, lacking support for tailoring/customization • Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. • Lack of specific DL models, formalisms, languages
Background: The 5S Model • Informally, DLs can be defined as complex information systems that: • help satisfy info needs of users (societies) • provide info services (scenarios) • organize info in usable ways (structures) • (re)present info in usable ways (spaces) • communicate info with users (streams)
Background: 5S and DL formal definitions and compositions (April 2004 TOIS)
Background: The 5S Model
• Summary of TOIS 2004 formal definitions: a digital library is a 10-tuple $(Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc)$ in which
• $Streams$ is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames);
• $Structs$ is a set of structures, which are tuples $(G, \mathcal{F})$, where $G = (V, E)$ is a directed graph and $\mathcal{F}: (V \cup E) \to L$ is a labeling function;
• $Sps$ is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
Background: The 5S Model
• $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(\{p_{1k}\}), e_{2k}(\{p_{2k}\}), \ldots, e_{d_k k}(\{p_{d_k k}\}) \rangle$ is a sequence of events that can also have a number of parameters $\{p_{ik}\}$. Events represent changes in computational states; parameters represent specific locations in a state and their respective values.
• $St2$ is a set of functions $\Psi: V \times Streams \to (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers $(a, b)$ corresponding to a portion (span/segment) of a stream.
• $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.
Background: The 5S Model
• $Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of metadata catalogs for $Coll$ where each metadata catalog $DM_{C_k} = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hkn_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.
• A repository $Rep = \{(C_i, DM_{C_i})\}$ ($i = 1$ to $f$) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate the family of pairs (e.g., get, store, delete).
• $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios.
• $Soc = (C, R)$ where $C$ is a set of communities and $R$ is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services.
• Being basically an electronic entity, a member $sm_k$ of $SM$ distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$.
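To make the collection, catalog, and repository parts of the 10-tuple more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class names `DigitalObject` and `Repository`, the field names, and the dictionary layout are hypothetical and do not come from the 5S papers; only the get, store, and delete operations are named on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical stand-in for do_k = (h_k, Stm1_k, Stt2_k, Psi_k)."""
    handle: str              # h_k: unique identifier of the object
    streams: tuple = ()      # Stm1_k: the object's streams
    structures: tuple = ()   # Stt2_k: the object's structures
    spans: tuple = ()        # Psi_k: (node, stream index, (a, b)) triples


@dataclass
class Repository:
    """Pairs each collection with its metadata catalog (assumed layout)."""
    collections: dict = field(default_factory=dict)  # name -> {handle: object}
    catalogs: dict = field(default_factory=dict)     # name -> {handle: [specs]}

    def store(self, name, obj, metadata=None):
        # Add the object to the collection and register a catalog entry.
        self.collections.setdefault(name, {})[obj.handle] = obj
        specs = self.catalogs.setdefault(name, {}).setdefault(obj.handle, [])
        if metadata is not None:
            specs.append(metadata)

    def get(self, name, handle):
        # Return the object, or None if the handle is unknown.
        return self.collections.get(name, {}).get(handle)

    def delete(self, name, handle):
        # Remove both the object and its metadata specifications.
        self.collections.get(name, {}).pop(handle, None)
        self.catalogs.get(name, {}).pop(handle, None)


repo = Repository()
thesis = DigitalObject(handle="hdl:example/1", streams=("full text bytes",))
repo.store("theses", thesis, {"title": "An example thesis"})
```

Keying both dictionaries by handle mirrors the role of $h_k$ as the link between a digital object and its metadata specifications.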
Motivation • Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts. • Complete a formal DL theory by: • Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04] • Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships • Categorizing and classifying DL services on the basis of the ontology • Research questions • How should DL services be built from the other DL components? • Which are the fundamental and elementary DL services? • How can services be built/composed from other DL services? • We will explore semantic relations and rules of the DL domain by using ontologies.
Digital Library Formal Ontology • An ontology is a tuple (Ontol_Concepts, Ontol_Rels) where: • Ontol_Concepts is a family of ontological concepts, • Ontol_Rels is a family of relations. • Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intensionally specify or constrain which elements of a concept can participate in a relation. • Ontol_Rules is the family of rules of a particular ontology.
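As a rough illustration of concepts, relations, and rules working together, the sketch below stores each relation as a predicate that decides whether a pair of elements may participate in it. The class `Ontology`, the method `holds`, the function `runs_rule`, and the instance names are all hypothetical; the rule shown only checks that the two arguments belong to the right concepts, which is far weaker than the first-order axioms of the ontology itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Ontology:
    # Concept name -> known instances of that concept.
    concepts: Dict[str, Set[str]]
    # Relation name -> rule deciding whether the pair (x, y) may participate.
    relations: Dict[str, Callable[["Ontology", str, str], bool]] = field(
        default_factory=dict
    )

    def holds(self, relation: str, x: str, y: str) -> bool:
        """Evaluate the rule that realizes the named relation."""
        return self.relations[relation](self, x, y)


def runs_rule(ont: Ontology, x: str, y: str) -> bool:
    # Assumed constraint: only a service manager can run a service.
    return x in ont.concepts["ServiceManager"] and y in ont.concepts["Service"]


dl = Ontology(
    concepts={"ServiceManager": {"search_manager"}, "Service": {"searching"}},
    relations={"runs": runs_rule},
)
```

Here `dl.holds("runs", "search_manager", "searching")` evaluates the rule for one candidate pair; swapping the arguments fails the concept check.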
Digital Library Formal Ontology • Relationships • Intra-Model • Video contains Audio (MM) • Metadata Catalog describes Collection (LIS) • Probabilistic Space is_a Measure Space • Service extends Service (reuse) • Service Manager inherits_from Service Manager (OO) • Inter-Model • Event executes Operation • Actor participates_in Scenario • Service Manager runs Service • Service employs/produces Streams, Structures, Spaces
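Digital Library Formal Ontology
• Concepts: $\{Se, Sc, e\}$; Key: $Se$ = service; $Sc$ = scenario; $e$ = event.
• Relations:
• contains $\subseteq Sc \times e$
• Symbolic Rule. $\forall x, y\ (x \text{ contains } y \Leftrightarrow Sc(x) \land e(y) \land \exists j: (j \in x.Dom \land y = x(j)))$
• precedes $\subseteq e \times e \times Sc$; happens_before $\subseteq e \times e \times Sc$
• Symbolic Rule 1. $\forall x, y, z\ (x \text{ precedes}_z\ y \Leftrightarrow e(x) \land e(y) \land Sc(z) \land \exists i, j: (z \text{ contains } x \land z \text{ contains } y \land x = z(i) \land y = z(j) \land i + 1 = j))$
• Symbolic Rule 2. $\forall x, y, z\ (x \text{ happens\_before}_z\ y \Leftrightarrow e(x) \land e(y) \land Sc(z) \land \exists i, j: (z \text{ contains } x \land z \text{ contains } y \land x = z(i) \land y = z(j) \land i < j))$
• includes $\subseteq (Se \times Se) \cup (Sc \times Sc)$; extends $\subseteq (Se \times Se) \cup (Sc \times Sc)$
• Symbolic Rule 1. $\forall x, y\ (x \text{ includes } y \Leftrightarrow Sc(x) \land Sc(y) \land (\forall z: e(z) \land y \text{ contains } z \Rightarrow x \text{ contains } z) \land (\forall p, q: e(p) \land e(q) \land p \text{ precedes}_y\ q \Rightarrow p \text{ precedes}_x\ q))$
• Symbolic Rule 2. $\forall x, y\ (x \text{ extends } y \Leftrightarrow Sc(x) \land Sc(y) \land (\forall z: e(z) \land y \text{ contains } z \Rightarrow x \text{ contains } z) \land (\forall p, q: e(p) \land e(q) \land p \text{ happens\_before}_y\ q \Rightarrow p \text{ happens\_before}_x\ q))$
• Symbolic Rule 3. $\forall x, y\ (x \text{ extends } y \Leftrightarrow Se(x) \land Se(y) \land (y \subset x \lor (x \neq y \land \exists p, q: Sc(p) \land Sc(q) \land p \in x \land q \in y \land p \text{ extends } q)))$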
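The scenario-level relations on this slide (contains, precedes, happens_before, and includes/extends between scenarios) can be checked mechanically. The following sketch assumes a scenario is simply a tuple of event names; the function names follow the relation names, while the example scenarios are invented for illustration. The service-level rule for extends is not covered.

```python
def contains(sc, ev):
    """sc contains ev: ev occurs at some position of the scenario."""
    return ev in sc


def precedes(sc, x, y):
    """x precedes y in sc: y occurs immediately after x (i + 1 = j)."""
    return any(sc[i] == x and sc[i + 1] == y for i in range(len(sc) - 1))


def happens_before(sc, x, y):
    """x happens_before y in sc: x occurs at an earlier position (i < j)."""
    return any(
        sc[i] == x and sc[j] == y
        for i in range(len(sc))
        for j in range(i + 1, len(sc))
    )


def includes(x, y):
    """Scenario x includes y: all events of y, with immediate order kept."""
    return all(contains(x, z) for z in y) and all(
        precedes(x, p, q) for p in y for q in y if precedes(y, p, q)
    )


def extends(x, y):
    """Scenario x extends y: all events of y, with relative order kept."""
    return all(contains(x, z) for z in y) and all(
        happens_before(x, p, q) for p in y for q in y if happens_before(y, p, q)
    )


# Hypothetical scenarios used only to exercise the rules.
base = ("login", "search", "show")
prefixed = ("start", "login", "search", "show")
detour = ("login", "search", "filter", "show")
```

With these definitions, `prefixed` includes `base` because every adjacent pair of `base` stays adjacent, whereas `detour` only extends `base`: the extra `filter` event breaks the adjacency of `search` and `show` but preserves their order.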
Digital Library Formal Ontology • Consistency Rules • Catalog-Collection • A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function). • In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function). • Scenarios-Society • A scenario x is consistent with regard to a set of service managers Y if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
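These consistency rules translate fairly directly into checks. The sketch below assumes a catalog is a list of (handle, metadata-set identifier) pairs, a collection is a set of handles, and each service manager is a set of operation names; these representations and all function names are assumptions made for illustration.

```python
def is_complete(catalog, collection):
    """Every digital object in the collection has at least one metadata set."""
    described = {handle for handle, _ in catalog}
    return collection <= described


def is_consistent(catalog, collection):
    """Each metadata set describes exactly one object, and it is in the collection."""
    targets = {}
    for handle, spec_set in catalog:
        targets.setdefault(spec_set, set()).add(handle)
    return all(
        len(handles) == 1 and handles <= collection
        for handles in targets.values()
    )


def scenario_is_consistent(executed_operations, managers):
    """Every operation executed by the scenario's events is defined by some manager."""
    defined = set().union(*managers.values())
    return all(op in defined for op in executed_operations)


# Hypothetical data for illustration only.
collection = {"h1", "h2"}
full_catalog = [("h1", "mss_a"), ("h2", "mss_b")]
partial_catalog = [("h1", "mss_a")]
shared_catalog = [("h1", "mss_a"), ("h2", "mss_a")]
managers = {"search_manager": {"parse_query", "rank"}}
```

`partial_catalog` is consistent but not complete, since `h2` has no metadata; `shared_catalog` is complete but not consistent, since one metadata set describes two objects.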
Digital Library Formal Ontology • Characterizing employs/produces relationships • In the table each service is characterized by • parameters (input, output) • of the initial and final events • of the scenarios that compose those services • All other previous definitions and keys apply here. • That set is complemented with the following definitions:
Services Related Definitions • A query q is the representation of user interest or information need. • Hyptxt is a hypertext, wherein an anchor is a node. • A log_entry is a descriptive metadata specification about an event of a scenario. • Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \to 2^{Ct}$ is a function that maps a digital object to a set of categories. • A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
Applications: A Taxonomy of DL Services • Infrastructure Services: dealing with basic concepts such as collections and catalogs • Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications). • Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes. • Add_Value: either aggregate value/information to collections (digital objects) or connect objects together. • Information Satisfaction: dealing with higher-level societal requirements • KEY in next slide: • Fundamental: part of the minimal set of services essential to the existence of a DL • Composite DL service: takes input from some other service; otherwise the service is called elementary.
DL Services I/O Behavior • Regarding the prior figure, which shows: • Instantiations of the “Services Definition” model • Inputs and outputs of examples of infrastructure and information satisfaction DL services • Key: • CDL = Collection • ICDL = index for collection CDL • {doi} = digital object • Soc = Society
Application: Defining Quality in Digital Libraries • Formal theory can help to define “what’s a good digital library” by: • Formally defining metrics of quality for each formal concept (and relationships) • Helping to define and apply numerical measures to these metrics • Consider this in the Information Life Cycle
Defining Quality in Digital Libraries • Metadata specifications and metadata format - completeness • Completeness of metadata specifications refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not. • Metric • $Completeness(ms_x) = 1 - \dfrac{\text{number of missing attributes in } ms_x}{\text{total number of attributes of the schema to which } ms_x \text{ conforms}}$
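The completeness metric is simple enough to compute directly. The sketch below assumes a metadata specification is a dictionary from attribute name to value and that the schema is a list of attribute names; the function name, the treatment of empty strings and empty lists as missing, and the four-attribute example schema are assumptions for illustration.

```python
def completeness(ms, schema_attributes):
    """1 - (missing attributes / total attributes of the schema)."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    # An attribute counts as missing if it is absent or has an empty value.
    missing = [a for a in schema_attributes if ms.get(a) in (None, "", [])]
    return 1 - len(missing) / len(schema_attributes)


# Hypothetical schema and record for illustration.
schema = ["title", "creator", "date", "subject"]
record = {"title": "An example thesis", "creator": "A. Author", "date": ""}
score = completeness(record, schema)  # date and subject are missing
```

For this record two of the four attributes are missing, so the score is $1 - 2/4 = 0.5$.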
Defining Quality in Digital Libraries • Metadata specifications and metadata format - completeness • OCLC NDLTD Union Catalog
Defining Quality in Digital Libraries
• Services - Extensibility and Reusability
• A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
• A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
• Metrics
• $\text{Macro-Reusability}(Serv) = \dfrac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$, where $reused(se_i) = 1$ if $\exists se_j: se_j \text{ reuses } se_i$, and $0$ otherwise.
• $\text{Micro-Reusability}(Serv) = \dfrac{\sum_{sm_x \in SM,\ se_i \in Serv,\ sm_x \text{ runs } se_i} LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$, where $LOC$ corresponds to the number of lines of code of a service manager.
Defining Quality in Digital Libraries • Services - Extensibility and Reusability • Macro-Reusability = 3/16 ≈ 0.188 • Micro-Reusability = 3630/11910 ≈ 0.305
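The two reusability metrics can be sketched as follows. The data layout is an assumption: `reuses` is a set of (reusing service, reused service) pairs, `loc` maps each service manager to its lines of code, and `runs` maps each manager to the services it runs. The example values are invented; only the 3-out-of-16 ratio mirrors the macro figure reported on the slide, and the service and manager names are hypothetical.

```python
def macro_reusability(services, reuses):
    """Fraction of services that are reused by at least one other service."""
    reused = {used for _, used in reuses}
    return sum(1 for s in services if s in reused) / len(services)


def micro_reusability(loc, runs, reuses):
    """LOC of managers running reused services over the total LOC of all managers."""
    reused = {used for _, used in reuses}
    numerator = sum(
        loc[m] * sum(1 for s in runs.get(m, ()) if s in reused) for m in loc
    )
    return numerator / sum(loc.values())


# Hypothetical example: 16 services, of which s1, s2 and s3 are reused.
services = [f"s{i}" for i in range(1, 17)]
reuses = {("s4", "s1"), ("s5", "s2"), ("s6", "s3"), ("s7", "s1")}
macro = macro_reusability(services, reuses)

# Hypothetical managers: m1 runs a reused service, m2 does not.
loc = {"m1": 100, "m2": 300}
runs = {"m1": {"s1"}, "m2": {"s4"}}
micro = micro_reusability(loc, runs, reuses)
```

In this example `macro` is 3/16 = 0.1875 and `micro` is 100/400 = 0.25. Note that, read literally, the micro formula counts a manager's lines of code once per reused service it runs.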
Application: Re-engineering a DL Specification Language • 5SL: Specification Language • Reengineering • Using the relationships to redefine/reorganize the semantics and organization of the XML elements within the several sections of the DL specification
Conclusions and Future Work • Presented a DL formal ontology which specifies the semantics of the relationships among the DL concepts, thereby completing a theory for DLs • Applied the resulting ontology to: • Define a taxonomy of DL services • Create a Quality Model for DLs • Re-engineer a DL specification language
Conclusions and Future Work • Future work includes: • Including pre- and post-conditions in the service behavior analysis • New applications of the model/theory • New design and generation tools • Quality tools • Modeling complex heterogeneous/integrated systems • Archaeology (ETANA) • Developing theorems and proofs • Writing books…