
Subject Mediation for Integrated Access to Heterogeneous Information Sources

Subject Mediation for Integrated Access to Heterogeneous Information Sources. ADBIS’2001. L. A. Kalinichenko, Institute of Informatics Problems, Russian Academy of Science.


Presentation Transcript


  1. Subject Mediation for Integrated Access to Heterogeneous Information Sources • ADBIS’2001 • L. A. Kalinichenko • Institute of Informatics Problems, Russian Academy of Science

  2. Laboratory for compositional information systems development • Various forms of compositions are studied, e.g.: • Interoperable compositions of pre-existing components for IS design • Compositions of heterogeneous information collections • Workflow compositions • Type compositions in database operations over object collections • Heterogeneous mediator compositions • Web site of the group: http://www.ipi.ac.ru/synthesis/

  3. Talk outline • Subject Domain Mediation • Mediators’ Projects: Brief Overview • Query Planning Methods • Infrastructure of the mediator aiming at semantic interoperability of collections • Summary

  4. Subject Domain Mediation • Outline : • Objectives of information integration • The mediator’s concept • Mediator’s classes • Consolidation of a mediator • Advantages of the subject domain mediation approach

  5. Web Search Engines • 1 billion Web pages. • Search engines remain the main mechanism for accessing pages, through keyword queries. There are dozens of general-purpose search engines and thousands of specialized ones (regional, thematic, corporate). • The following kinds of general-purpose Web search engines can be distinguished: • basic engines: AltaVista, HotBot, Infoseek, Lycos, WebCrawler, Yahoo, Rambler, Яndex, etc. • portals: Skworm, Proteus, Instantseek, etc. • metasearch engines: SavvySearch, Inference Find, ProFusion, etc. • metasearch utilities: Copernic, BeeLine, SearchPad, etc. • “Metasearch” engines send a request to several search engines and compose a combined response, on the assumption that such a response is more likely to contain relevant information. • Precision of search is very low (terms are used for indexing and search in an uncontrolled way). This is the unavoidable price of simple home page “registration” for the whole Web.

  6. What is the required level of information integration/dissemination • Just putting information on the Web (creating a homepage, a Web site) • Inserting a description of a resource into a suitable Digital Library (e.g., into NCSTRL, the Networked Computer Science Technical Report Library, a collection of institutional and archival CS research reports and papers) • Using subject gateways for easier access to networked information resources in a defined subject area. Subject gateways work as intermediaries • Applying a community-oriented digital library (a collection of documents built by a community of users that aims at observing or studying a phenomenon, e.g., in the context of a certain area) • Using heterogeneous multidatabase systems • Applying subject mediators to support representation of and access to various subject domains. Mediators should provide modelling facilities and methods for converting an unorganized, nonsystematic population of collections registered by different collection providers into a well-structured set of sources supported by integrated uniform specifications. Metainformation. Systematic registration of collections.

  7. What and for what is to be integrated • What kind of information is to be supported • structured, object, semi-structured, textual, multimedia • What kind of metainformation is needed • thesauri, classifiers, vocabularies, ontologies, schema definitions (data, objects, functions, workflows) • What to disseminate: • 1. A document (paper) as a whole using additional document description • 2. A document in XML • 3. Content of a document • What to retrieve • 1. To discover individual resources (Web pages, documents, papers) • 2. To retrieve information relevant to a specific query contained in a collection of resources • 3. To retrieve information as workflows, methods and/or data and use various compositions of those • 4. To provide for interoperability of the information sources in process of problem solving: • technical level of interoperability • semantic interoperability

  8. Digital repositories of knowledge Digital repositories of knowledge in certain areas can be implemented, such as Digital Earth, Digital Sky, Digital Bio, Digital Law, Digital Art, Digital Music. Microsoft TerraServer and the Multi-Terabyte Astronomy Archives are widely known examples. An example: the objective of DigiTerra (an Environmental Digital Library, Rutgers) is to provide continuous land monitoring, fire detection, water and air quality testing and urban planning, as well as to support research and instructional activities in related areas of science. The vast array of environmental data collected in DigiTerra should include images from a variety of satellites, ground data from continuously monitoring weather stations, and maps, reports and data sets from federal, state and local government agencies, and should serve a diverse user community.

  9. The Mediator Concept • The mediator architecture (Wiederhold, 1992) deals with the problem of integrating heterogeneous information. The sources are "heterogeneous" on many levels: • the data model and the types of data used; • the underlying data units (salaries could be stored on a per-hour or per-month basis); • the behavior of the objects involved; • the underlying concepts. A payroll database may not regard a retiree as an employee, while the benefits department does. Conversely, the payroll department may include consultants in the concept "employee" while the benefits department does not; • the schema to which the information conforms may not be rigidly fixed in advance. Examples of such "semi-structured" information include that found in XML documents, repositories used in the Human Genome Project, Lotus NOTES. • A mediator is to provide a uniform query interface to the multiple data sources, thereby freeing the user from having to locate the relevant sources, query each one in isolation, and manually combine the information from the different sources.

  10. Mediation approaches • Integration of information from pre-selected sources according to predefined information needs. A procedural approach (TSIMMIS, Squirrel, WHIPS) integrates information from the sources through ad-hoc procedures. When the information needs or the sources change, a new mediator has to be generated. This is known as the Global as View (GAV) approach. • Integration of information from arbitrary sources according to predefined information needs. A declarative approach (Carnot, SIMS, Information Manifold, Infomaster) is used: mediators contain mechanisms to rewrite queries according to source descriptions. A rewritten query should be contained in the original query. This is known as the Local as View (LAV) approach. • Combined LAV and GAV approaches (GLAV)

  11. Mediator Definition as Subject Metainformation Consolidation For the sake of the mediator's scalability, two separate phases of its functioning are distinguished: the consolidation phase and the operational phase. In the consolidation phase the efforts of the scientific community are focused on defining the mediator's subject by declaring its metainformation. It is assumed that top-level researchers are involved in this process. The metainformation defined in the consolidation phase is assumed to be conservative for a certain period of time, during which it can only be extended. Well-known, representative collections of information in the subject domain are used while the metainformation is being defined. The metainformation created in the consolidation phase constitutes the federated level of the mediator. During the operational phase arbitrary information collections can be registered at the mediator, expressed in terms of the federated level. Registration is autonomous and can be done by collection providers independently of each other. Users of the mediator know only the metainformation defining the mediator’s subject and formulate their queries in terms of that subject. For each query the mediator decides which registered collections are relevant to it.

  12. Subject Mediator. Cultural Heritage Collections. Federated Level Metainformation Heritage_Entity «type» «type» Text created_by* date* narrative* identifier* relation* … place_of_origin history_period content origin_history in_collection owned_by digital_form ... contains near within follows … Painting Sculpture Antiquities Person Thesauri: Creator Collector Owner Cultural Heritage Repository History Jurisdiction Museum Gallery Exhibition

  13. Subject Mediator. Cultural Heritage Collections. Collections Registration • Federated Level Metainformation • Local into Federated Level Mapping • Sources: CIMI Profile of Z39.50, Louvre Museum Web Site, Uffizi Museum Web Site • Local classes and attributes: museum_object creator_c department author artist canvas created_by* date_collected* description* object_id* relation* … content_general collection mrObject nationality works name description sections name nationality works name biography paint_list title painter date history description to_image • Local Views in Terms of Federated Classes: • creator_c(c/Creator_Creator_Info[name, nationality, date_of_birth, date_of_death, works/{set_of: Heritage_Entity_Museum_Object}]) ⊆ creator(c[name, nationality, date_of_birth, date_of_death, works]) • author(a/Creator_Author[name/fname, nationality, works/{set_of: Heritage_Entity_Work}]) ⊆ creator(name, nationality, works(w)) & ∃ c,s (repository(c/Collection[contains(s/Section)]) & repository.name = ‘Louvre’ & in(w, s.contains)) • artist(a/Creator_Artist[name, nationality, general_info/Text_Textual, works/{set_of: Painting_Canvas}]) ⊆ creator(a[name, nationality, general_info, works]) & repository(n/name, collection) & n = ‘Uffizi’ & ∃ col/Collection (¬ isempty(intersect(collection(col/Collection).contains, works)))

  14. Subject Mediator. Cultural Heritage Collections. Query Planning • User: Find digital images of Italian paintings of the Renaissance containing a drawing of the Madonna with a child • Query to the mediator: {i/Image | ∃ p/Painting, d/Digital_Entity, re/Rendition (creator(nationality, works(p.digital_form(d).rendition(re).resource(i/Image))) & nationality = ‘Italy’ & p.content.contains(‘Madonna with a child’) & p.history_period = ‘Renaissance’)} • Mediator: Query Planner, Thesaurus. Thesaurus extension may add ‘Virgin Mary’, ‘Mother of God’ • CIMI: {i/Image | ∃ o/Heritage_Entity_Museum_Object, d/Digital_Object, re/Rendition (creator_c(nationality, works(o)) & nationality = ‘Italy’ & o.history_period = ‘Renaissance’ & o.content.contains(‘Madonna with a child OR …’) & in(i, o.digital_object(d).rendition(re).resource))} • Louvre: {i/Image | ∃ w/Heritage_Entity_Work (author(nationality, works(w)) & nationality = ‘Italy’ & w.history_period = ‘Renaissance’ & w.description.contains(‘Madonna with a child OR …’) & in(i, w.to_image))} • Uffizi: {i/Image | ∃ r/Collection_Room, p/Painting_Canvas (artist(nationality, paint_list(p), room_list(r)) & in(p, r.paint_list) & nationality = ‘Italy’ & p.history_period = ‘Renaissance’ & p.description.contains(‘Madonna with a child OR …’) & in(i, p.to_image))}

  15. Advantages of subject domain mediation 1. Subject mediation makes it possible to achieve semantic integration of heterogeneous information collections. 2. Users need to know only the subject definitions, which contain concepts, structures and methods as defined by the community. 3. Information providers can disseminate their information for integration independently of each other and at any time. To disseminate it, they register their information at the subject mediator. Users need not know anything about the registration activity. 4. The contexts of autonomous information collections, the data models and languages they use, and their implementation platforms are completely independent of the mediator and its consolidated metainformation definitions. 5. By querying the subject definitions, users have integrated access to all information registered at the mediator up to the moment of the query. 6. Mediators form a recursive structure: each mediator can be registered at another mediator. Thus multiple subjects can be semantically integrated by defining mediators of a higher level. 7. Personalization, which provides convenient views for specific groups of users, can be built above the subject definitions. This process is independent of the existing collections and their registration.

  16. Disadvantages of subject mediation 1. Providing a subject definition requires that the scientific community has reached a proper level of maturity and organization (e.g., are the research and development groups in the area sufficiently open, collaborative and motivated?). Subject consolidation is a collective, organized effort of the community. 2. Registration is not an easy process and requires specific supporting tools.

  17. Mediator’s Recursion • Query mediator • Data from mediator • Register mediator (as collection) • Mediator • Query collection • Data from collection • Register collection

  18. Mediators’ Projects: Brief Overview • Outline : • TSIMMIS (Stanford) • Information Manifold (Univ. of Washington) • GARLIC (IBM) • InfoSleuth (MCC) • XML as a middleware model

  19. TSIMMIS (The Stanford-IBM Manager of Multiple Information Sources) In TSIMMIS mediators are built above a GIVEN set of sources with wrappers that export OEM self-describing objects. OEM (Object Exchange Model) is used as a unifying data model. The mediators provide integrated OEM views of the underlying information (e.g., a relational source is exported as a set of OEM objects). Mediators are specified in MSL (Mediator Specification Language), a logic-based object-oriented language targeted to OEM that can be seen as a view definition language. Variables in MSL may refer only to existing sets. In the absence of negation MSL can be viewed as a variant of Datalog. A query consists of rules that use <object-id label value> as patterns. To describe a mediator in MSL, one gives logical rules that define the OEM objects that the mediator makes available in a view. Wrappers are specified in WSL, an extension of MSL that allows the description of source contents and querying capabilities.
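The OEM idea on this slide can be illustrated with a minimal Python sketch. It is not TSIMMIS code and not MSL syntax: the `OEM` class, `export_row`, `select`, the object-id scheme and the sample rows are all hypothetical names chosen for illustration. It only shows how a relational row might be exported as a self-describing <object-id label value> object and then matched by label and value.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEM:
    """One self-describing object: <object-id label value> (hypothetical class)."""
    oid: str
    label: str
    value: Union[str, int, List["OEM"]]  # atomic value or a set of sub-objects


def export_row(oid, label, row):
    """Wrap one relational row (a dict) as a nested OEM object."""
    children = [OEM(f"{oid}.{name}", name, val) for name, val in row.items()]
    return OEM(oid, label, children)


def select(objects, label, **conditions):
    """Return objects with the given label whose sub-objects satisfy label=value."""
    result = []
    for obj in objects:
        if obj.label != label or not isinstance(obj.value, list):
            continue
        fields = {child.label: child.value for child in obj.value}
        if all(fields.get(name) == val for name, val in conditions.items()):
            result.append(obj)
    return result


# Sample data invented for the example.
rows = [{"name": "Joe", "dept": "CS"}, {"name": "Ann", "dept": "EE"}]
people = [export_row(f"&p{i}", "person", row) for i, row in enumerate(rows, 1)]
cs_people = select(people, "person", dept="CS")
```

Because every object carries its own label, the consumer needs no schema in advance, which is the property the slide calls "self-describing".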

  20. Information Manifold In the Information Manifold a reasoning phase is required to determine which sources have the data of interest, unlike TSIMMIS, where view expansion is all that is needed to find what data each source must contribute. The user interacts with a uniform interface in the form of a set of global relations (the mediated schema) used in formulating queries. The actual data is stored in external source relations. To answer queries, a mapping between the relations in the mediated schema and the source relations must be specified. One method of specifying these mappings is to describe each source relation as the result of a conjunctive query (i.e., a single Horn rule) over the relations in the mediated schema. Given a user query formulated in terms of the relations in the mediated schema, the system must translate it into a query that mentions only the source relations and is a maximally-contained plan, since the collection of available data sources may not contain all the information needed to answer the query. The Information Manifold provides uniform access to structured information sources on the WWW.

  21. Source Query Capabilities Representations in Mediation Frameworks Sources express their capabilities in mediation systems through a variety of mechanisms: query templates, capability records, and simple capability-description grammars. Data sources with different and limited query capabilities are accessed either by writing rich functional wrappers for the more primitive sources, or by dealing with all sources at a ''lowest common denominator''. In another approach, the mediator ensures that sources receive only queries they can handle, while still taking advantage of all the query power of each source. Wrappers reflect the actual query capabilities of the underlying data sources, while the mediator has a general mechanism for interpreting those capabilities and forming execution strategies for queries. Capabilities-Based Rewriters (CBR) are the basic mechanisms by which mediators develop a plan for a query, taking into account the capabilities of the sources.

  22. The GARLIC Approach (IBM Almaden) Heterogeneous and multimedia information systems are the main targets. Multimedia systems typically support only specific data types: for example, document retrieval through various kinds of text indexing and search, spatial search in GIS, image processing (QBIC, Photobook). One well-known solution is Illustra's DataBlades for different data types. Garlic differs in that there is no intention to store everything in one repository; it addresses distribution, heterogeneity and integration of heterogeneous sources. The concept of interface conformance (interface in the sense of ODMG-93) leads to an interface lattice based on subtyping. Garlic exploits a specific wrapper technology based on source capability specification. Source capabilities are coded by the programmer within the corresponding wrapper. They remain unknown to the optimizer.

  23. InfoSleuth: semantic integration of information in open and dynamic environments • Integration of different technological developments to support mediated interoperation of data and services over information networks: • Agent Technology. Specialized agents that represent the users, the information resources, and the system itself cooperate to address the users' requirements. The resulting decentralization of capabilities is the key to system scalability and extensibility. • Domain models (ontologies). Give a concise, uniform and declarative description of semantic information independent of the underlying models. • Information Brokerage. Specialized information agents match information needs (specified in terms of some ontology) with currently available sources, so that requests can be routed to the relevant sources. • Internet computing. Java and Java Applets enable deployment of agents at any source of information regardless of its location or platform.

  24. YAT: XML as a middleware model YAT combines an XML-oriented algebra that has optimization properties with a definition of source query capabilities, wrapping of more structured query languages (e.g., OQL), and a new optimization technique for XML-based integration systems. Other semistructured/XML systems include TSIMMIS (query templates are used to describe source capabilities) and MIX. However, defining all possible queries over a schema is not feasible with such templates. YAT operational model and algebra: XML data (like objects) can be arbitrarily nested, so a technique similar to the object-oriented one is adopted. To an arbitrary XML structure the Bind operator is applied, whose function is to extract the relevant information and produce a Tab structure (comparable to a non-1NF relation). Classical operators such as Join, Select and Project can then be applied to these Tab structures. Bind operator: takes an input tree and a filter (a tree with distinct variables) and produces a table containing the variable bindings that result from the pattern matching. It is expensive to evaluate, but it can be rewritten into simpler operations. Tree operator: applied to Tab structures, it returns a collection of trees conforming to some input pattern.
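The behaviour of Bind described on this slide can be sketched in a few lines of Python. This is an illustrative approximation, not the YAT implementation: trees are assumed to be `(label, content)` pairs, variables are assumed to be strings starting with `$`, and the function name `bind` and the sample data are hypothetical.

```python
from itertools import product


def bind(tree, flt):
    """Match one tree against a filter; return a list of variable-binding rows."""
    label, content = tree
    f_label, f_content = flt
    if label != f_label:
        return []
    if isinstance(f_content, str):
        if f_content.startswith("$"):      # variable: bind it to the content
            return [{f_content: content}]
        return [{}] if content == f_content else []  # constant must match
    if not isinstance(content, list):
        return []
    # For each child filter, collect bindings from every matching child subtree.
    per_child = []
    for child_flt in f_content:
        rows = [row for child in content for row in bind(child, child_flt)]
        if not rows:
            return []
        per_child.append(rows)
    # Combine the alternatives of the different child filters. Variables in a
    # filter are distinct, so merging rows never produces a conflict.
    result = []
    for combo in product(*per_child):
        merged = {}
        for row in combo:
            merged.update(row)
        result.append(merged)
    return result


# Sample tree and filter invented for the example.
works = ("works", [
    ("work", [("title", "Primavera"), ("artist", "Botticelli")]),
    ("work", [("title", "David"), ("artist", "Michelangelo")]),
])
flt = ("works", [("work", [("title", "$t"), ("artist", "$a")])])
tab = bind(works, flt)

# A classical Select applied to the resulting Tab structure.
by_botticelli = [row for row in tab if row["$a"] == "Botticelli"]
```

The result `tab` plays the role of the Tab structure: a flat table of bindings to which relational-style operators can be applied.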

  25. Query Planning Methods for Mediators of Heterogeneous Information Sources • Outline : • Query Planning for LAV approach • Query Containment Techniques • Wrapper generation

  26. Representation of Information Sources Formally, the contents of an information source are described by a pair (or set of pairs) of the form (v, rv), where v is a class name with mv state attributes and rv is a formula of the form: rv = (∃U) p1(U1) & … & pn(Un). The formula rv has mv distinguished variables. The pi's are any of the classes on the federated level. The class name v is a new name describing an information source. This means that the source can be asked a query of the form v(Z) (or any partial instantiation of it) and returns instances with mv state attributes that satisfy the following implication: (∀Z) (v(Z) => rv(Z)). Simplified source capability model (input bindings, output, selections): R1(Y1, ... , Yk):- R(X1, ... , Xm), 1 = a1, ... , n = an, = Y1, ... , k = Yk, 1, ... , h

  27. Sound and Relevant Query Plans A simplified query Q to the mediator can be represented as a conjunction: Q(Y): (∃X) p1(X1) & … & pn(Xn), where X, X1, …, Xn are tuples of variables or constants and the pi's are any of the classes on the federated level. The answer to the query is the set of bindings that can be obtained for the variables in Y. Given a query of the form above, the query processor generates a set of conjunctive plans for answering Q(Y) as formulae of the form: Q(Y): (∃U) v1(U1) & … & vk(Uk) & Cp, where each of the vi's is a class name associated with an information source, and Cp is a conjunction of atoms of order relations. Note that the distinguished variables in the plan are the same as the ones in the query. Given a conjunctive plan P, the descriptions of the information sources imply that the following constraints hold on the answers it produces (recall that rvi is the formula describing the constraints on the instances found in vi): ConP: rv1(U1) & … & rvk(Uk) & Cp

  28. Sound and Relevant Query Plans Definition: A conjunctive plan P is sound if all the answers it produces are guaranteed to be answers to the query, i.e., if the following entailment holds: (∀Y) (ConP => (∃X) p1(X1) & … & pn(Xn)). Several conjunctive plans are required to answer a query because the information sources are not complete. Definition: A conjunctive plan P is relevant to a query Q(Y): (∃X) p1(X1) & … & pn(Xn) if the sentence (∃Y, X) (ConP & p1(X1) & … & pn(Xn)) is satisfiable.

  29. Plan Generation First step: separately for each subgoal in the query, compute which information sources are relevant to it and collect these sources into the corresponding bucket. An information source is relevant to a subgoal g if the description of the source contains a subgoal g1 that can be unified with g such that, after the unification, the constraints in the query and the constraints in the source description are mutually satisfiable. ‘Satisfiable’ means that the conjunction of built-in atoms is satisfiable and there are no two subgoals C(x) and D(x) where C and D are disjoint classes. ‘Mutually satisfiable’ means that if C(Q) and C(U) are the conjunctions of constraint subgoals in the query and in the source, then C(Q) & C(U) is satisfiable. Second step: conjunctive plans are constructed by choosing one relevant source for every subgoal in the query, and each plan is checked for soundness and relevance. Specifically, every conjunctive plan Q1 of the form Q1(Y): (∃U) v1(U1) & … & vn(Un) is considered, where vi(Ui) has been deemed relevant to subgoal pi in the query. Each such conjunctive plan is checked to be (1) relevant, (2) sound (if it is not sound, it is checked whether it can be made sound by adding conjuncts of order predicates), and (3) minimal (i.e., no subgoal can be removed from the plan while still leaving a sound plan).

  30. Plan Generation Usually these properties are checked using algorithms for containment of conjunctive queries. The algorithm is guaranteed to produce only sound and relevant plans. Does the algorithm produce all the necessary conjunctive plans? The answer is based on the close relationship between the problem of finding conjunctive plans and the problem of answering queries using materialized views. Although the cost of checking minimality and soundness of a conjunctive plan is exponential, it is exponential only in the size of the query, which tends to be small, and not in the number of information sources or the size of their contents.
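The two steps of plan generation can be sketched as follows. This is a deliberately reduced illustration under stated assumptions, not the algorithm of any particular system: atoms are `(predicate, args)` pairs, variables are assumed to start with an upper-case letter, the relevance test `may_unify` only compares predicates, arities and constants (no order constraints, no disjoint classes), and the soundness, relevance and minimality checks of the second step are left out. All names and the sample sources are hypothetical.

```python
from itertools import product


def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()


def may_unify(goal, other):
    """Simplified relevance test: same predicate and arity, no clashing constants."""
    (pred, args), (other_pred, other_args) = goal, other
    if pred != other_pred or len(args) != len(other_args):
        return False
    return all(is_var(a) or is_var(b) or a == b
               for a, b in zip(args, other_args))


def build_buckets(query_body, sources):
    """Step 1: one bucket of relevant source names per query subgoal."""
    return [[name for name, body in sources.items()
             if any(may_unify(goal, atom) for atom in body)]
            for goal in query_body]


def candidate_plans(buckets):
    """Step 2 (enumeration only): choose one source from every bucket."""
    return list(product(*buckets))


# Sample query and source descriptions invented for the example.
query_body = [("painting", ("P", "A")), ("creator", ("A", "italy"))]
sources = {
    "v1": [("painting", ("X", "Y"))],
    "v2": [("creator", ("Y", "italy"))],
    "v3": [("painting", ("X", "Y")), ("creator", ("Y", "france"))],
}
buckets = build_buckets(query_body, sources)
plans = candidate_plans(buckets)
```

Source `v3` enters the bucket of the `painting` subgoal but not that of `creator`, because its constant `france` clashes with `italy`. Each enumerated candidate would still have to pass the containment-based checks described above before being accepted as a plan.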

  31. Query Containment Algorithms • Basic techniques (e.g., QinP (Ullman): containment of conjunctive queries in logical recursions; negation in conjunctive queries by Chan) • Extensions: • Containment for queries with complex objects. Typing constraints and integrity constraints for object DB schemas • Relative containment • Conjunctive queries with regular expressions • Query containment under constraints • Bag containment of conjunctive queries • Alternative techniques • Counter machines to study query containment • Verification of knowledge bases • Description Logics

  32. Containment of Conjunctive Queries in Logical Recursions (QinP) • An algorithm testing whether a conjunctive query is contained in the relation defined by a logic program. • Given are a conjunctive query Q, represented as: • H :- G1 & … & Gk and a logic program P. • To decide whether Q ⊆ P: • 1) Assign to every variable in Q a unique constant. • 2) Form an EDB relation from the subgoals of Q. • 3) Evaluate P (bottom-up) on this EDB. • 4) If the head H of Q, with the same constants substituted, is among the derived facts, then Q ⊆ P
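The containment test can be sketched in Python as a canonical-database check: freeze the variables of Q, evaluate P bottom-up over the frozen subgoals, and test whether the frozen head of Q is derived. The sketch assumes Datalog without negation or built-in predicates, atoms written as `(predicate, args)`, and variables starting with an upper-case letter; the function names and the example program are hypothetical, and the naive evaluation is chosen for brevity rather than efficiency.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals fact; return None if impossible."""
    (pred, args), (fact_pred, fact_args) = atom, fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    subst = dict(subst)
    for a, f in zip(args, fact_args):
        if is_var(a):
            if subst.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return subst


def evaluate(program, facts):
    """Naive bottom-up evaluation: apply all rules until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in program:
            substs = [{}]
            for atom in body:
                extended = []
                for s in substs:
                    for fact in facts:
                        s2 = match(atom, fact, s)
                        if s2 is not None:
                            extended.append(s2)
                substs = extended
            for s in substs:
                new.add((head[0], tuple(s.get(a, a) for a in head[1])))
        if new <= facts:
            return facts
        facts |= new


def contained_in(query, program):
    """Return True if the conjunctive query is contained in the program."""
    head, body = query
    frozen = {}

    def freeze(atom):
        # Step 1: every variable is replaced by its own fresh constant.
        pred, args = atom
        return (pred, tuple(
            frozen.setdefault(a, f"_k{len(frozen)}") if is_var(a) else a
            for a in args))

    edb = {freeze(atom) for atom in body}   # step 2: canonical EDB
    derived = evaluate(program, edb)        # step 3: bottom-up evaluation
    return freeze(head) in derived          # step 4: is the frozen head derived?


# Transitive closure as the recursive program P (invented example).
P = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Z")), [("edge", ("X", "Y")), ("path", ("Y", "Z"))]),
]
two_steps = (("path", ("A", "C")), [("edge", ("A", "B")), ("edge", ("B", "C"))])
not_a_path = (("path", ("A", "C")), [("edge", ("A", "B")), ("edge", ("C", "B"))])
```

With this program, `contained_in(two_steps, P)` holds because a two-edge chain is always a path, while `not_a_path` is not contained because its edges do not form a chain from A to C.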

  33. A Query Converter for Wrappers Toolkit • In TSIMMIS the query converter is a part of the wrapper implementation toolkit. • MSL, a logic-based, OEM-oriented query language, is used. • Source capabilities are defined with templates in a Query Description and Translation Language (QDTL). Each template can be associated with an action that generates the commands for the underlying source. • The converter will process: • Directly supported queries. These are queries that syntactically match a template. • Logically supported queries. These are queries that produce the same results as a directly supported query. The notion of logical equivalence is used to detect queries that fall in this class. • Indirectly supported queries. These are queries that can be executed in two steps: first a directly supported query is executed, and then a filter is applied to the results of the first step.

  34. Detection of maximal supporting query and of a filter A query qs is a maximal supporting query of a query q with respect to a capability description D if qs is directly supported by D, qs indirectly supports q, and there is no directly supported query q’s that indirectly supports q, is subsumed by qs, and is not logically equivalent to qs. There may be more than one maximal supporting query for a given query. The capability description D is expressed as a (possibly recursive) Datalog program. The problem of determining whether a description D supports a query Q is the same as the problem of determining whether the program P(D) contains (subsumes) the query Q and whether a corresponding filter query exists. A supporting query is found in two steps: 1. find a subsuming query, and 2. find the corresponding filter. The approach is based on the extended Ullman query containment algorithm (X-QinP), which gives a yes/no answer to the containment question. The algorithm is extended to find the actual maximal supporting queries and also the native query constituents for the underlying source.
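The split of an indirectly supported query into a supporting query plus a filter can be illustrated with a strongly simplified Python sketch. It is not QDTL and not X-QinP: a capability description is assumed to be just a list of attribute sets on which the source accepts equality conditions, a query is assumed to be a dictionary of equality conditions, and the names `maximal_supporting`, `templates` and the sample values are hypothetical. Under these assumptions, a supporting query with more conditions is subsumed by one with fewer, so the maximal supporting queries are those built from inclusion-maximal usable templates.

```python
def maximal_supporting(query, templates):
    """Return (supporting_query, filter) pairs for the most selective usable templates."""
    attrs = set(query)
    # Templates whose attributes are all constrained by the query can be sent
    # to the source directly.
    usable = [t for t in templates if t <= attrs]
    # Keep only templates not strictly contained in another usable template.
    maximal = [t for t in usable if not any(t < u for u in usable)]
    pairs = []
    for t in maximal:
        supporting = {a: query[a] for a in t}                       # sent to the source
        residual = {a: v for a, v in query.items() if a not in t}   # applied as a filter
        pairs.append((supporting, residual))
    return pairs


# Hypothetical capability description: the source can select on these attribute sets.
templates = [
    frozenset({"author"}),
    frozenset({"author", "year"}),
    frozenset({"title"}),
]
query = {"author": "Ullman", "year": 1989, "topic": "datalog"}
pairs = maximal_supporting(query, templates)
```

In the example the source evaluates the `author` and `year` conditions, and the remaining `topic` condition becomes the filter applied to the returned objects. When two incomparable templates are usable, the function returns more than one pair, matching the remark that a query may have several maximal supporting queries.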

  35. Known modifications of query rewriting algorithms using views • Conjunctive queries • Source templates with binding patterns • Recursive queries • Views in description logic • Rewriting for semistructured data. Regular expressions rewriting, navigational plans • Boolean queries rewriting • Queries with union and aggregation • Type inferencing • Object fusion • Scalable technique

  36. Infrastructure of the mediator aiming at semantic interoperability of collections • Outline : • Heterogeneity of the mediator • Canonical information model • Mediator’s metadata • Information extraction framework • Collection registration at a mediator as a process of compositional development

  37. Heterogeneous information models absorbed by the canonical model • Canonical Model: Core, Extensions • is_refined_by • Component Models (IDL, CDL, BOF) • Semistructured Data Models (OEM, ADM, OQL-doc) • Object & Heterogeneous DB Models (ODL, SQL3, Garlic) • Document Object Model • Knowledge Base Representations (OKBC, Ontolingua) • Metadata for DL (Dublin Core, Warwick, Starts, Z39.50) • Unstructured Data (vocabularies, thesauri) • Metadata Expressible in Meta Models (MOF, RDF) • Workflow Models

  38. Canonical Model Entities (diagram) • Entities: Metaclass, Class, Type, Collection, World, Object, Abstract Value, Frame • Relationships: instance_of, instance, type, supertype, superclass • Frame becomes an object

  39. Canonical Information Model • The set of canonical model facilities used for the uniform representation of information resources includes the following: • Frame representation facilities. Frames are treated as a special kind of abstract value, introduced mostly for the description of concepts and of terminological and weakly-structured information. All specifications in the canonical model have the form of frames that become a part of the metabase. • Unifying type system. The type system includes a universal constructor of arbitrary abstract data types as well as a comprehensive collection of built-in types. • Class representation. Classes represent sets of homogeneous entities of an application domain. Class instances (objects) have specific types. • Multiactivity (workflow) representation. Multiactivities are used for the specification and implementation of interconnected and interdependent application activities, and for the specification of declarative assertions and concurrent megaprograms over the information resources. • Facilities for expressing logical formulae. A multisorted object calculus (a typed first-order language) is used for querying the integrated set of digital collections as well as for the specification of constraints and behaviour.

  40. Mediator’s Metadata Layering • Personalized DL Level: Specific Vocabulary, Views, Subschemas • Subject Classification Hierarchy & Context (metaclass hierarchy & ontological definitions) • Interoperable (Federated) Level: Federated Schema, Common Thesauri, Core, Extension, Ontology, Ontology, Ontology • Local Level: Schema, Schema, Vocabulary, Vocabulary, Thesauri • Real Collection Level: Structured Collection, Schema, Semistructured Collection, Unstructured Collection, Vocabulary/Thesauri

  41. Information Extraction Framework Personalization Facilities Canonical GUI Personalized DL Personalized DL Java / CORBA Graphical Query Facilities Query Engine Outcome Presentation • canonical mediator’s query language • best relevant collection identification • query decomposition • query planning and monitoring Information Extraction Facilities • ranking • merging • aggregation • summarization Mediator’s DBMS (object-relational DBMS) metadata repository data Localization Facilities XML wrapper Z39.50 wrapper information retrieval system wrapper SRS wrapper http Z39.50 IIOP http Local Collections XML data system Z39.50 server information retrieval system molecular biology data banks

  42. Metainformation Repository

  43. Collection Registration Framework • The framework facilities are intended to support functions of collection contextualizing: • constructing mapping of a collection data model and metadata into the canonical ones; • representation of the new metainformation in terms of the federated mediator's level; • inferring from the collection the required information for the federated level; • semi-automatic construction of a collection wrapper; • connecting the wrapper to the interoperation environment (e.g., CORBA).

  44. Contextualization of Ontology • mapping of local ontological context to that of the mediator • by names and relationships • by natural language description • applying structural integration to concept specifications • introducing new concepts over existing ones • contextualization through structural correlation • establishing weak ontological relevance of specification elements applying analysis of intercontext concept relationships • establishing tight ontological relevance of specification elements introducing a subsumption relationship between concepts

  45. Correlation of Ontological Concepts • evaluation of descriptor weights • establishing intercontext relationships between concepts
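
The slide names two steps, descriptor weighting and intercontext relationship establishment, without giving the formulas. A minimal Python sketch of one possible reading follows. Everything in it is an assumption made for illustration: the normalized term-frequency weights, the crude "longer than three letters" descriptor filter, the cosine-style correlation, the 0.5 threshold, and the function names `descriptor_weights`, `correlation` and `is_related`.

```python
import math
from collections import Counter


def descriptor_weights(description):
    """Assumed weighting: normalized term frequency of the descriptors.

    Descriptors are taken to be the words longer than three letters,
    a crude stand-in for a real stop-word list or thesaurus lookup.
    """
    words = [w.strip(".,;:()").lower() for w in description.split()]
    counts = Counter(w for w in words if len(w) > 3)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def correlation(weights_a, weights_b):
    """Cosine-style correlation of two unit-length descriptor weight vectors."""
    return sum(w * weights_b.get(d, 0.0) for d, w in weights_a.items())


def is_related(weights_a, weights_b, threshold=0.5):
    """Assumed rule: concepts are related if their correlation reaches the threshold."""
    return correlation(weights_a, weights_b) >= threshold


# Hypothetical natural-language descriptions of two concepts from different contexts.
local_concept = descriptor_weights("painter who created the canvas")
mediator_concept = descriptor_weights("artist who created a painting on canvas")
print(is_related(local_concept, mediator_concept))
```

A pair that passes the threshold would only be a candidate for the weak ontological relevance mentioned on the previous slide; the tighter subsumption relationship still has to be introduced explicitly.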

  46. Ontological Metainformation

  47. Process of an Information Source Registration • For each source class the following steps (of the compositional development process) are required [LNCS 2151]: • identification of relevant federated classes • Find the federated classes that can be used ontologically to define the source class extent in terms of federated classes. Several federated classes may correspond to one source class, their instance types covering different reducts of the source class instance type. Conversely, several source classes may correspond to one federated class. • construction of most common reducts • For the instance type of each identified federated class: • Construct the most common reduct of this federated instance type and the source class instance type, so as to concretize (partially) the federated instance type. The most common reduct may also include additional attributes corresponding to those federated type attributes that can be derived from the source type instances. • In this process, for each attribute type of the common reduct, a concretizing type, a concretizing function, or a combination of the two should be constructed (this step is applied recursively).
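
The reduct construction step can be illustrated with a heavily simplified Python sketch. It assumes that a type is just a mapping from attribute names to type names, and that plain type equality stands in for the concretizing types and functions the slide calls for. The function `most_common_reduct`, the `renames` parameter and the attribute type names are hypothetical; the attribute names themselves (`fname`, `name`, `nationality`, `works`, `biography`) are taken from the cultural heritage example later in the deck.

```python
def most_common_reduct(source_type, federated_type, renames=None):
    """Return {federated_attribute: source_attribute} for the shared attributes.

    An attribute is shared when, after applying the optional renaming,
    both types declare it with the same type name (an assumed, simplified
    substitute for checking that one type concretizes the other).
    """
    renames = renames or {}
    reduct = {}
    for src_attr, src_attr_type in source_type.items():
        fed_attr = renames.get(src_attr, src_attr)
        if federated_type.get(fed_attr) == src_attr_type:
            reduct[fed_attr] = src_attr
    return reduct


# Hypothetical attribute types; only the attribute names come from the slides.
louvre_author = {
    "fname": "string",
    "nationality": "string",
    "works": "set",
    "biography": "text",
}
federated_creator = {
    "name": "string",
    "nationality": "string",
    "date_of_birth": "date",
    "date_of_death": "date",
    "works": "set",
}

reduct = most_common_reduct(louvre_author, federated_creator, renames={"fname": "name"})
print(reduct)
```

In this toy setting the source supports only part of the federated type: `date_of_birth` and `date_of_death` are left out of the reduct because nothing in the source type can supply them.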

  48. Process of an Information Source Registration • For each source class the following steps are required: • partial source view construction • For each relevant federated class, construct a partial source view expressing, in terms of the federated class, the constraints that the values of the respective most common reducts of source class instances should satisfy. This yields partial views over all relevant federated classes. • partial views composition • Construct the composition of the source type's most common reducts obtained for the instance types of all federated classes involved. • Construct the source view as a composition of the partial views obtained above. This is an expression of a materialized view of the information source in terms of federated classes. The instance type of this view is determined by the composition of most common reducts constructed above.
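
As a rough illustration of the composition step, the Python sketch below treats a partial view as a pair of a reduct (a mapping from federated attribute to source attribute) and a constraint over the projected values. It composes reducts by union and partial views by conjunction of their constraints. This is a toy reading under stated assumptions, not the compositional calculus of the original work; the functions `compose_reducts` and `compose_partial_views`, the `site` and `repository_name` attributes, and the sample instance values are all hypothetical.

```python
def compose_reducts(reducts):
    """Union of reducts; on a name clash the later reduct wins (an assumption)."""
    composed = {}
    for reduct in reducts:
        composed.update(reduct)
    return composed


def compose_partial_views(partial_views):
    """partial_views: list of (reduct, constraint) pairs, one per federated class.

    Returns the composed reduct and a source view that projects a source
    instance onto it, or yields None if any partial constraint fails.
    """
    composed = compose_reducts(reduct for reduct, _ in partial_views)
    constraints = [constraint for _, constraint in partial_views]

    def source_view(instance):
        projected = {fed: instance.get(src) for fed, src in composed.items()}
        return projected if all(c(projected) for c in constraints) else None

    return composed, source_view


# Partial view over a federated creator class: the artist must have works.
creator_view = (
    {"name": "name", "nationality": "nationality", "works": "paint_list"},
    lambda values: bool(values["works"]),
)
# Partial view over a federated repository class: the repository must be the Uffizi.
repository_view = (
    {"repository_name": "site"},
    lambda values: values["repository_name"] == "Uffizi",
)

composed_reduct, uffizi_artist_view = compose_partial_views([creator_view, repository_view])

# Hypothetical source instance, for illustration only.
artist = {"name": "Artist A", "nationality": "Italian", "paint_list": ["Canvas 1"], "site": "Uffizi"}
print(uffizi_artist_view(artist))
```

The instance type of the resulting view is exactly the composed reduct, which mirrors the last sentence of the slide.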

  49. Subject Mediator. Cultural Heritage Collections. Collections Registration • Federated Level Metainformation • Local into Federated Level Mapping • Sources: CIMI Profile of Z39.50; Louvre Museum Web Site; Uffizi Museum Web Site • Diagram labels: museum_object; creator_c; department; author; artist; canvas • Attribute labels: created_by*, date_collected*, description*, object_id*, relation*, …, content_general; collection; mrObject; nationality, works; name, description, sections; name, nationality, works; name, biography, paint_list; title, painter, date, history, description, to_image • Local Views in Terms of Federated Classes • creator_c(c/Creator_Creator_Info [name, nationality, date_of_birth, date_of_death, works/{set_of:Heritage_Entity_Museum_Object}])  creator(c[name, nationality, date_of_birth, date_of_death, works]) • author (a/Creator_Author[name/fname, nationality, works/{set_of:Heritage_Entity_Work}]) creator(name, nationality, works (w)) &  c,s ( repository (c/Collection [contains(s/Section)]) & repository.name = ‘Louvre’ & in (w, s.contains) ) • artist(a/Creator_Artist[name, nationality, general_info/Text_Textual, works/{set_of: Painting_Canvas}]) creator(a[name, nationality, general_info, works]) & repository (n/name, collection) & n = ‘Uffizi’ &  col/Collection (  isempty (intersect (collection(col/ Collection).contains, works)))

  50. Specifications of Types of the Uffizi Site Schema
