Topic Maps Exchange in the Absence of Shared Vocabularies
Lutz Maicher, University of Leipzig (maicher@informatik.uni-leipzig.de)
TMRA'05 International Workshop on Topic Maps Research and Applications, 06.10.2005
Topic Maps Exchange = Retrieval Task
Subject Proxies are created in a remote environment. A requesting peer requests further information about a Subject of interest. The requested peers have to decide whether a Subject Proxy indicating an identical Subject is available; if so, they send a fragment to the requesting peer. The requesting peer has to merge the received fragments into its own Topic Map.
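The exchange described above can be sketched as follows. This is a minimal Python illustration under strong simplifications: the classes `Topic` and `Peer`, the methods `answer` and `merge_in`, and the decision function `shares_identifier` are hypothetical, a "fragment" is reduced to the matching Topics only, and the PSI URLs are made up. What it shows is that the equality decision is made by the requested peer and is passed in as a function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Topic:
    """Hypothetical, heavily simplified Subject Proxy."""
    key: str
    identifiers: set = field(default_factory=set)
    names: set = field(default_factory=set)


# The equality decision is pluggable: each requested peer applies its own.
SameSubject = Callable[[Topic, Topic], bool]


@dataclass
class Peer:
    topics: dict = field(default_factory=dict)

    def answer(self, request: Topic, same_subject: SameSubject) -> list:
        """Requested peer: return the Topics it judges to indicate the requested Subject."""
        return [t for t in self.topics.values() if same_subject(request, t)]

    def merge_in(self, local_key: str, fragment: list) -> None:
        """Requesting peer: merge the received Topics into its own Topic."""
        local = self.topics[local_key]
        for t in fragment:
            local.identifiers |= t.identifiers
            local.names |= t.names


def shares_identifier(a: Topic, b: Topic) -> bool:
    return bool(a.identifiers & b.identifiers)


requester = Peer({"p": Topic("p", {"http://psi.example.org/puccini"}, {"Puccini"})})
remote = Peer({"q": Topic("q", {"http://psi.example.org/puccini"}, {"Giacomo Puccini"})})

fragment = remote.answer(requester.topics["p"], shares_identifier)
requester.merge_in("p", fragment)
print(sorted(requester.topics["p"].names))  # ['Giacomo Puccini', 'Puccini']
```

Replacing `shares_identifier` with a different decision function changes what the requested peer returns without touching the exchange itself.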
Enterprise Information Integration (Source: Taylor, John: Thoughts from the Integration Consortium: Enterprise Information Integration: A New Definition, DM Review Online, (9, 2004).)
Existing Approaches to Topic Maps Exchange • TMRAP – Topic Maps Remote Access Protocol • TMIP – the RESTful Topic Maps Interaction Protocol (formerly: Federated Topic Maps) • SHARK (alternatively: Knowledge Port Approach) • TMShare • all of them are based on the TMDM • if the distributed peers do not use a common vocabulary (PSIs), the exchange fails completely
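Why the exchange breaks down can be seen from the identity-based equality rule these approaches inherit from the TMDM: two topic items are merged when they share a subject identifier, an item identifier (a Source Locator in the terminology of these slides) or a subject locator, or when a subject identifier of one equals an item identifier of the other. The sketch below is a simplified Python rendering of that rule (reification is ignored); the class `TopicItem`, the function `tmdm_equal` and the PSI URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicItem:
    """The identity properties of a topic item (simplified)."""
    item_identifiers: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)


def tmdm_equal(a: TopicItem, b: TopicItem) -> bool:
    """Identity-based equality in the style of the TMDM merging rules."""
    return bool(
        a.subject_identifiers & b.subject_identifiers
        or a.item_identifiers & b.item_identifiers
        or a.subject_locators & b.subject_locators
        or a.subject_identifiers & b.item_identifiers
        or a.item_identifiers & b.subject_identifiers
    )


# Two peers describe the same composer, but with different (made-up) PSIs.
peer1 = TopicItem(subject_identifiers={"http://psi.example.org/puccini"})
peer2 = TopicItem(subject_identifiers={"http://other.example.net/composers/puccini"})
shared = TopicItem(subject_identifiers={"http://psi.example.org/puccini"})

print(tmdm_equal(peer1, shared))  # True: common PSI, the topics merge
print(tmdm_equal(peer1, peer2))   # False: no shared vocabulary, no merge
```

No property of the two topic items is compared other than their identifiers, so two proxies for the same Subject stay apart as soon as the peers chose different PSIs.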
Semantics in Topic Maps • Topic Maps are a semantic technology... but only from the perspective of information integration • "Subject Proxies indicating identical Subjects have to be viewed as merged ones" • A Subject Map Disclosure (SMD) discloses: • the SMD ontology, which implies the Subject Indication Approach • the Subject Equality Decision Approach, which defines the semantics of the given Subject Proxies with respect to keeping the Co-Location objective true • the Subject Viewing Approach
Subject Identity is indicated, governed by the Subject Indication Approach of $SMD_1$.
Subject Identity from the integration perspective?
Subject Equality = both Subject Proxies indicate identical Subjects, governed by the Subject Equality Decision Approach of $SMD_i$.
How is Subject Equality detected?
$$\text{SubjectEquality}_{SMD_i}\big(\text{SubjectIndication}_{SMD_1}(\text{SubjectIdentity}_{\text{SubjectStage}_1}),\ \text{SubjectIndication}_{SMD_2}(\text{SubjectIdentity}_{\text{SubjectStage}_2})\big)$$
$$\text{SubjectIdentity}_{\text{integration perspective}}(\text{SubjectStage}_1,\ \text{SubjectStage}_2)$$
Subject Equality = both Subject Proxies indicate identical Subjects, governed by the Subject Equality Decision Approach of $SMD_i$.
How is Subject Equality really detected?
$$\text{SubjectEquality}_{SMD_i}\big(\text{SubjectIndication}_{SMD_1},\ \text{SubjectIndication}_{SMD_2},\ \text{SubjectMap}_{\text{SubjectProxy}_1},\ \text{SubjectMap}_{\text{SubjectProxy}_2}\big) \in \{\text{true}, \text{false}\}$$
$\text{SubjectIndication}_{SMD_1}$ = ?   $\text{SubjectIndication}_{SMD_2}$ = ?
Possible Subject Equality Approaches of an SMD
Meaning (semantics) in linguistics: in referential semantics, the meaning of a word is defined by the object it refers to; in structuralist semantics, the meaning of a word is defined by its usage in the language.
Referential Subject Equality Approach [A reference to a discrete 'object' indicates the intended Subject.] - Subject Proxy 1 indicates its Subject by pointing to it with S1 - Subject Proxy 2 indicates its Subject by pointing to it with S2 - Subject Equality holds if S1 = S2
Structuralist Subject Equality Approach [The Subject depends on other Subject Proxies of the Subject Map.] - Subject Proxy 1 indicates its Subject through a set of Subject Proxies s1 - Subject Proxy 2 indicates its Subject through a set of Subject Proxies s2 - Subject Equality holds if s1 = s2 (or s1 is similar to s2)
The different Subject Equality Approaches define the semantics of the used vocabulary at the time of the Subject Equality Decision.
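The two approaches can be contrasted in a few lines of Python. This sketch assumes that a reference (S1, S2) is a plain string such as a PSI, and that the sets of related Subject Proxies (s1, s2) can be compared by label; the Jaccard coefficient and the `threshold` parameter are only one hypothetical reading of "similar", not the measure used in this work.

```python
def referential_equal(s1: str, s2: str) -> bool:
    """Referential approach: each proxy points to its Subject with a reference."""
    return s1 == s2


def jaccard(a: frozenset, b: frozenset) -> float:
    """|a & b| / |a | b|; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0


def structuralist_equal(s1: frozenset, s2: frozenset, threshold: float = 1.0) -> bool:
    """Structuralist approach: each proxy is characterised by related proxies.
    threshold=1.0 demands s1 = s2; a lower (hypothetical) threshold accepts similar sets."""
    return jaccard(s1, s2) >= threshold


print(referential_equal("http://psi.example.org/puccini",
                        "http://psi.example.org/puccini"))  # True
s1 = frozenset({"opera", "Tosca", "Lucca"})
s2 = frozenset({"opera", "Tosca", "Italy"})
print(structuralist_equal(s1, s2), structuralist_equal(s1, s2, threshold=0.5))  # False True
```

The referential test is a single comparison, while the structuralist test only makes sense relative to the rest of the Subject Map, which is why it can still work when no reference is shared.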
Absence of Shared Vocabularies
[Figure: a Topic Map Processing Application on top of a Subject Map Disclosure (SMD) with the layers Subject Map Disclosure ontology, Subject Map ontology and Subject Map vocabulary; Subject Equality Decisions shown: Referential, Referential, Structuralist]
Towards an SMD_SIM
[Figure: a Topic Map Processing Application on top of a Subject Map Disclosure (SMD) with the layers Subject Map Disclosure ontology, Subject Map ontology and Subject Map vocabulary; Subject Equality Decisions shown: Referential, Structuralist, Structuralist]
Subject Similarity Measure (SIM)
• SIM – similarity of the Subjects of two different Topics
• Procedure: a Subject available in Topic Map TM2 is requested from Topic Map TM1
  • extract a Topic Map Fragment (F) from TM2 around the Topic representing the Subject
  • for each pair (T1, T2) with T1 from TM1 and T2 from F:
    • determine the simDNAtype of each Topic
    • calculate the simDNA of the pair
  • calculate the simDNA a second time, using the similarities detected in the first iteration
  • simDNA'(T1,T2) = sum of digits (simDNA(T1,T2))
• Subject Equality(T1,T2) holds if simDNA'(T1,T2) is maximal and simDNA'(T1,T2) > threshold
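The control flow of this procedure can be sketched in Python. Only the skeleton is shown: how a simDNA string is computed is left to a caller-supplied function `simdna(t1, t2, similar_pairs)` (hypothetical), 'X' is assumed to count as 0 in the digit sum, ties are resolved by taking the first maximum, and the maximum is taken per Topic of the fragment F.

```python
from typing import Callable, Hashable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]
# simdna(t1, t2, similar_pairs) -> string of digits and 'X', e.g. "21X3" (hypothetical)
SimDNA = Callable[[Hashable, Hashable, Set[Pair]], str]


def digit_sum(dna: str) -> int:
    """simDNA' = sum of digits; 'X' (feature missing) counts as 0 here."""
    return sum(int(c) for c in dna if c.isdigit())


def decide(tm1: List[Hashable], fragment: List[Hashable], simdna: SimDNA,
           similar: Set[Pair], threshold: int) -> Set[Pair]:
    """For each Topic t2 of F, take the t1 of TM1 with maximal simDNA' and
    accept (t1, t2) only if that maximum exceeds the threshold."""
    result: Set[Pair] = set()
    if not tm1:
        return result
    for t2 in fragment:
        score, best = max(((digit_sum(simdna(t1, t2, similar)), t1) for t1 in tm1),
                          key=lambda pair: pair[0])
        if score > threshold:
            result.add((best, t2))
    return result


def sim(tm1: List[Hashable], fragment: List[Hashable], simdna: SimDNA,
        threshold: int) -> Set[Pair]:
    """Two iterations: the second may use the pairs found similar in the first."""
    first = decide(tm1, fragment, simdna, set(), threshold)
    return decide(tm1, fragment, simdna, first, threshold)


# Toy simDNA values (hypothetical): (b, q) scores higher only once (a, p) is
# known to be similar, e.g. because b is typed by a and q is typed by p.
def toy_simdna(t1, t2, similar):
    if (t1, t2) == ("b", "q"):
        return "2113" if ("a", "p") in similar else "2111"
    return {("a", "p"): "21X3", ("b", "p"): "01X1", ("a", "q"): "0XXX"}[(t1, t2)]


print(sorted(sim(["a", "b"], ["p", "q"], toy_simdna, threshold=5)))
# [('a', 'p'), ('b', 'q')]
```

In the toy run, the first iteration only accepts (a, p); the pair (b, q) passes the threshold in the second iteration because its simDNA depends on the similarity found for (a, p).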
simDNAtype
The simDNAtype is derived from the TMDM topic item:
• (0..*) Source Locators [Locator Item]
• (0..1) Subject Locator [Locator Item]
• (0..1) Subject Identifier [Locator Item]
• (0..*) Topic Names [Topic Name Item]
  - (0..*) Source Locators [Locator Item]
  - (0..1) Type [Topic Item]
  - (0..*) Scope [Topic Item]
  - (1) Value [String]
  - (0..*) Variants [Variant Item]
    - (0..*) Source Locators [Locator Item]
    - (0..*) Scope [Topic Item]
    - (0..1) Value [String]
    - (0..1) Resource [Locator Item]
• (0..*) Occurrences [Occurrence Item]
  - (0..*) Source Locators [Locator Item]
  - (0..1) Type [Topic Item]
  - (0..*) Scope [Topic Item]
  - (0..1) Value [String]
  - (0..1) Resource [Locator Item]
• (0..*) rolesPlayed [Association Role Item]
  - (0..1) Type [Topic Item]
  - (1) Parent [Association Item]
simDNAtype = /x*y*z*w*s*1*2*3*t*n*(o)*[a]*/
x – the current Topic is typing a Topic
y – the current Topic is typing an Association
z – the current Topic is typing a Topic Characteristic
w – the current Topic is typing an Association Role
s – the current Topic is scoping a Topic Characteristic
1 – the current Topic has a Source Locator
2 – the current Topic has a Subject Locator
3 – the current Topic has a Subject Identifier
t – the current Topic is typed
n – the current Topic has a Topic Name
o – the current Topic has an Occurrence; o => /(v|l)t?s*/ (OccDNAtype)
a – the current Topic takes part in an Association; a => /a(tp)*/ (AssDNAtype)
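A simDNAtype string can be assembled directly from counts of these features. The sketch below is one reading of the notation, assuming that each letter is repeated once per instance of the feature (as in `xx1n` or `y1nnn`), that each occurrence is written as `(v|l)t?s*`, and that each association contributes `[a`, one `tp` per role, and `]`. `TopicFeatures`, `Occurrence` and `sim_dna_type` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:
    is_locator: bool          # 'l' for a resource locator, 'v' for a string value
    typed: bool = False       # 't'
    scope_size: int = 0       # one 's' per scoping Topic


@dataclass
class TopicFeatures:
    types_topics: int = 0           # x
    types_associations: int = 0     # y
    types_characteristics: int = 0  # z
    types_roles: int = 0            # w
    scopes: int = 0                 # s
    source_locators: int = 0        # 1
    subject_locators: int = 0       # 2
    subject_identifiers: int = 0    # 3
    types: int = 0                  # t
    names: int = 0                  # n
    occurrences: List[Occurrence] = field(default_factory=list)
    roles_per_association: List[int] = field(default_factory=list)


def sim_dna_type(f: TopicFeatures) -> str:
    """Build the string /x*y*z*w*s*1*2*3*t*n*(o)*[a]*/ for one Topic."""
    head = ("x" * f.types_topics + "y" * f.types_associations
            + "z" * f.types_characteristics + "w" * f.types_roles
            + "s" * f.scopes + "1" * f.source_locators
            + "2" * f.subject_locators + "3" * f.subject_identifiers
            + "t" * f.types + "n" * f.names)
    occs = "".join("(" + ("l" if o.is_locator else "v") + ("t" if o.typed else "")
                   + "s" * o.scope_size + ")" for o in f.occurrences)
    assocs = "".join("[a" + "tp" * roles + "]" for roles in f.roles_per_association)
    return head + occs + assocs


# Topic M1 from the example listing: two names, three occurrences, one association.
m1 = TopicFeatures(
    source_locators=1, subject_identifiers=1, types=1, names=2,
    occurrences=[Occurrence(False, scope_size=1),
                 Occurrence(True, typed=True),
                 Occurrence(False, typed=True, scope_size=1)],
    roles_per_association=[2],
)
print(sim_dna_type(m1))  # 13tnn(vs)(lt)(vts)[atptp]
```

Under these assumptions the same function also reproduces other strings from the example, such as `12tn(lt)[atptp]` and `y1nnn`.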
simDNA – 1st Iteration
Example: simDNAtype(T1) = x13tn
x – the current Topic is typing a Topic; 1 – the current Topic has a Source Locator; 2 – the current Topic has a Subject Locator; 3 – the current Topic has a Subject Identifier; t – the current Topic is typed; n – the current Topic has a Topic Name
simDNA(T1,T2) = 01XX1: T2 types an Association (0), T2 has a Source Locator (1), T2 has no Subject Identifier (X), T2 is not typed (X), T2 has a Topic Name which is not similar (1)
simDNA(T1,T3) = 21113: T3 types a Topic (2), T3 has a Source Locator (1), T3 has a Subject Identifier (1), T3 is typed (1), T3 has a Topic Name which is a "bit" similar (3)
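The first iteration can be sketched as a per-position comparison. The digit rules below are inferred from the examples only: for typing features, 2 if T2 types the same kind of thing, 0 if it types something of another kind, X if it types nothing; for locators and typing, 1 if T2 has the feature and X otherwise; for Topic Names, X without names, otherwise 1 or 3 depending on name similarity. `difflib.SequenceMatcher` and the 0.5 cut-off are a hypothetical stand-in for whatever name similarity was actually used; scoping (s), occurrences and associations are left out, and all names and type keys are made up.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Set


@dataclass
class Topic:
    """Hypothetical, simplified view of a Topic for the first iteration."""
    typing_kinds: Set[str] = field(default_factory=set)   # subset of "xyzw"
    locator_kinds: Set[str] = field(default_factory=set)  # subset of "123"
    type_key: Optional[str] = None                        # typing Topic, if any
    names: List[str] = field(default_factory=list)


def name_digit(names1: List[str], names2: List[str], bit_similar: float = 0.5) -> str:
    """'X' if T2 has no name, '3' if some pair of names is 'a bit' similar, else '1'."""
    if not names2:
        return "X"
    best = max((SequenceMatcher(None, a.lower(), b.lower()).ratio()
                for a in names1 for b in names2), default=0.0)
    return "3" if best >= bit_similar else "1"


def sim_dna(t1: Topic, t2: Topic) -> str:
    """First iteration: one character per feature of T1's simDNAtype, in its order."""
    out = []
    for kind in "xyzw":  # what T1 is typing
        if kind in t1.typing_kinds:
            if kind in t2.typing_kinds:
                out.append("2")   # T2 types the same kind of thing
            elif t2.typing_kinds:
                out.append("0")   # T2 types something of another kind
            else:
                out.append("X")   # T2 types nothing
    for kind in "123":  # Source Locator, Subject Locator, Subject Identifier
        if kind in t1.locator_kinds:
            out.append("1" if kind in t2.locator_kinds else "X")
    if t1.type_key is not None:  # t: T1 is typed
        out.append("1" if t2.type_key is not None else "X")
    if t1.names:  # n: T1 has a Topic Name
        out.append(name_digit(t1.names, t2.names))
    return "".join(out)


# The example from the slide, with hypothetical names and type keys.
t1 = Topic({"x"}, {"1", "3"}, "t_class", ["source"])
t2 = Topic({"y"}, {"1"}, None, ["zzz"])
t3 = Topic({"x"}, {"1", "3"}, "t_kind", ["sources"])
print(sim_dna(t1, t2), sim_dna(t1, t3))  # 01XX1 21113
```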
simDNA – 2nd Iteration
Example: simDNAtype(T1) = x13tn
x – the current Topic is typing a Topic; 1 – the current Topic has a Source Locator; 2 – the current Topic has a Subject Locator; 3 – the current Topic has a Subject Identifier; t – the current Topic is typed; n – the current Topic has a Topic Name
simDNA(T1,T2) = 01XX1: T2 types an Association (0), T2 has a Source Locator (1), T2 has no Subject Identifier (X), T2 is not typed (X), T2 has a Topic Name which is not similar (1)
simDNA(T1,T3) = 21133: T3 types a Topic (2), T3 has a Source Locator (1), T3 has a Subject Identifier (1), T3 is typed and the typing Topic is similar (3), T3 has a Topic Name which is a "bit" similar (3)
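The second iteration only revisits positions that depend on other Topics. A minimal sketch, assuming the `t` position is upgraded from 1 to 3 when the typing Topics of T1 and T3 were judged similar in the first iteration, and that the simDNA has one position per distinct feature letter of T1's simDNAtype; `refine_type_digit` and the type keys are hypothetical. It reproduces the step from 21113 to 21133 and its effect on simDNA' (sum of digits).

```python
def refine_type_digit(dna_type_head: str, dna: str, type1, type2,
                      similar_pairs: set) -> str:
    """Second iteration: upgrade the 't' position from '1' to '3' when the
    typing Topics of both Topics were found similar in the first iteration."""
    features = list(dict.fromkeys(dna_type_head))  # one position per feature letter
    if "t" not in features:
        return dna
    i = features.index("t")
    if dna[i] == "1" and (type1, type2) in similar_pairs:
        return dna[:i] + "3" + dna[i + 1:]
    return dna


def digit_sum(dna: str) -> int:
    """simDNA' = sum of digits; 'X' counts as 0 (assumption)."""
    return sum(int(c) for c in dna if c.isdigit())


first = "21113"
second = refine_type_digit("x13tn", first, "t_class", "t_kind", {("t_class", "t_kind")})
print(second, digit_sum(first), digit_sum(second))  # 21133 8 10
```

The refinement raises simDNA'(T1,T3) from 8 to 10, while simDNA(T1,T2) = 01XX1 stays unchanged because T2 is not typed.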
SIM - Example
Requested Topic: t_source.xtm#t_source (simDNAtype z13n), compared with every Topic of Beispiel.xtm

simDNAtype                 Topic in Beispiel.xtm        simDNA  Similar
13n                        Beispiel.xtm#TMStandards     X111    false
xx1n                       Beispiel.xtm#t_person        01X1    false
z1n                        Beispiel.xtm#t_introduction  21X1    false
zz1n                       Beispiel.xtm#t_homepage      21X1    false
s1n                        Beispiel.xtm#t_en            X1X1    false
s1n                        Beispiel.xtm#t_de            X1X1    false
x1n                        Beispiel.xtm#t_requirements  01X1    false
ss1n                       Beispiel.xtm#t_nickname      X1X1    false
13n                        Beispiel.xtm#t_sort          X111    false
z1n                        Beispiel.xtm#t_source        21X3    true
y1nnn                      Beispiel.xtm#at_authorship   01X1    false
ws1n                       Beispiel.xtm#art_author      01X1    false
ws1n                       Beispiel.xtm#art_document    01X1    false
13tnn(vs)(lt)(vts)[atptp]  Beispiel.xtm#M1              X111    false
13tnn(lt)                  Beispiel.xtm#M2              X111    false
12tn(lt)[atptp]            Beispiel.xtm#RA1             X1X1    false
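Applying simDNA' = sum of digits to this listing makes the decision visible. The sketch copies the simDNA values from the table and assumes that X counts as 0; the threshold is not given in the example, so only the maximum is computed.

```python
# (Topic in Beispiel.xtm, simDNA against t_source.xtm#t_source), copied from the listing
rows = [
    ("TMStandards", "X111"), ("t_person", "01X1"), ("t_introduction", "21X1"),
    ("t_homepage", "21X1"), ("t_en", "X1X1"), ("t_de", "X1X1"),
    ("t_requirements", "01X1"), ("t_nickname", "X1X1"), ("t_sort", "X111"),
    ("t_source", "21X3"), ("at_authorship", "01X1"), ("art_author", "01X1"),
    ("art_document", "01X1"), ("M1", "X111"), ("M2", "X111"), ("RA1", "X1X1"),
]


def digit_sum(dna: str) -> int:
    """simDNA' = sum of digits; 'X' counts as 0 (an assumption)."""
    return sum(int(c) for c in dna if c.isdigit())


scores = {topic: digit_sum(dna) for topic, dna in rows}
best = max(scores, key=scores.get)
print(best, scores[best])  # t_source 6
```

t_source scores 6, ahead of t_introduction and t_homepage with 4, so any threshold below 6 yields exactly the single match marked true in the listing.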
SIM - Assessment
• Self-Assessment:
  • take each Topic from the Topic Map,
  • create a (randomly pruned) fragment around the Topic, and
  • request the Topic Map with this fragment.
• Pruning probabilities:
  • probType – of the Type of the Topic
  • probTopNam – of a whole Topic Name
  • probAss – of an Association in which the Topic plays a role
  • probOcc – of an Occurrence (and all of its properties)
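The self-assessment loop can be sketched as follows. This is a hypothetical illustration: a fragment is reduced to the four prunable parts named above, each part is dropped independently with its probability (`prob_type`, `prob_top_nam`, `prob_ass` and `prob_occ` mirror probType, probTopNam, probAss and probOcc), and `request` stands in for running SIM against the Topic Map and returning the key of the matched Topic. The result is the share of Topics that are found again.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Fragment:
    """Hypothetical fragment around one Topic, reduced to its prunable parts."""
    topic_type: Optional[str] = None
    names: List[str] = field(default_factory=list)
    associations: List[str] = field(default_factory=list)
    occurrences: List[str] = field(default_factory=list)


def prune(f: Fragment, rng: random.Random, prob_type: float = 0.0,
          prob_top_nam: float = 0.0, prob_ass: float = 0.0,
          prob_occ: float = 0.0) -> Fragment:
    """Drop each part independently with its pruning probability."""
    def keep(items: List[str], p: float) -> List[str]:
        return [item for item in items if rng.random() >= p]

    return Fragment(
        topic_type=None if rng.random() < prob_type else f.topic_type,
        names=keep(f.names, prob_top_nam),
        associations=keep(f.associations, prob_ass),
        occurrences=keep(f.occurrences, prob_occ),
    )


def self_assessment(topic_map: Dict[str, Fragment],
                    request: Callable[[Fragment], Optional[str]],
                    probs: Dict[str, float], seed: int = 0) -> float:
    """Share of Topics that are found again from their own pruned fragment."""
    rng = random.Random(seed)
    hits = sum(request(prune(f, rng, **probs)) == key
               for key, f in topic_map.items())
    return hits / len(topic_map)


# Toy usage: a stand-in 'request' that recognises Topics by their names only.
tm = {"t_person": Fragment("t_class", ["person"]),
      "t_source": Fragment("t_class", ["source"])}
by_names = {tuple(f.names): key for key, f in tm.items()}


def find_by_names(frag: Fragment) -> Optional[str]:
    return by_names.get(tuple(frag.names))


print(self_assessment(tm, find_by_names, {}))                     # 1.0
print(self_assessment(tm, find_by_names, {"prob_top_nam": 1.0}))  # 0.0
```

Raising the pruning probabilities step by step and recording the share of rediscovered Topics gives a simple robustness curve for whatever decision `request` implements.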
Besides the TMDM Subject Equality Approach
• Referential Subject Equality: O(n*log(n)); semantics as an absolute value
• Structuralist Subject Equality: O(n*n); semantics as a relative value
• Levels: Syntax, Data Model (Graph); bindings: TMV vocabulary, SM ontology, TMA ontology, SMD ontology, TMRM
  - Sowa's Knowledge Signature
  - simpleSIM: yields very good results in restricted domains; the usage of a Topic is ignored
  - SIM (bound to the TMDM): more generic, yields good results; the usage of a Topic is exploited; work to do
  - adoption of Melnik's Similarity Flooding Approach: not suitable for this usage scenario, but for SM ontology matching
Subject Equality when both Subject Indications are unavailable (marked X):
$$\text{SubjectEquality}_{SMD_i}\big(\text{SubjectIndication}_{SMD_1},\ \text{SubjectIndication}_{SMD_2},\ \text{SubjectMap}_{\text{SubjectProxy}_1},\ \text{SubjectMap}_{\text{SubjectProxy}_2}\big) \in \{\text{true}, \text{false}\}$$
How can an SMD_SIM be defined? How can a deterministic Subject Indication Approach be defined?