Schema Free Querying of Semantic Data

Lushan Han. Advisor: Dr. Tim Finin. May 23, 2014.

Presentation Transcript


  1. Schema Free Querying of Semantic Data Lushan Han Advisor: Dr. Tim Finin May 23, 2014

  2. Road Map • Introduction • Related Work • SFQ Interface • Schema Network and Association Models • Query Interpretation • Evaluation • Conclusion

  3. Part 1. Introduction

  4. Semantic Data • A network of entities that are annotated with types and interlinked by properties • The amount of semantic data is increasing • Examples: • RDF semantic data • Linked Open Data (LOD) • DBpedia • Freebase

  5. Objectives • Develop schema-free query interfaces • Work with “semantic data” in many forms, e.g., RDF, Freebase, RDBMS • Allow casual users to query semantic data freely without learning its schema • Queries should stay in the user’s conceptual world • Two existing interfaces: • Natural Language Interface (NLI) • Keyword Interface • Three hard problems

  6. P1. No Practical Interface • Natural language interface • NLP techniques are still not reliable enough to parse the full relational structure out of natural language questions (e.g. “Who was the author of the Adventures of Tom Sawyer and where was he born?”) • Keyword interface • Ambiguity and limited expressiveness (e.g. “president children spouse”)

  7. SFQ Interface • Still in the user’s conceptual world • Makes the implicit structure of NL questions explicit • Example: “Who was the author of the Adventures of Tom Sawyer and where was he born?”

  8. P2. Semantic Heterogeneity Problem • There are many different ways to express (model) the same meaning • Vocabulary and structure mismatches between the user’s query and the machine’s representation • Existing methods: • Labor-intensive and ad hoc methods • Domain-specific syntactic or semantic grammars • Mapping lexicons (mapping rules) • Templates • A thesaurus (e.g. WordNet) is insufficient

  9. P2. Examples

  10. P2. More Examples

  11. A Purely Computational Approach • Lexical semantic similarity measures • Capture flexible semantics • Statistical association measures • Carry out disambiguation • A novel “overall semantic similarity” or fitness metric that combines • Lexical semantic similarity measures • Statistical association measures • Structural features • Context-sensitive mapping algorithms

  12. P3. Heterogeneous or Unknown Schema • It is hard to reach consensus on a schema for the world • Open-domain semantic data has a heterogeneous or even unknown schema (e.g. Semantic Web data, DBpedia) • Traditional NLI systems are difficult to apply • Some modern systems • Do not produce formal queries (e.g. SQL or SPARQL) • Search the entity network directly for matches • Are computationally expensive and ad hoc in nature

  13. The Schema Network • Learn a schema statistically from the entity network by exploiting co-occurrences • The schema itself is also represented as a network • Map the user’s query into the schema network instead of the entity network • Much more scalable • Produces formal queries • Enables joint disambiguation and context-sensitive mapping algorithms

  14. Thesis Statement • We can develop an effective and efficient algorithm to map a casual user's schema-free query into a formal knowledge base query language that overcomes vocabulary and structure mismatch problems by exploiting lexical semantic similarity measures, association degree measures and structural features.

  15. Contributions • An intuitive SFQ interface that avoids the problem of extracting relational structure from NL queries • Novel algorithms that map SFQ queries to KB queries and address both vocabulary and structure mismatches • A novel approach to handling heterogeneous or unknown schemas by building a schema from an entity network • A definition of the probability of observing a path in a schema network, and two novel statistical association models • An improved PMI metric and new semantic text similarity measures and algorithms

  16. Part 2. Related Work

  17. Natural Language Interface to Database (NLIDB) Systems • Early systems in the 1970s (e.g. LUNAR and LADDER) • Domain-specific syntactic or semantic grammars • Heavily customized to a particular application • Later systems in the 1980s and 1990s (e.g. TEAM, ASK, MASQUE) • More general parsers • Require human-crafted lexicons, mapping rules and domain knowledge to interpret the parse tree • Allow knowledge engineers or end users to enrich lexicons and add new mapping rules through an interactive interface • More portable than early systems

  18. Recent NLI Systems

  19. Part 3. SFQ Interface

  20. SFQ Examples • Where was the author of the Adventures of Tom Sawyer born? • Give me authors in the CIKM conference • A more complicated one

  21. Default Relations • The relation name can be left out • A stop word list filters out relation names made of words such as in, of, has, from, belong, part of, locate, etc.
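
The stop word filtering on slide 21 could be sketched roughly as follows. This is only an illustration under assumptions, not the author's implementation: the function name `is_default_relation` and the whitespace tokenization are hypothetical, and the stop word set contains only the words named on the slide.

```python
from typing import Optional

# Stop words taken from the slide; "part of" is covered by "part" and "of".
STOP_WORDS = {"in", "of", "has", "from", "belong", "part", "locate"}


def is_default_relation(name: Optional[str]) -> bool:
    """Return True if the relation name is missing or made only of stop words.

    Hypothetical helper: such a relation would be treated as a default
    relation, i.e. as if the user had left the name out.
    """
    if name is None:
        return True
    tokens = name.lower().split()
    # An empty name has no tokens, so all() is True and it counts as default.
    return all(token in STOP_WORDS for token in tokens)
```

Under this sketch, `is_default_relation("part of")` is true, while a name such as "born in" keeps its content word and is not filtered.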

  22. Envisioned Web Interface

  23. Output (1)

  24. Output (2)

  25. Part 4. Schema Network and Association Models

  26. Instance Data (ABox) • Two datasets • The relation dataset (all relations between instances) • The type dataset (all type definitions for instances) • Integrate all RDF data types into five types that are familiar to users • ˆNumber, ˆDate, ˆYear, ˆText and ˆLiteral • ˆLiteral is the super type of the other four • We use DBpedia for examples in the following slides

  27. Automatically Enrich the Set of Types • Automatically deduce types from relations • Infer attribute types from data type properties • e.g. <Beijing>, population, “20693000” => ˆPopulation • Infer classes from object properties • e.g. <Zelig>, director, <Woody Allen> => ˜Director
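
The type deduction on slide 27 can be illustrated with a short sketch. The tuple layout `(subject, property, value, is_literal)` and the function name `infer_types` are assumptions made for this example, and plain ASCII `^` and `~` stand in for the prefixes shown on the slide.

```python
def infer_types(triples):
    """Deduce a type for each value from the property that points to it.

    triples: iterable of (subject, property, value, is_literal) tuples
    (a hypothetical layout). A literal value gets an attribute type
    prefixed with "^"; an instance value gets a class prefixed with "~".
    Returns a dict mapping each value to its set of inferred type names.
    """
    inferred = {}
    for _subject, prop, value, is_literal in triples:
        prefix = "^" if is_literal else "~"
        # Capitalize the first letter of the property name: population -> Population
        type_name = prefix + prop[:1].upper() + prop[1:]
        inferred.setdefault(value, set()).add(type_name)
    return inferred


example = infer_types([
    ("Beijing", "population", "20693000", True),
    ("Zelig", "director", "Woody Allen", False),
])
```

Here `example` associates "20693000" with `^Population` and "Woody Allen" with `~Director`, mirroring the two examples on the slide.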

  28. Counting Co-occurrence

  29. The Schema Network • A statistical meta description of the underlying entity network, which is a network itself.
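
One plausible way to derive such a statistical description is to count, for every instance-level relation, which classes co-occur at its two ends. The sketch below assumes this simple counting scheme; the actual counting used on slide 28 is not preserved in the transcript, and `build_schema_network` and its input layout are hypothetical.

```python
from collections import Counter


def build_schema_network(relations, types):
    """Count class-level co-occurrences over an entity network.

    relations: iterable of (subject, property, object) instance triples.
    types: dict mapping each instance to the set of classes it belongs to.
    Returns a Counter keyed by (subject_class, property, object_class);
    each key is an edge of the schema network and its count is how often
    that pattern was observed in the instance data.
    """
    edges = Counter()
    for subject, prop, obj in relations:
        for subject_class in types.get(subject, ()):
            for object_class in types.get(obj, ()):
                edges[(subject_class, prop, object_class)] += 1
    return edges


network = build_schema_network(
    relations=[
        ("Zelig", "director", "Woody Allen"),
        ("Annie Hall", "director", "Woody Allen"),
    ],
    types={
        "Zelig": {"Film"},
        "Annie Hall": {"Film"},
        "Woody Allen": {"Person", "Director"},
    },
)
```

Because the result is keyed by classes rather than instances, its size depends on the number of distinct class and property combinations, not on the number of entities.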

  30. The Schema Path • A path on the schema network is called a schema path • A schema path P represents a composite relation • Example 1 • Example 2

  31. The Schema Path Probability • Measure the reasonableness of a path • The probability of “observing” a path on the schema network • (A1) we select the starting node c0 of the path randomly from all the nodes in the schema network • (A2) observe the path in a random walk starting with c0

  32. Compute Transition Probability • Every transition probability lies in the range [0, 1]
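
Assumptions (A1) and (A2) suggest a simple product form for the path probability. The sketch below takes the per-step transition probabilities as given inputs, because their formula is not preserved in the transcript; the uniform 1/N start probability is one reading of (A1), and `path_probability` is a hypothetical name.

```python
def path_probability(num_nodes, transition_probs):
    """Probability of observing a path under assumptions (A1) and (A2).

    num_nodes: number of nodes in the schema network; (A1) is read here as
        a uniform choice, so the start node has probability 1 / num_nodes.
    transition_probs: probabilities of each step of the random walk (A2),
        supplied by the caller.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    probability = 1.0 / num_nodes
    for step in transition_probs:
        if not 0.0 <= step <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
        probability *= step
    return probability
```

Since every factor is at most 1, longer paths can never be more probable than their prefixes under this reading.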

  33. A Property of Schema Paths • A schema path P and its return path P’ represent the same relation • Given a schema path P and its return path P’, we have P(P) = P(P’)

  34. Schema Path Model • Stores and indexes all schema paths whose length does not exceed a given threshold, together with their probabilities • The only supported function returns all the schema paths between two given classes, with their probabilities • Kept in memory for fast computation
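
A minimal in-memory version of such a model might look like the sketch below. The class name, the tuple encoding of a path (classes and properties alternating) and the measurement of length in properties are all assumptions made for illustration.

```python
from collections import defaultdict


class SchemaPathModel:
    """Hypothetical in-memory index of schema paths keyed by their end classes."""

    def __init__(self, max_length):
        self.max_length = max_length
        self._index = defaultdict(list)

    def add(self, path, probability):
        """Store a path if its length is within the threshold.

        path: tuple alternating class, property, class, ..., class.
        Returns True if the path was stored, False if it was too long.
        """
        length = len(path) // 2  # number of properties on the path
        if length > self.max_length:
            return False
        self._index[(path[0], path[-1])].append((path, probability))
        return True

    def paths_between(self, start_class, end_class):
        """Return all stored (path, probability) pairs between two classes."""
        return list(self._index.get((start_class, end_class), []))


model = SchemaPathModel(max_length=2)
model.add(("Book", "author", "Person"), 0.02)
model.add(("Book", "author", "Person", "birthPlace", "Place"), 0.004)
```

Keying the index by the pair of end classes makes the single supported lookup a dictionary access followed by a list copy.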

  35. Schema Path Model Optimization

  36. Concept Path • Group all the edges with the same direction between two nodes into a single edge • By analogy to schema path, we have concept path probability • Concept path frequency
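
The edge grouping can be sketched as follows. Summing the counts of the merged edges is an assumption made for this example (the slide only says the edges are grouped), and `collapse_edges` is a hypothetical name.

```python
from collections import Counter


def collapse_edges(schema_edges):
    """Merge all same-direction edges between two classes into one edge.

    schema_edges: dict mapping (source_class, property, target_class) to a count.
    Returns a Counter mapping (source_class, target_class) to the summed count.
    Direction is preserved: (A, B) and (B, A) stay separate edges.
    """
    concept_edges = Counter()
    for (source, _prop, target), count in schema_edges.items():
        concept_edges[(source, target)] += count
    return concept_edges


concept_edges = collapse_edges({
    ("Film", "director", "Person"): 2,
    ("Film", "producer", "Person"): 3,
    ("Person", "knownFor", "Film"): 4,
})
```

In this example the two Film-to-Person properties collapse into a single edge, while the Person-to-Film edge is kept apart because it points the other way.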

  37. Concept Association Knowledge (CAK) model • Pairwise associations • (i) direct association between classes and properties • (ii) indirect association between two classes • PMI measure • Our improved PMI measure

  38. Concept Association Knowledge (CAK) model • Direct association between a directed class and a property p • Indirect association between two directed classes
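
For reference, the standard PMI measure named on slide 37 can be computed from co-occurrence counts as sketched below. The improved PMI* variant is not defined in this transcript, so it is not reproduced here; the function name, argument names and the choice of base 2 are assumptions.

```python
import math


def pmi(joint_count, count_x, count_y, total):
    """Standard pointwise mutual information from counts.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), where each probability is
    estimated as a count divided by total. Returns -inf when x and y never
    co-occur.
    """
    if min(count_x, count_y, total) <= 0:
        raise ValueError("counts and total must be positive")
    if joint_count == 0:
        return float("-inf")
    # Algebraically equal to p(x, y) / (p(x) * p(y)), with fewer roundings.
    ratio = (joint_count * total) / (count_x * count_y)
    return math.log2(ratio)
```

A value of 0 means the two items co-occur exactly as often as independence would predict; positive values indicate association.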

  39. CAK Examples

  40. PMI* vs PMI • The most associated properties for “Person” in DBpedia, ranked by PMI* and by PMI

  41. Part 5. Query Interpretation

  42. SFQ Interpretation

  43. Two Phase Mapping Algorithm

  44. Generating Candidates via Lexical Semantic Similarity

  45. Disambiguation via Optimization

  46. Concept Mapping Optimization Problem

  47. A joint disambiguation example

  48. Time Complexity of Concept Mapping Algorithm • A straightforward concept mapping algorithm • After exploiting locality – the optimal mapping choice of a property can be determined locally when the two classes it links are fixed
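
The locality observation can be illustrated with a small search routine: class choices are enumerated jointly, but once the two classes around a relation are fixed, its property is chosen independently of all other relations. This is a sketch under assumptions, not the thesis algorithm: `best_mapping`, the caller-supplied `score` function and the use of a sum as the objective are all hypothetical.

```python
from itertools import product


def best_mapping(class_candidates, relations, property_candidates, score):
    """Pick classes jointly and properties locally.

    class_candidates: dict mapping each query node to its candidate classes.
    relations: list of (subject_node, relation_name, object_node).
    property_candidates: dict mapping each relation name to candidate properties.
    score: function (subject_class, property, object_class) -> float.
    Returns (best_total, (classes, properties)).
    """
    nodes = list(class_candidates)
    best_total, best_assignment = float("-inf"), None
    for combo in product(*(class_candidates[node] for node in nodes)):
        classes = dict(zip(nodes, combo))
        total, properties = 0.0, {}
        for subject, relation, obj in relations:
            # Locality: with both end classes fixed, this relation's best
            # property does not depend on any other relation.
            scored = [
                (score(classes[subject], prop, classes[obj]), prop)
                for prop in property_candidates[relation]
            ]
            best_score, best_prop = max(scored)
            properties[(subject, relation, obj)] = best_prop
            total += best_score
        if total > best_total:
            best_total, best_assignment = total, (classes, properties)
    return best_total, best_assignment
```

Without locality, the search would also have to enumerate every combination of property choices; here the property choices contribute only a linear factor per class assignment.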

  49. Relation Mapping Optimization Problem • H* : the set of top k3 concept mapping hypotheses • The reduced mapping space for the SFQ • The optimization problem

  50. Computing the Fitness of a Mapping σ on a Relation r • Let P be the schema path that σ assigns to r • Two features and one parameter β • Joint lexical semantic similarity between r and P • The schema path frequency of P • The parameter β adjusts the relative importance of the two features
