Schema Free Querying of Semantic Data

Schema Free Querying of Semantic Data Lushan Han Advisor: Dr. Tim Finin May 23, 2014

Road Map • Introduction • Related Work • SFQ Interface • Schema Network and Association Models • Query Interpretation • Evaluation • Conclusion

Part 1. Introduction

Semantic Data • A network of entities, which are annotated with types and interlinked with properties. • Increasing amount of Semantic Data • Examples: • RDF semantic data • LOD • DBpedia • Freebase

Objectives • Develop schema-free query interfaces • Work with “semantic data” in many forms, e.g., RDF, Freebase, RDBMS • Allow casual users to freely query semantic data without learning its schema • Queries should be in the user’s conceptual world • Two existing interfaces: • Natural Language Interface (NLI) • Keyword Interface • Three hard problems

P1. No Practical Interface • Natural language interface • NLP techniques are still not reliable enough to parse out the full relational structure from natural language questions • Keyword interface • Ambiguity and limited expressiveness • (e.g. “president children spouse”) • (e.g. Who was the author of the Adventures of Tom Sawyer and where was he born?)

SFQ Interface • Still in the user’s conceptual world • Make implicit structure of NL questions explicit • Who was the author of the Adventures of Tom Sawyer and where was he born?
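To illustrate what "making the implicit structure explicit" could look like, here is a minimal sketch that writes the Tom Sawyer question as a small graph of typed nodes and named relations. The class names `Node` and `Relation`, the "?" marker for unknowns, and the relation wording ("written by", "born in") are assumptions for illustration, not the interface's actual notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str    # entity name, or "?" for something the user asks about
    concept: str  # type in the user's own words (hypothetical field name)

@dataclass(frozen=True)
class Relation:
    subject: Node
    name: str     # relation name in the user's own words
    obj: Node

# "Who was the author of the Adventures of Tom Sawyer and where was he born?"
book = Node("The Adventures of Tom Sawyer", "Book")
author = Node("?", "Author")
place = Node("?", "Place")

sfq = [
    Relation(book, "written by", author),
    Relation(author, "born in", place),
]

def variables(query):
    """Return the distinct unknown nodes (label "?") in order of appearance."""
    seen = []
    for rel in query:
        for node in (rel.subject, rel.obj):
            if node.label == "?" and node not in seen:
                seen.append(node)
    return seen

asked_for = [node.concept for node in variables(sfq)]
```

The two relations share the `author` node, which is the structure a keyword query such as “president children spouse” cannot express.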

P2. Semantic Heterogeneity Problem • Many different ways to express (model) the same meaning • Vocabulary and structure mismatches between the user’s query and the machine’s representation • Existing methods: • Labor-intensive and ad-hoc methods • Domain-specific syntactic or semantic grammars • Mapping Lexicons (Mapping rules) • Templates • Thesaurus (e.g. WordNet) is insufficient

P2. Examples

P2. More Examples

A purely computational approach • Lexical Semantic similarity Measures • Capture flexible semantics • Statistical Association Measures • Carry out disambiguation • A novel “overall semantic similarity” or fitness metric that combines • Lexical semantic similarity measures • statistical association measures • structure features • Context-sensitive mapping algorithms

P3. Heterogeneous or unknown schema • Hard to reach consensus on a schema for the world • Open domain semantic data has heterogeneous or even unknown schema (e.g. Semantic Web data, DBpedia) • Traditional NLI systems are difficult to apply • Some modern systems • Do not produce formal queries (e.g. SQL or SPARQL) • Search the entity network directly for matches • Are computationally expensive and ad hoc in nature

The schema network • Learn a schema statistically from the entity network by exploiting co-occurrences. • The schema itself is also represented as a network • Mapping the user’s query into the schema network, instead of the entity network. • Much more scalable • Produce formal queries • Enable joint disambiguation and context-sensitive mapping algorithm

Thesis Statement We can develop an effective and efficient algorithm to map a casual user's schema-free query into a formal knowledge base query language that overcomes vocabulary and structure mismatch problems by exploiting lexical semantic similarity measures, association degree measures and structural features.

Contributions • An intuitive SFQ interface that avoids the problem of extracting relations structure from NL queries • Novel algorithms mapping SFQ queries to KB queries addressing both vocabulary and structure mismatches • A novel approach to handle heterogeneous or unknown schemas by building a schema from an entity network • Define the probability of observing a path in a schema network and develop two novel statistical association models • An improved PMI metric and new semantic text similarity measures and algorithms

Part 2. Related Work

Natural Language Interface to Database (NLIDB) Systems • Early systems in the 1970s (e.g. LUNAR and LADDER) • Domain-specific syntactic or semantic grammars • Heavily customized to a particular application • Later systems in the 1980s and 1990s (e.g. TEAM, ASK, MASQUE) • More general parser • Require human-crafted lexicons, mapping rules and domain knowledge to interpret the parse tree • Allow knowledge engineers or end users to enrich lexicons and add new mapping rules through an interactive interface • More portable than early systems

Recent NLI Systems

Part 3. SFQ Interface

SFQ Examples • Where was the author of the Adventures of Tom Sawyer born? • Give me authors in the CIKM conference • A more complicated one

Default Relations • The relation name can be left out • A stop word list filters out relation names made of words like in, of, has, from, belong, part of, locate, etc.
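A minimal sketch of the stop-word filtering described above, assuming that a relation name counts as a default relation when nothing is left after removing stop words. The word set below copies only the words listed on the slide (the slide's list ends in "etc.", so the real list is longer), and the function names are hypothetical.

```python
# Stop words taken from the slide; "part of" is covered by "part" and "of".
STOP_WORDS = {"in", "of", "has", "from", "belong", "part", "locate"}

def content_words(relation_name):
    """Return the words of a relation name that are not stop words."""
    return [w for w in relation_name.lower().split() if w not in STOP_WORDS]

def is_default_relation(relation_name):
    """True if the name is empty or consists only of stop words (assumed rule)."""
    return len(content_words(relation_name)) == 0

examples = {name: is_default_relation(name) for name in ["", "in", "part of", "born in"]}
```

Under this rule, "authors in the CIKM conference" would leave the relation unnamed, while "born in" keeps the content word "born".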

Envisioned Web Interface

Output (1)

Output (2)

Part 4. Schema Network and Association Models

Instance Data (ABox) • Two datasets • The relation dataset (all relations between instances) • The type dataset (all type definitions for instances) • Integrate all RDF data types into five types that are familiar to users • ˆNumber, ˆDate, ˆYear, ˆText and ˆLiteral • ˆLiteral is the super type of the other four • We use DBpedia for examples in the following slides

Automatically enrich the set of types • Automatically deduce types from relations • Infer attribute types from data type properties • e.g. <Beijing>, population, “20693000” => ˆPopulation • Infer classes from object properties • e.g. <Zelig>, director, <Woody Allen> => ˜Director
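The naming pattern in the two examples above can be sketched as follows. This only derives the type name from the property name; which instance receives the inferred type is not spelled out on the slide, so the sketch leaves that out. The function name and the capitalization rule are assumptions read off the two examples.

```python
ATTRIBUTE_MARK = "ˆ"  # prefix used on the slide for attribute types
CLASS_MARK = "˜"      # prefix used on the slide for inferred classes

def infer_type(prop, obj_is_literal):
    """Derive a type name from a property name.

    A data type property (literal object) yields an attribute type,
    an object property yields an inferred class.
    """
    name = prop[:1].upper() + prop[1:]
    return (ATTRIBUTE_MARK if obj_is_literal else CLASS_MARK) + name

population_type = infer_type("population", obj_is_literal=True)
director_type = infer_type("director", obj_is_literal=False)
```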

Counting Co-occurrence

The Schema Network • A statistical meta description of the underlying entity network, which is a network itself.

The Schema Path • A path on the schema network is called a schema path • A schema path P represents a composite relation Example 1. Example 2.

The Schema Path Probability • Measure the reasonableness of a path • The probability of “observing” a path on the schema network • (A1) we select the starting node c0 of the path randomly from all the nodes in the schema network • (A2) observe the path in a random walk starting with c0
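Assumptions (A1) and (A2) can be sketched as a product: the chance of picking the start node times the transition probability of each step of the walk. The sketch assumes a uniform choice of start node and takes the transition probabilities as given input; the numbers and the class and property names below are made up for illustration.

```python
from math import prod

def path_probability(path, transition, num_nodes):
    """Probability of observing `path` under assumptions (A1) and (A2).

    path:       list of (from_class, property, to_class) steps
    transition: dict mapping each step to its transition probability
    num_nodes:  number of nodes in the schema network
    """
    if not path:
        raise ValueError("a schema path needs at least one step")
    # Consecutive steps must chain: each step starts where the previous ended.
    for (_, _, end), (start, _, _) in zip(path, path[1:]):
        if end != start:
            raise ValueError("steps do not form a connected path")
    start_prob = 1.0 / num_nodes  # (A1), assuming a uniform random choice
    return start_prob * prod(transition[step] for step in path)  # (A2)

# Illustrative values only, not taken from DBpedia.
toy_transition = {
    ("Book", "author", "Person"): 0.5,
    ("Person", "birthPlace", "Place"): 0.2,
}
toy_path = [("Book", "author", "Person"), ("Person", "birthPlace", "Place")]
p = path_probability(toy_path, toy_transition, num_nodes=4)
```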

Compute Transition Probability • Each transition probability lies between 0 and 1

A Property about Schema Path • A schema path P and its return path P’ represent the same relation. • Given a schema path P and its return path P’ we have P(P) = P(P’).

Schema Path Model • Designed to store and index all schema paths whose length does not exceed a given threshold, together with their probabilities • The only supported function returns all the schema paths, with their probabilities, between two given classes • Kept in memory for fast computation
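One way to picture such a model is an in-memory dictionary keyed by (start class, end class). The sketch below enumerates every path up to a length threshold and stores it with its probability. It assumes edges are followed only in their stored direction, a uniform start probability, and given transition probabilities; the function names are hypothetical and the toy numbers are illustrative.

```python
from collections import defaultdict

def build_path_index(edges, max_len):
    """Index all paths of length 1..max_len by (start_class, end_class).

    edges: dict mapping (src, prop, dst) -> transition probability
    """
    nodes = {c for (src, _, dst) in edges for c in (src, dst)}
    outgoing = defaultdict(list)
    for (src, prop, dst), prob in edges.items():
        outgoing[src].append((prop, dst, prob))

    index = defaultdict(list)

    def extend(start, node, steps, prob):
        if steps:
            index[(start, node)].append((tuple(steps), prob))
        if len(steps) == max_len:
            return  # length threshold reached
        for prop, dst, t in outgoing[node]:
            extend(start, dst, steps + [(node, prop, dst)], prob * t)

    for n in nodes:
        extend(n, n, [], 1.0 / len(nodes))  # assumed uniform start probability
    return dict(index)

def paths_between(index, start, end):
    """The single supported lookup: paths and probabilities between two classes."""
    return index.get((start, end), [])

toy_edges = {
    ("Book", "author", "Person"): 0.5,
    ("Person", "birthPlace", "Place"): 0.2,
}
toy_index = build_path_index(toy_edges, max_len=2)
book_to_place = paths_between(toy_index, "Book", "Place")
```

Precomputing the index trades memory for lookup speed, which matches the slide's point about keeping the model in memory.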

Schema Path Model Optimization

Concept Path • Group all the edges with the same direction between two nodes into a single edge • By analogy to schema path, we have concept path probability • Concept path frequency

Concept Association Knowledge (CAK) model • Pairwise associations • (i) direct association between classes and properties • (ii) indirect association between two classes • PMI measure • Our improved PMI measure

Concept Association Knowledge (CAK) model • Direct association between a directed class and a property p • Indirect association between two directed classes
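For reference, the standard PMI measure that the CAK model starts from can be computed from co-occurrence counts as below. This shows only the textbook definition, log of p(x, y) / (p(x) p(y)); the improved PMI* measure is not defined in these slides and is not reproduced here. The argument names and the sample counts are illustrative.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Standard pointwise mutual information, in bits.

    count_xy: number of times x and y co-occur
    count_x:  number of occurrences of x
    count_y:  number of occurrences of y
    total:    total number of observations
    """
    if count_xy == 0:
        return float("-inf")  # never observed together
    # p(x, y) / (p(x) * p(y)) simplifies to count_xy * total / (count_x * count_y)
    return math.log2(count_xy * total / (count_x * count_y))

# Made-up counts: a class seen 20 times, a property seen 50 times,
# together 10 times, out of 1000 observations.
score = pmi(10, 20, 50, 1000)
```

A positive score means the pair co-occurs more often than independence would predict, and a score of zero means no association.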

CAK Examples

PMI* vs PMI • The most associated property for “Person” in DBpedia: PMI* compared with PMI

Part 5. Query Interpretation

SFQ Interpretation

Two Phase Mapping Algorithm

Generating Candidates via Lexical Semantic Similarity

Disambiguation via Optimization

Concept Mapping Optimization Problem

A joint disambiguation example

Time Complexity of Concept Mapping Algorithm • A straightforward concept mapping algorithm • After exploiting locality – the optimal mapping choice of a property can be determined locally when the two classes it links are fixed
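The locality idea can be sketched as follows: enumerate only the class choices, and for each relation pick the best property candidate on the spot once its two classes are fixed. The sketch assumes the overall fitness is a sum of per-relation scores and takes the scoring function as an input; both are assumptions for illustration, as are all names and numbers.

```python
from itertools import product

def best_mapping(class_cands, relations, prop_cands, score):
    """Search class assignments; choose each property locally.

    class_cands: one candidate list per class slot in the query
    relations:   list of (i, j) pairs indexing the class slots a relation links
    prop_cands:  one candidate list per relation
    score:       hypothetical function score(class_i, prop, class_j) -> float
    """
    best = (float("-inf"), None, None)
    for classes in product(*class_cands):
        total, props = 0.0, []
        for r, (i, j) in enumerate(relations):
            # Locality: with both classes fixed, the best property is
            # independent of every other choice in the query.
            p = max(prop_cands[r], key=lambda q: score(classes[i], q, classes[j]))
            props.append(p)
            total += score(classes[i], p, classes[j])
        if total > best[0]:
            best = (total, classes, tuple(props))
    return best

# Toy score table; unlisted combinations score 0.
TOY_SCORES = {
    ("Book", "author", "Writer"): 0.9,
    ("Work", "writer", "Person"): 0.4,
}

def toy_score(ci, p, cj):
    return TOY_SCORES.get((ci, p, cj), 0.0)

result = best_mapping(
    class_cands=[["Book", "Work"], ["Writer", "Person"]],
    relations=[(0, 1)],
    prop_cands=[["author", "writer"]],
    score=toy_score,
)
```

Without locality the search would enumerate the product of all class and all property candidate lists; with it, the property lists contribute a sum of small local searches per class assignment instead of a multiplicative factor.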

Relation Mapping Optimization Problem • H* : the set of top k3 concept mapping hypotheses • The reduced mapping space for the SFQ • The optimization problem

Computing the fitness of a mapping σ on a relation r • Let P be the schema path that σ assigns to r • Two features and one parameter β • Joint lexical semantic similarity between r and P • The schema path frequency of P • The parameter β adjusts the relative importance of the two features
