SemSearch introduces a semantic search engine that hides complexity from users by allowing for multiple keywords, precise results, and quick responses without linguistic processing. It involves making sense of user queries, translating them to formal queries, querying back-end data, and ranking results.
SemSearch: A Search Engine for the Semantic Web Yuangui Lei, Victoria Uren, Enrico Motta Knowledge Media Institute The Open University EKAW 2006 Presented by Jungyeon, Yang
Outline • Research background • SemSearch overview • Query interface • Search process • Implementation & examples • Conclusions
Research background • Semantic search: extending traditional search with semantic web technology • Exploiting the explicit meaning of documents (i.e., ontology-based metadata) • Current semantic search tools • Form-based, e.g., SHOE, Magnet • QA-based, e.g., AquaLog, ORAKEL • Keyword-based, e.g., TAP, Squiggle, DOSE
Support for ordinary end users • Form-based tools • Forms are intuitive • Issues: knowledge overhead; scalability • QA-based tools • Easy to use • Issue: heavy NLP. • Keyword-based tools • Easy to post queries; quick response • Issue: typically one keyword only; general knowledge of the problem domain required
The goal of our search engine • Hide the complexity of semantic search from end users: • Low barrier to access: easy to post queries • Avoiding the form-based routine • Dealing with relatively complex queries • Supporting multiple keywords • Precise and self-explanatory results: • Results satisfy user queries • Results are easy to understand • Quick response • Avoiding linguistic processing
SemSearch Architecture • End users • Google-like User Interface Layer: Google-like query interface • Text Search Layer: semantic entity indexing engine; semantic entity search engine • Semantic Query Layer: formal query construction engine; query engine; ranking engine • Formal Query Language Layer (SPARQL, SeRQL, etc.) • Semantic Data Layer
The Google-like query interface • Extends traditional keyword search languages by allowing the specification of: • The queried subject (the type of expected search results) • The combination of keywords • Three operators are used: • “:” captures the query subject • “and”/“or” specify the combination of keywords • Query formats: • One keyword: finding entities that have relations with the keyword match • Multiple keywords: “subject: keyword1 and/or keyword2 and/or keyword3”, e.g., <news: phd students>, <paper: john and enrico> • Advantages: • More flexible than a form-based query interface • More powerful than state-of-the-art keyword-based semantic search interfaces
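The query format above can be illustrated with a minimal parsing sketch. This is an assumption about how such a string might be split, not SemSearch's actual parser; `ParsedQuery` and `parse_query` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuery:
    subject: Optional[str]   # text before ":" (None for a one-keyword query)
    keywords: List[str]      # keyword phrases, in the order they were typed
    operators: List[str]     # "and"/"or" between consecutive keywords


def parse_query(text: str) -> ParsedQuery:
    """Split 'subject: k1 and/or k2 ...' into its parts (hypothetical helper)."""
    text = text.strip().strip("<>").strip()
    subject, sep, rest = text.partition(":")
    if not sep:
        # No ":" operator, so this is a one-keyword query without a subject.
        return ParsedQuery(None, [text], [])
    # Split on the standalone words "and"/"or"; the capture group keeps them.
    parts = re.split(r"\s+(and|or)\s+", rest.strip(), flags=re.IGNORECASE)
    keywords = [p.strip() for p in parts[0::2]]
    operators = [p.lower() for p in parts[1::2]]
    return ParsedQuery(subject.strip(), keywords, operators)
```

For example, `parse_query("<paper: john and enrico>")` yields the subject `paper`, the keywords `john` and `enrico`, and the single operator `and`, while `<news: phd students>` keeps `phd students` as one keyword phrase.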
The search process • Step 1: Making sense of the user queries • Step 2: Translating user queries into formal queries • Step 3: Querying the back-end semantic data repository • Step 4: Ranking the query results
Making sense of user queries • Finding out the semantic meaning of keywords • Class (e.g., the keyword “phd students”) • Relation (e.g., “author”) • Instance (e.g., “Enrico”, “KMi director”) • Method: text search • Labels (rdfs:label) • Short literals are also used when matching instances • When searching for “KMi director”, the matching instances can be picked up. • Two components in the search engine • The semantic entity index engine • The semantic entity search engine
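The index-then-search idea above can be sketched as a tiny in-memory label index. Everything here is an illustrative assumption: the `kmi:` identifiers and labels are made up, and the real index and search engines are not described at this level of detail in the slides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical (entity, kind, label) records standing in for rdfs:label
# values and short literals harvested from the semantic data.
ENTITIES: List[Tuple[str, str, str]] = [
    ("kmi:PhD-Student", "class", "phd student"),
    ("kmi:has-author", "relation", "author"),
    ("kmi:Enrico-Motta", "instance", "Enrico Motta"),
    ("kmi:Enrico-Motta", "instance", "KMi director"),  # a short literal
]


def build_index(entities) -> Dict[str, Set[Tuple[str, str]]]:
    """Index step: map each lower-cased label token to the entities using it."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for uri, kind, label in entities:
        for token in label.lower().split():
            index[token].add((uri, kind))
    return index


def search(index, keyword: str) -> List[Tuple[str, str]]:
    """Search step: entities whose labels together contain every keyword token."""
    tokens = keyword.lower().split()
    if not tokens:
        return []
    hits = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*hits))
```

With this toy data, searching for `KMi director` returns the instance `kmi:Enrico-Motta`, and `author` returns the relation `kmi:has-author`. Matching is purely lexical, in line with the goal of avoiding linguistic processing.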
Translating user queries into formal queries • The search engine takes as input the semantic matches of user search terms • The search engine outputs an appropriate formal query according to the semantic meanings of the keywords • Each keyword in a user query can have multiple matches, so one user query can yield multiple formal queries
Simple user queries • There are only two keywords involved: <subject : keyword> • Fixed number of combination types • The SeRQL query templates are defined
A template example • Pattern: Subject -> Class Cs; Keyword -> Class Ck • Results: <Is,Relation,Ik> associated with exploratory links. • Example: news stories about phd students • <news “KMi success”, mentions-person, Tom-Heath> • A simplified template in Sesame SeRQL: select {Is}, {R}, {Ik} from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Is} R {Ik} union select {Is}, {R}, {Ik} from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Ik} R {Is}
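As a rough sketch of how such a template might be instantiated, the function below substitutes two class identifiers into the simplified class-class pattern. It mirrors the slide's simplified notation rather than validated SeRQL syntax, and `class_class_template` and the `kmi:` identifiers are hypothetical.

```python
def class_class_template(subject_class: str, keyword_class: str) -> str:
    """Fill the simplified class-class template with two class identifiers."""

    def branch(left: str, right: str) -> str:
        # One direction of the relation R between the two instance variables.
        return (
            "select {Is}, {R}, {Ik} "
            f"from {{Is}} rdf:type {{{subject_class}}}, "
            f"{{Ik}} rdf:type {{{keyword_class}}}, "
            f"{{{left}}} R {{{right}}}"
        )

    # The union covers relations pointing in either direction.
    return branch("Is", "Ik") + " union " + branch("Ik", "Is")


print(class_class_template("kmi:News-Item", "kmi:PhD-Student"))
```

The two branches differ only in the direction of `R`, which is what lets the template return a news story that mentions a student as well as a student linked to a news story.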
Complex user queries • <subject: keyword1 and/or keyword2 and/or …> • Instances of the subject which either have relations with all the keywords or have relations with some of the keywords. • Operational problem • The number of combinations grows quickly when many keywords are involved and each keyword has many matches. • Rules for combination reduction: • Only considering the subject keyword as class entities • Choosing the closest possible matches to each keyword • Choosing the most specific class match among the class matches.
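The combinatorial problem and the first two reduction rules can be illustrated with a small sketch. The match scores, the `top_k` cut-off, and all names below are assumptions made for illustration; the third rule (most specific class) needs a class hierarchy and is left out.

```python
from itertools import product
from typing import Dict, List, Tuple

# A match is (entity, kind, score); the score is a hypothetical closeness in [0, 1].
Match = Tuple[str, str, float]


def combinations(matches: Dict[str, List[Match]]) -> List[Tuple[Match, ...]]:
    """Every way of picking one match per keyword (the unreduced search space)."""
    return list(product(*matches.values()))


def reduce_matches(
    matches: Dict[str, List[Match]], subject: str, top_k: int = 1
) -> Dict[str, List[Match]]:
    """Apply two of the reduction rules before combining matches."""
    reduced: Dict[str, List[Match]] = {}
    for keyword, candidates in matches.items():
        if keyword == subject:
            # Rule 1: the subject keyword is only considered as a class.
            candidates = [m for m in candidates if m[1] == "class"]
        # Rule 2: keep only the closest matches for each keyword.
        ranked = sorted(candidates, key=lambda m: m[2], reverse=True)
        reduced[keyword] = ranked[:top_k]
    return reduced
```

With 2 matches for the subject, 3 for one keyword, and 2 for another, the unreduced space holds 2 × 3 × 2 = 12 combinations; after reduction with `top_k=1` a single combination remains.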
Query construction • In SeRQL • Building blocks • Head block: what needs to be retrieved, i.e., <Is, r, Ikx> • Body block: how to retrieve the triples • Condition block: conditions that need to be satisfied • Union block: covers bidirectional relations SELECT DISTINCT label(ArtefactTitle), MuseumName FROM {Artefact} arts:created_by {} arts:first_name {"Rembrandt"}, {Artefact} arts:exhibited {} dc:title {MuseumName}, {Artefact} dc:title {ArtefactTitle} WHERE isLiteral(ArtefactTitle) AND lang(ArtefactTitle) = "en" AND label(ArtefactTitle) LIKE "*night*"
Query construction algorithm (flowchart) • Initializing the query blocks • Has keyword match? If no: composing queries using the blocks • Is class? If yes: adding query blocks for class-class relations retrieval • Is property? If yes: adding query blocks for class-property relations retrieval • Is instance? If yes: adding blocks for class-instance relations retrieval • Composing queries using the blocks
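The flow above can be sketched as a loop that adds head and body patterns depending on the kind of each keyword match. The branch order, the pattern strings, and the variable naming (`Is`, `R1`, `Ik1`, `V1`) are assumptions in the slides' simplified notation, not the engine's actual output; the condition and union blocks are omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple

# (entity identifier, kind), where kind is "class", "property", or "instance".
Match = Tuple[str, str]


def construct_query(
    subject_class: str, keyword_matches: Dict[str, Optional[Match]]
) -> str:
    # Initialise the query blocks with the subject class pattern.
    head: List[str] = ["Is"]
    body: List[str] = [f"{{Is}} rdf:type {{{subject_class}}}"]

    for i, (keyword, match) in enumerate(keyword_matches.items(), start=1):
        if match is None:
            # "Has keyword match?" answered no: nothing to add for this keyword.
            continue
        uri, kind = match
        if kind == "class":
            # Class-class relation: any relation to an instance of the class.
            head += [f"R{i}", f"Ik{i}"]
            body += [f"{{Ik{i}}} rdf:type {{{uri}}}", f"{{Is}} R{i} {{Ik{i}}}"]
        elif kind == "property":
            # Class-property relation: the value the subject has for it.
            head += [f"V{i}"]
            body += [f"{{Is}} {uri} {{V{i}}}"]
        elif kind == "instance":
            # Class-instance relation: any relation to that specific instance.
            head += [f"R{i}"]
            body += [f"{{Is}} R{i} {{{uri}}}"]

    # Compose the final query from the collected blocks.
    return "select " + ", ".join(head) + " from " + ", ".join(body)
```

Calling `construct_query("kmi:Paper", {"john": ("kmi:John", "instance")})` produces a query whose body asks for papers related to that instance through any relation `R1`.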
Conclusions • A keyword-based semantic search engine has been developed • Google-like query interface • Supporting relatively complex queries • Providing relatively quick response
Opinions • Pros • Google-like query interface (intuitive) • Supporting relatively complex queries • Cons • Limitation of the target data format (RDF) • Ranking • Simple semantic matching • Issues • Finding out the semantic meaning of keywords • Storage modeling • Strategy for the semantic match between keywords and semantic entities