Improving Semantic Search Using Query Log Analysis

Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna. OAK Research Group, Department of Computer Science, University of Sheffield, UK.


Presentation Transcript


  1. Improving Semantic Search Using Query Log Analysis
     Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
     OAK Research Group, Department of Computer Science, University of Sheffield, UK

  2. Outline
     • Introduction
     • Semantic Query Logs Analysis
       - Query-Concepts Model
       - Concept-Predicates Model
       - Instance-Types Model
     • Results Augmentation
     • Data Visualisation

  3. Introduction

  4. Motivation
     • Little work has addressed the results returned (answers) and their presentation style.
     • Users want direct answers augmented with more information for a richer experience [1].
     • Users want a more user-friendly and attractive results presentation format [1].
     • Semantic query logs: logs of queries issued to repositories containing RDF data.
     [1] See our paper from this morning’s IWEST 2012 workshop.

  5. Related Work
     Semantic query log analysis:
     • Moller et al. identified patterns of Linked Data usage with respect to different types of agents.
     • Arias et al. analysed the structure of SPARQL queries to identify the most frequent language elements.
     • Luczak-Rösch et al. analysed query logs to detect errors and weaknesses in LD ontologies and support their maintenance.

  6. Related Work (cont’d)
     How our work is different: we analyze semantic query logs to produce models capturing different patterns of information needs on Linked Data:
     • Concepts used together in a query: query-concepts model
     • Predicates used with a concept: concept-predicates model
     • Concepts used as types of a LD entity: instance-types model
     The models make use of the “collaborative knowledge” inherent in the logs to enhance the search process.

  7. Semantic query log analysis

  8. Extraction
     Query log entries follow the Combined Log Format (CLF). The SPARQL query is extracted from each entry:
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
           <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
           <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
         }
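
The extraction step can be sketched as follows. This is a minimal sketch, assuming each log entry carries the URL-encoded SPARQL text in a `query` parameter of the request line; the slide does not show a raw entry, so the sample line and the name `extract_sparql` are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<url>\S+) HTTP/[\d.]+"')


def extract_sparql(log_line: str) -> Optional[str]:
    """Return the decoded SPARQL query from one log entry, or None if absent."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    params = parse_qs(urlsplit(match.group("url")).query)
    queries = params.get("query")
    return queries[0] if queries else None


# Hypothetical log entry, invented for illustration only.
sample = (
    '127.0.0.1 - - [01/Jan/2011:00:00:00 +0000] '
    '"GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" '
    '200 512 "-" "example-agent"'
)
print(extract_sparql(sample))
```

Entries without a `query` parameter yield `None`, so they can be skipped before the analysis stage.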

  9. Analysis
     For each bound resource (subject or object) in the query, query the endpoint for the type of the resource:
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
           <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
           <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
         }
     Example type lookup:
         http://dbpedia.org/resource/Ringo_Starr  type  http://dbpedia.org/ontology/MusicalArtist
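
A rough sketch of the type lookup, assuming the triple patterns have already been parsed into (subject, predicate, object) tuples. The helper names `bound_resources` and `type_query` are illustrative, and the `example.org` IRI is a placeholder because the slide elides the full IRI of The_Beatles. The sketch only builds the query text; sending it to an endpoint is left out.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
IRI_RE = re.compile(r"<([^<>\s]+)>")


def bound_resources(triple_patterns):
    """Collect full IRIs found in subject or object position, in first-seen order."""
    seen = []
    for subject, _predicate, obj in triple_patterns:
        for term in (subject, obj):
            match = IRI_RE.fullmatch(term)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen


def type_query(resource_iri):
    """Build a SPARQL query asking for the rdf:type values of one resource."""
    return f"SELECT DISTINCT ?type WHERE {{ <{resource_iri}> <{RDF_TYPE}> ?type }}"


RINGO = "http://dbpedia.org/resource/Ringo_Starr"
BEATLES = "http://example.org/resource/The_Beatles"  # placeholder IRI (assumption)

patterns = [
    (f"<{RINGO}>", "?rel", f"<{BEATLES}>"),
    (f"<{RINGO}>", "dbpedia:genre", "?genre"),
    (f"<{RINGO}>", "dbpedia:instrument", "?instrument"),
]
for iri in bound_resources(patterns):
    print(type_query(iri))
```

Variables such as `?genre` never match the IRI pattern, so only bound resources trigger a lookup.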

  10. Query-Concepts Model
          SELECT DISTINCT ?genre, ?instrument WHERE {
            <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
            <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
          }
      1) Retrieve types of resources in the query:
         Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
         The_Beatles type dbpedia-owl:Band, schema:MusicGroup
      2) Increment the co-occurrence of each concept in the first list with each concept in the second:
         (MusicalArtist, Band)
         (MusicalPerformer, MusicGroup)
         (MusicalArtist, MusicGroup)
         (MusicalPerformer, Band)
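
The counting in step 2 can be sketched with a `Counter`. This is an illustrative sketch, not the authors' implementation: treating pairs as unordered (via `pair_key`) is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def pair_key(a: str, b: str) -> tuple:
    """Order-independent key, so (Band, MusicalArtist) and its reverse share a count."""
    return tuple(sorted((a, b)))


def update_query_concepts(model: Counter, first_types, second_types) -> None:
    """Increment the co-occurrence of every concept pair across the two type lists."""
    for a, b in product(first_types, second_types):
        model[pair_key(a, b)] += 1


query_concepts = Counter()
update_query_concepts(
    query_concepts,
    ["MusicalArtist", "MusicalPerformer"],  # types of Ringo_Starr (from the slide)
    ["Band", "MusicGroup"],                 # types of The_Beatles (from the slide)
)
for pair, count in sorted(query_concepts.items()):
    print(pair, count)
```

Calling `update_query_concepts` once per logged query accumulates the counts over the whole log.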

  11. Concept-Predicates Model
          SELECT DISTINCT ?genre, ?instrument WHERE {
            <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
            <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
            <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
          }
      1) Retrieve types of resources used as subjects in the query:
         Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
      2) Identify bound predicates (dbpedia:genre, dbpedia:instrument)
      3) Increment the co-occurrence of each type with the predicate used in the same triple pattern:
         (MusicalPerformer, genre)
         (MusicalPerformer, instrument)
         (MusicalArtist, genre)
         (MusicalArtist, instrument)
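
A small sketch of these three steps, assuming the subject types have already been retrieved into a dictionary (`types_of`) and that unbound predicates are recognisable by a leading `?`. The function name and data layout are hypothetical.

```python
from collections import Counter


def update_concept_predicates(model, triple_patterns, types_of):
    """Count each (subject type, bound predicate) pair seen in a triple pattern."""
    for subject, predicate, _obj in triple_patterns:
        if predicate.startswith("?"):
            continue  # unbound predicate such as ?rel carries no information
        for concept in types_of.get(subject, ()):
            model[(concept, predicate)] += 1


# Types and patterns taken from the slide, written with short names for readability.
types_of = {"Ringo_Starr": ["MusicalArtist", "MusicalPerformer"]}
patterns = [
    ("Ringo_Starr", "?rel", "The_Beatles"),
    ("Ringo_Starr", "genre", "?genre"),
    ("Ringo_Starr", "instrument", "?instrument"),
]

concept_predicates = Counter()
update_concept_predicates(concept_predicates, patterns, types_of)
for pair, count in sorted(concept_predicates.items()):
    print(pair, count)
```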

  12. Instance-Types Model
          SELECT DISTINCT ?genre, ?instrument WHERE {
            <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
            <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
          }
      1) Retrieve types of resources in the query:
         Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
         The_Beatles type dbpedia-owl:Band, schema:MusicGroup
      2) Increment the co-occurrence of concepts found as types for the same instance:
         (MusicalArtist, MusicalPerformer)
         (Band, MusicGroup)
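
Unlike the query-concepts model, pairs here are formed only among the types of a single instance. A minimal sketch under the same assumptions (types already retrieved, hypothetical function name):

```python
from collections import Counter
from itertools import combinations


def update_instance_types(model, types_of):
    """Count every pair of concepts that appear as types of the same instance."""
    for types in types_of.values():
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(set(types)), 2):
            model[pair] += 1


types_of = {
    "Ringo_Starr": ["MusicalArtist", "MusicalPerformer"],
    "The_Beatles": ["Band", "MusicGroup"],
}
instance_types = Counter()
update_instance_types(instance_types, types_of)
print(instance_types)
```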

  13. Result Augmentation

  14. Dataset
      • Two sets of DBpedia query logs made available at the USEWOD2011 and USEWOD2012 workshops.
      • The logs contained around 5 million queries issued to DBpedia over a time period spanning almost 2 years.

  15. Results Enhancement
      • Google, Yahoo!, Bing, etc. enhance search results using structured data.
      • FalconS and VisiNav return extra information together with each entity in the answers (e.g. type, label).
      • Evaluation of semantic search showed that augmenting answers with extra information provides a richer user experience [2].
      [2] See our paper from this morning’s IWEST 2012 workshop.

  16. FalconS Results
      Query: ‘population of New York city’
      The information shown depends on a manually (randomly) predefined set.

  17. Motivation for proposed approach
      • Utilize query logs as a source of collaborative knowledge able to capture implicit associations between Linked Data entities and properties.
      • Use this to select which information to show the user.
      • Two recent studies [3] analyzed semantic query logs and observed that a class of entities is usually queried with similar relations and concepts.
      [3] Luczak-Rösch et al.; Elbedweihy et al.

  18. Two Related Types of Result Augmentation
      • Additional result-related information
        - More details about each result item
        - Provides better understanding of the answer
      • Additional query-related information
        - More results related to the query entities
        - Assists users in discovering useful findings (serendipity)

  19. Return additional result-related information
      Steps
      • For each result item, find the types of the instance.
      • Extract the most frequently queried predicates associated with those types from the concept-predicates model.
      • Generate a query for each pair (instance, predicate), e.g. (<…dbpedia.org…/Ringo_Starr>, genre).
      • Show aggregated results to the user.
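
The steps above can be sketched as a ranking over the concept-predicates model followed by query generation. The counts below are made up for illustration, the function names are hypothetical, and the generated queries assume the `dbpedia:` prefix is already known to the endpoint.

```python
from collections import Counter


def top_predicates(model, types, k):
    """Rank predicates by total count over the given types; ties break alphabetically."""
    totals = Counter()
    for (concept, predicate), count in model.items():
        if concept in types:
            totals[predicate] += count
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [predicate for predicate, _ in ranked[:k]]


def augmentation_queries(instance_iri, predicates):
    """Build one SPARQL query per (instance, predicate) pair."""
    return [
        f"SELECT ?value WHERE {{ <{instance_iri}> {predicate} ?value }}"
        for predicate in predicates
    ]


# Toy model with invented counts (not taken from the real logs).
concept_predicates = Counter({
    ("MusicalArtist", "dbpedia:genre"): 5,
    ("MusicalArtist", "dbpedia:instrument"): 3,
    ("MusicalPerformer", "dbpedia:genre"): 2,
    ("Band", "dbpedia:hometown"): 9,
})

instance = "http://dbpedia.org/resource/Ringo_Starr"
best = top_predicates(concept_predicates, {"MusicalArtist", "MusicalPerformer"}, k=2)
for query in augmentation_queries(instance, best):
    print(query)
```

Summing counts across all types of the instance is one possible aggregation; ranking per type and merging the lists would be an equally plausible reading of the slide.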

  20. Return additional result-related information
      MusicalArtist -> genre, associatedBand, occupation, instrument, birthDate, birthPlace, hometown, prop:yearsActive, foaf:surname, prop:associatedActs, …
      Query: “Who played drums for the Beatles?”
      Result: Ringo Starr
      ➔ Pop music, Rock music (genre)
      ➔ Keyboard, Drum, Acoustic guitar (instrument)
      ➔ The Beatles, Plastic Ono Band, Rory Storm (assoc. Band)

  21. Return additional query-related information
      Steps
      • Extract all concepts from the query. For any instances, find their types.
      • For each query concept, find the most frequently co-occurring concepts from the query-concepts model.
      • For each related concept, query for instances that have a relation with the originating instance.
      • Show aggregated results to the user.
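
A hedged sketch of these steps: look up the concepts that co-occur most often with a query concept, then build a query for instances of each related concept. Reading "have a relation with" as "connected by any predicate in either direction" is an assumption, as are the function names, the invented counts, and the `example.org` IRIs.

```python
from collections import Counter


def related_concepts(model, concept, k):
    """Return the k concepts that co-occur most often with `concept`."""
    totals = Counter()
    for (a, b), count in model.items():
        if a == concept and b != concept:
            totals[b] += count
        elif b == concept and a != concept:
            totals[a] += count
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def related_instances_query(instance_iri, concept_iri):
    """Find instances of a concept linked to the originating instance in either direction."""
    return (
        "SELECT DISTINCT ?related WHERE { "
        f"?related a <{concept_iri}> . "
        f"{{ ?related ?rel <{instance_iri}> }} UNION {{ <{instance_iri}> ?rel ?related }} "
        "}"
    )


# Toy query-concepts model with invented counts.
query_concepts = Counter({
    ("City", "Person"): 7,
    ("City", "University"): 4,
    ("City", "SportsTeam"): 4,
    ("Band", "MusicalArtist"): 9,
})

print(related_concepts(query_concepts, "City", k=2))
print(related_instances_query(
    "http://example.org/resource/SomeCity",   # placeholder instance IRI
    "http://example.org/ontology/Person",     # placeholder concept IRI
))
```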

  22. Return additional query-related information
      City -> Book, Person, Country, Organisation, SportsTeam, MusicGroup, Film, RadioStation, River, University, SoccerPlayer, Hospital, ...
      Query: “Where is the University of Sheffield located?”
      Result: Sheffield, UK
      ➔ Nick Clegg, Clive Betts, David Blunkett (Person)
      ➔ Sheffield United F.C., Sheffield Wednesday (SportsTeam)
      ➔ Hallam FM, Real Radio, BBC Radio Sheffield (RadioStn.)
      ➔ Jessop Hosp., Northern General, Royal Hallamshire (Hospital)
      ➔ Uni. of Sheffield, Sheffield Hallam Uni. (University)

  23. Visualisation

  24. Data Visualization
      • View-based interfaces (e.g. Semantic Crystal and Smeagol) support users in query formulation by showing the underlying data and connections.
      • They are helpful for users, especially those unfamiliar with the search domain.
      • They try to bridge the gap between user terms and tool terms (the habitability problem).
      • They face the challenge of visualizing large datasets without cluttering the view and degrading the user experience.

  25. Data Visualization: Proposed approach
      • Visualizing large datasets (especially heterogeneous ones) is a challenge.
      • To overcome this, we need to select and visualize specific parts of the data.
      • Exploit the collaborative knowledge in query logs to derive the selection of concepts and predicates added to the user’s subgraph of interest.

  26. Data Visualization: Proposed approach
      Steps
      1) User enters NL query
      2) Return best-attempt results
      3) Identify query instances and find their types
      4) For each type:
         - Extract the most queried predicates associated with it from the concept-predicates model.
         - Extract the most queried concepts associated with it from the query-concepts model.
      5) Add these to the user’s query graph (see slide 29)

  27. Example
      Best-attempt results with result-related information
      Query: “What is the capital of Egypt?”
      Answer: Cairo
      ➔ depiction: [image]
      ➔ latitude: 30.058056
      ➔ longitude: 31.228889
      ➔ population: 6758581
      ➔ area: 453000000
      ➔ time zone: Eastern European Time
      ➔ subdivision: Governorates of Egypt
      ➔ page: http://www.cairo.gov.eg/default.aspx
      ➔ nickname: The City of a Thousand Minarets, Capital of the Arab World

  28. Example
      Query-related information
      Query: “What is the capital of Egypt?”
      Answer: Cairo
      ➔ Cairo Uni., Ain Shams Uni., German Uni., British Uni. (University)
      ➔ Ittihad El Shorta, El Shams Club, AlNasr Egypt (SportsTeam)
      ➔ Orascom Telecom, HSBC Bank, EgyptAir, Olympic Grp (Organisation)
      ➔ Nile River (River)
      ➔ Al Azhar Park (Park)
      ➔ Hani Shaker, Sherine, Umm Kulthum, Amr Diab (MusicalArtist)
      ➔ Nile TV, AL Nile, Al-Baghdadia TV (BroadCaster)
      ➔ Egyptian Museum, Museum of Islamic Art (Museum)

  29. Data Visualization: Proposed approach
      Step 5: Add concepts and predicates to user’s query graph
      [Figure: query graph showing the query instance, the most queried predicates with “Country”, and the most queried concepts with “Country”]

  30. Thank You
      Questions?
