LAMI Spring 2014

LAMI Spring 2014. Search Engine and Services. Presented by Edgar Cornejo 03.03.14. Outline. Mobile information search for location-based information Web-a-Where: Geotagging Web Content

Presentation Transcript


  1. LAMI Spring 2014 Search Engine and Services Presented by Edgar Cornejo 03.03.14

  2. Outline • Mobile information search for location-based information • Web-a-Where: Geotagging Web Content • The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet

  3. Mobile information search for location-based information Department of Industrial Engineering, Tsinghua University, Beijing, China April 2010 Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao

  4. Mobile search for location-based information The study investigated the effects of location and information type on mobile searching for location-based information by carrying out two experiments in an airport.

  5. Mobile search scenario • Many environmental disturbances • High time pressure • Restricted user operations • Device limitations (screen size, input method)

  6. Mobile searching context • Search engine: since most of the information is location-based [1,2], the results can be improved by analyzing information queries together with location • Information queries + location → more suitable results

  7. Features of mobile interaction [3] • Users may be involved in tasks that demand a high level of visual attention • Users' hands are often used to manipulate physical objects

  8. Features of mobile interaction [3] • Users may be highly mobile during the task and have high-speed interaction

  9. Search queries *According to a large-scale study of European mobile search behavior conducted in 2008 [4]

  10. Factors proposed that may influence the mobile information search

  11. Experiment 1 - Hypotheses • Hypothesis 1 • For information searches in mobile versus non-mobile settings: • The average number of clicks is lower in mobile • The first search result is more important • Free recall is worse

  12. Experiment 1 - Hypotheses • Hypothesis 2 • For searches for location-based versus non-location-based information: • The number of clicks is lower • The first search result is more important • Free recall is better

  13. Experiment 1 - Tasks

  14. Experiment 1 - Results

  15. Experiment 1 - Results

  16. Experiment 2 - Hypotheses • Hypothesis 3 • For mobile information searching under high-pressure versus low-pressure information requirements: • The average number of clicks is lower • The first search result is more important • Free recall is worse

  17. Experiment 2 - Hypotheses • Hypothesis 4 • For mobile searching with informational or navigational queries versus transactional queries: • The number of clicks is greater • The first search result is less important • Free recall is worse

  18. Experiment 2 - Tasks

  19. Experiment 2 - Results

  20. Experiment 2 - Results

  21. Summary • Information type (location-based vs. non-location-based) was found to affect user performance during the information search process • Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process • The first two search results were found to be very important for search efficiency and user satisfaction

  22. Web-a-Where: Geotagging Web Content Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer IBM Haifa Research Lab, Haifa 31905, Israel July 2004

  23. Web-a-Where: Geotagging Web Content • A system for associating geography with Web pages • Locates mentions of places and determines the place each name refers to • Assigns to each page a geographic focus: a locality that the page discusses as a whole • Implemented within the framework of the IBM WebFountain data mining system

  24. Web-a-Where: Geotagging Web Content • A page may have two types of geography associated with it: a source and a target • Source geography has to do with the origin of the page: the physical location, the address of its author, etc. • Target geography is determined by the contents of the page and relates to the topic the page is discussing

  25. Ambiguities • Geo/non-geo ambiguity is the case of a place name having another, non-geographic meaning, e.g. Mobile (Alabama) or Reading (England) • Geo/geo ambiguity arises when two or more distinct places have the same name

  26. System Components • Geotagger (main component) • Finds and disambiguates geographic names • Assigns a taxonomy node to each phrase in the text that refers to a place, e.g. Paris/France/Europe • Gazetteer • Database that keeps the list of geographic names, their canonical taxonomies and other information

  27. Tagging individual place names The processing of a page is done in three phases: • Spotting • Disambiguation • Focus determination

  28. 1. Spotting place name candidates • Finds all the possible geographic names in each page • Short abbreviations, e.g. IN (for Indiana) or AT (for Austria), are not spotted on their own but are used to help disambiguate other spots, e.g. Gary, IN
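
The spotting step on slide 28 can be sketched as a gazetteer lookup over the page text. This is a minimal illustration under stated assumptions, not the Web-a-Where implementation: the toy GAZETTEER and ABBREVIATIONS tables and the function name spot_candidates are hypothetical, and names are limited to three words.

```python
import re

# Hypothetical toy gazetteer (lower-cased names); a real one is far larger.
GAZETTEER = {"gary", "fort worth", "dallas", "reading", "mobile"}
# Short abbreviations are never spotted on their own, only used as qualifiers.
ABBREVIATIONS = {"IN": "Indiana", "AT": "Austria", "TX": "Texas"}

def spot_candidates(text, max_words=3):
    """Return (name, qualifier or None) for every gazetteer name found in text."""
    tokens = re.findall(r"[A-Za-z]+", text)
    spots = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Prefer the longest gazetteer name starting at position i.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in GAZETTEER:
                match_len = n
                break
        if match_len == 0:
            i += 1
            continue
        name = " ".join(tokens[i:i + match_len])
        following = tokens[i + match_len] if i + match_len < len(tokens) else None
        # An upper-case abbreviation right after the name qualifies it ("Gary, IN").
        qualifier = ABBREVIATIONS.get(following) if following else None
        spots.append((name, qualifier))
        i += match_len + (1 if qualifier else 0)
    return spots
```

Note that this step deliberately over-generates: a sentence containing the ordinary word "reading" still produces a candidate, which is exactly the geo/non-geo ambiguity that the next phase has to resolve.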

  29. 2. Disambiguating spots (Algorithm) • The geotagger assigns a unique meaning to spots that can be uniquely qualified (confidence 95%) • Combinations that are not unique are left unassigned • In a page with multiple spots with the same name where only one is qualified, its meaning is assigned to the others (confidence 80%) • Disambiguation contexts are also used to resolve unassigned spots with confidence below 70%
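
The first three rules on slide 29 can be sketched as two passes over the spots of one page. This is a simplified illustration, not the authors' code: the function name disambiguate, the (name, qualifier) input format, and the candidates table are assumptions, and the fourth rule (disambiguation contexts) is left out because the slide does not describe it in enough detail.

```python
def disambiguate(spots, candidates):
    """spots: list of (name, qualifier or None) found on one page.
    candidates: name -> list of taxonomy paths such as "Paris/France/Europe".
    Returns a list of (name, taxonomy path or None, confidence)."""
    first_pass = []
    for name, qualifier in spots:
        # A spot is uniquely qualified if exactly one candidate contains the qualifier.
        matches = [path for path in candidates.get(name, [])
                   if qualifier is not None and qualifier in path.split("/")]
        if len(matches) == 1:
            first_pass.append((name, matches[0], 0.95))
        else:
            first_pass.append((name, None, 0.0))  # left unassigned

    # Collect the qualified meanings seen for each name on this page.
    qualified = {}
    for name, path, _ in first_pass:
        if path is not None:
            qualified.setdefault(name, []).append(path)

    result = []
    for name, path, confidence in first_pass:
        # Same name elsewhere on the page, only one qualified spot: reuse its meaning.
        if path is None and len(qualified.get(name, [])) == 1:
            result.append((name, qualified[name][0], 0.80))
        else:
            result.append((name, path, confidence))
    return result
```

For a page that mentions "London, Ontario" once and plain "London" later, the second mention inherits the Canadian meaning at the lower confidence, while an unqualified "Paris" stays unassigned.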

  30. 2. Disambiguating spots (Data sources) • The Geographic Names Information System (GNIS) for U.S. locations • world-gazetteer.com for non-U.S. locations • United Nations Statistics Division (UNSD) for countries and continents • ISO 3166-1 for country and other abbreviations

  31. 3. Focus determination • The basic idea is that if several cities from the same region are mentioned, this region is probably the focus • Sometimes a page cannot be said to have only one focus • The confidence score should be taken into account when finding the focus, giving higher weight to information coming from locations with higher confidence

  32. Example A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three of Texas (0.75), eight of Fort Worth/Texas (0.75), three of Dallas/Texas (0.75), one of Garland/Texas (0.75), and one of Iraq (0.5). A human was asked to judge what the geographical focus of this page is and responded with “It’s about Texas and perhaps also Orlando”. Indeed, that page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
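
The focus idea from slides 31 and 32 can be tried on the example counts with a small scoring sketch. The details here are assumptions made for illustration and are not taken from the slides: each mention contributes count × confidence to its own node, a share reduced by a hypothetical decay factor of 0.7 per level is passed up to its ancestors, and a node is skipped if it is an ancestor or descendant of a focus already chosen.

```python
from collections import defaultdict

def ancestors(path):
    """'Fort Worth/Texas' -> ['Texas']; the most specific name comes first."""
    parts = path.split("/")
    return ["/".join(parts[i:]) for i in range(1, len(parts))]

def find_foci(mentions, decay=0.7, max_foci=2):
    """mentions: list of (taxonomy path, number of mentions, confidence)."""
    scores = defaultdict(float)
    for path, count, confidence in mentions:
        weight = count * confidence      # higher confidence weighs more
        scores[path] += weight
        for ancestor in ancestors(path):
            weight *= decay              # regions gain from the places inside them
            scores[ancestor] += weight

    foci = []
    for node in sorted(scores, key=scores.get, reverse=True):
        # Skip nodes that overlap a focus already chosen (e.g. Dallas inside Texas).
        overlaps = any(node in ancestors(f) or f in ancestors(node) for f in foci)
        if not overlaps:
            foci.append(node)
        if len(foci) == max_foci:
            break
    return foci, dict(scores)

# The mentions from the example slide.
EXAMPLE = [
    ("Orlando/Florida", 4, 0.5), ("Texas", 3, 0.75), ("Fort Worth/Texas", 8, 0.75),
    ("Dallas/Texas", 3, 0.75), ("Garland/Texas", 1, 0.75), ("Iraq", 1, 0.5),
]
```

Under these assumptions, find_foci(EXAMPLE) ranks Texas first and Orlando/Florida second, which agrees with the human judgment quoted on the slide; a different decay factor or overlap rule could change the ranking.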

  33. Evaluating geotagging precision Geotags assigned automatically versus defined manually

  34. Evaluating focus Comparison of Web-a-Where-determined focus to human-determined one (ODP) for ~1 million pages

  35. Summary • The system is able to correctly tag individual place name occurrences 80% of the time and define the correct focus of a page 92% of the time • Accuracy can be further improved • The main source of errors is geo/non-geo ambiguity

  36. The design and implementation of SPIRIT Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang • Department of Geography, University of Zurich, Switzerland • Department of Information Studies, University of Sheffield, UK • School of Computer Science, Cardiff University, UK • Institute of Information and Computing Sciences, Utrecht University, Netherlands • Laboratoire COGIT - Institut Geographique National, France August 2007

  37. The design and implementation of SPIRIT This paper describes the design and implementation of a complete solution to geographic information retrieval

  38. Requirements • Exhaustive retrieval of relevant documents in a specified area • Place names should be automatically identified, and interactively disambiguated • Ability to query for geographical areas whose boundaries are imprecise

  39. Requirements • Spatial concepts relating different geographic entities should be represented (e.g. outside, in) • It should be possible for users to specify the area of interest on a map • Ability to view query results on a map linked to relevant web documents • Document ranking should combine both spatial and thematic aspects of document relevance

  40. Architecture Overview [Architecture diagram. Phases: pre-processing, run-time. Components shown: web document collection, geo-parsing, geo-coding, metadata (doc-to-footprint mapping), textual index, spatial index, access to indexes, search engine, geographical ontology, broker, query disambiguation, query expansion, relevance ranking (rank results), user interface (search request)]

  41. Functionality of the components Pre-processing the document collection • Assigning spatial footprints to web documents: • Identify geographical references (geoparsing) • Assign them to spatial coordinates (geocoding) • Result: spatial footprint

  42. Functionality of the components Building document indexes • Grid-based spatial indexing • For each cell of the grid, a list of document IDs was constructed, using the document footprints which resulted from the geo-tagging process
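
The grid-based spatial index on slide 42 can be sketched as a mapping from grid cell to document IDs. This is a minimal illustration, assuming that each footprint is an axis-aligned bounding box (min_x, min_y, max_x, max_y) and that cells are squares of a fixed, arbitrarily chosen size; the function names are hypothetical.

```python
import math
from collections import defaultdict

def cells_for_bbox(bbox, cell_size=1.0):
    """Return every (column, row) grid cell that the bounding box touches."""
    min_x, min_y, max_x, max_y = bbox
    x0, x1 = math.floor(min_x / cell_size), math.floor(max_x / cell_size)
    y0, y1 = math.floor(min_y / cell_size), math.floor(max_y / cell_size)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]

def build_grid_index(footprints, cell_size=1.0):
    """footprints: doc_id -> bbox. Returns cell -> list of document IDs."""
    index = defaultdict(list)
    for doc_id, bbox in footprints.items():
        for cell in cells_for_bbox(bbox, cell_size):
            index[cell].append(doc_id)
    return index
```

A document whose footprint spans several cells is listed under each of them, so a lookup only needs the cells that the query area touches.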

  43. Functionality of the components Retrieving the results: “T” (Text) Scheme • Simplest approach • Retrieve all the documents that match the concept terms of the query and then filter to return only those which intersect the geographical scope of the place in the query (footprint)
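
The “T” scheme on slide 43 amounts to text retrieval followed by a spatial filter. The sketch below assumes bounding-box footprints, an inverted index from term to a set of document IDs, and AND semantics for the query terms; none of these details come from the slides, and the function names are hypothetical.

```python
def bbox_intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search_t_scheme(query_terms, query_footprint, text_index, footprints):
    """text_index: term -> set of doc IDs; footprints: doc_id -> bbox."""
    # Step 1: plain text retrieval (documents containing every query term).
    postings = [text_index.get(term, set()) for term in query_terms]
    matching = set.intersection(*postings) if postings else set()
    # Step 2: keep only documents whose footprint intersects the query footprint.
    return sorted(doc for doc in matching
                  if doc in footprints
                  and bbox_intersects(footprints[doc], query_footprint))
```

The spatial test only runs on documents that already matched the text, so the text index needs no knowledge of geography.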

  44. Functionality of the components Retrieving the results: “ST” (Space-Text) Scheme • More integrated approach • Regarded as a space-primary method • At search time the cells that intersect the query footprint are determined and then only the corresponding text indexes are searched
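
The space-primary idea on slide 44 can be sketched with one small text index per grid cell. As before, this is an illustration under assumptions rather than the SPIRIT code: footprints are bounding boxes, cells are unit squares, query terms are combined with AND, and all names are hypothetical.

```python
import math
from collections import defaultdict

def cells_for_bbox(bbox, cell_size=1.0):
    """Return every (column, row) grid cell that the bounding box touches."""
    min_x, min_y, max_x, max_y = bbox
    x0, x1 = math.floor(min_x / cell_size), math.floor(max_x / cell_size)
    y0, y1 = math.floor(min_y / cell_size), math.floor(max_y / cell_size)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]

def build_st_index(docs, cell_size=1.0):
    """docs: doc_id -> (terms, bbox). Returns cell -> term -> set of doc IDs."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, (terms, bbox) in docs.items():
        for cell in cells_for_bbox(bbox, cell_size):
            for term in terms:
                index[cell][term].add(doc_id)
    return index

def search_st_scheme(query_terms, query_footprint, index, cell_size=1.0):
    results = set()
    # Space first: only the text indexes of cells touching the query are searched.
    for cell in cells_for_bbox(query_footprint, cell_size):
        if cell not in index:
            continue
        postings = [index[cell].get(term, set()) for term in query_terms]
        if postings:
            results |= set.intersection(*postings)
    return sorted(results)
```

Matching happens at cell granularity here, so a document that shares a cell with the query area is returned even if its exact footprint does not intersect it; a finer check could be added afterwards.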

  45. Functionality of the components Retrieving the results: “TS” (Text-Space) Scheme • Better query response time • Regarded as a text-primary method • At search time, for each term, the associated documents are grouped according to the spatial index which they relate to
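
One way to read the text-primary grouping on slide 45 is an inverted index whose postings for each term are split by grid cell. The sketch below follows that reading; it is an assumption about the layout, not the SPIRIT implementation, and it again uses bounding-box footprints, unit cells, AND semantics and hypothetical names.

```python
import math
from collections import defaultdict

def cells_for_bbox(bbox, cell_size=1.0):
    """Return every (column, row) grid cell that the bounding box touches."""
    min_x, min_y, max_x, max_y = bbox
    x0, x1 = math.floor(min_x / cell_size), math.floor(max_x / cell_size)
    y0, y1 = math.floor(min_y / cell_size), math.floor(max_y / cell_size)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]

def build_ts_index(docs, cell_size=1.0):
    """docs: doc_id -> (terms, bbox). Returns term -> cell -> set of doc IDs."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, (terms, bbox) in docs.items():
        for term in terms:
            for cell in cells_for_bbox(bbox, cell_size):
                index[term][cell].add(doc_id)
    return index

def search_ts_scheme(query_terms, query_footprint, index, cell_size=1.0):
    cells = cells_for_bbox(query_footprint, cell_size)
    per_term = []
    for term in query_terms:
        # Text first: look up the term, then read only the groups for relevant cells.
        groups = index.get(term, {})
        docs = set()
        for cell in cells:
            docs |= groups.get(cell, set())
        per_term.append(docs)
    return sorted(set.intersection(*per_term)) if per_term else []
```

Compared with the space-primary layout, each query term is looked up once and only the postings for the relevant cells are read.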

  46. Query interfaces

  47. Results display

  48. Evaluation Performance analysis • To count as relevant to a query, a document had to be both thematically and spatially relevant • The key result of the work is that spatially aware search outperformed text-only search

  49. Evaluation Usability analysis

  50. Conclusions • The paper describes a unified approach, as well as the architecture, for introducing spatial awareness into search-engine technology • A prototype system demonstrated the effectiveness of the strategy
