
Location-based search: services, photos, web

Andrei Tabarcea, Mohammad Rezaei. 4.12.2013.


Presentation Transcript


  1. Location-based search: services, photos, web
     Andrei Tabarcea, Mohammad Rezaei
     4.12.2013

  2. Introduction
     The goal is to find services, photos and points of interest close to the user’s location. We call this “location-based search”.
     We search our local database of photos and services, and we look for location information in web pages.
     [Figure labels: keyword, results on map, user location]

  3. Mopsi search
     [Diagram: the keyword and the user location are sent to the Mopsi Services Database, the Mopsi Photo Collection and Mopsi Web Search; the search results are then combined.]

  4. Web interface
     Input: (keyword, user location)
     Output: array of results
     [Figure labels: keyword, search options, results list, results on map, user location]

  5. Mobile interface
     [Figure labels: user location, search results]

  6. Mopsi search (server workflow)
     Input: (keyword, flagS, flagP, flagW, user location)
     Output: array of results (g_markersData)
     [Figure labels: keyword, search options (flagS, flagP, flagW), results list, results on map, user location]

  7. Location-based search
     Input: keyword, flagS, flagP, user location (lat, lon)
     Output: list of results
     Note: a service has a list of keywords and a title; a photo has only a description. The keyword search is therefore run against this information.
     Notation:
     • S: service, P: photo
     • text(S): keywords and title of a service
     • text(P): description of a photo
     • flagS: search for services if true
     • flagP: search for photos if true

  8. Overall flow
     • Start
     • Update keyword statistics and keyword history
     • If flagS: Stage 1, search Mopsi services (local service search; display the results in the list)
     • If flagP: Stage 2, search Mopsi photos (photo search; add the results to the list)
     • Display all results on the map
     • If flagW: Stage 3, search the web (web search; add the results to the list and to the map)
     • End
     When a keyword is searched:
     • statistics: its count in the database is incremented; the keyword and the city are stored
     • history: keyword, location, userid and time are stored
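The three-stage flow above can be sketched as follows. This is a minimal illustration, not the Mopsi server code: the function name `mopsi_search`, the in-memory `keyword_stats` and `keyword_history` stores, and the three search callables passed as arguments are all hypothetical stand-ins for the real database and search back ends.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the statistics and history tables.
keyword_stats = Counter()  # (keyword, city) -> number of times searched
keyword_history = []       # one record per search


def mopsi_search(keyword, location, city, user_id,
                 flag_s, flag_p, flag_w,
                 search_services, search_photos, search_web):
    """Run the enabled stages in order and return the combined result list."""
    # Bookkeeping that happens on every search.
    keyword_stats[(keyword, city)] += 1
    keyword_history.append({
        "keyword": keyword,
        "location": location,
        "userid": user_id,
        "time": datetime.now(timezone.utc),
    })

    results = []
    if flag_s:  # Stage 1: local services
        results.extend(search_services(keyword, location))
    if flag_p:  # Stage 2: photos
        results.extend(search_photos(keyword, location))
    if flag_w:  # Stage 3: web
        results.extend(search_web(keyword, location))
    return results
```

Passing the stage functions in as arguments keeps the sketch self-contained; a real server would call its own database and web-search modules instead.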

  9. Local service search
     • Start
     • Do the search on the server; nL = number of results
     • If nL = 0: End
     • Cluster results with almost the same title and location; take one of the similar results as the representative and display it
     • Sort the results by distance to the user location
     • Display the results in the list
     • End
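One way to sketch the "cluster, pick a representative, sort by distance" steps is shown below. The slides do not say how "almost the same" is measured, so the title-similarity threshold (`min_ratio`), the coordinate tolerance (`max_deg`) and the result fields (`title`, `lat`, `lon`, `dist`) are assumptions made only for this illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, min_ratio=0.8, max_deg=0.001):
    """True if two results have almost the same title and location.

    min_ratio and max_deg are illustrative thresholds, not values from the slides.
    """
    title_ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    close = abs(a["lat"] - b["lat"]) <= max_deg and abs(a["lon"] - b["lon"]) <= max_deg
    return title_ratio >= min_ratio and close


def cluster_and_sort(results):
    """Keep one representative per cluster, ordered by distance to the user.

    Results are visited nearest first, so the representative of each cluster
    is its member closest to the user.
    """
    representatives = []
    for result in sorted(results, key=lambda r: r["dist"]):
        if not any(similar(result, rep) for rep in representatives):
            representatives.append(result)
    return representatives
```

The greedy pass is quadratic in the number of results, which is acceptable for the short lists a single search returns.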

  10. Photo search
     • Start
     • Do the search on the server; nP = number of results
     • If nP = 0: End
     • Cluster results with almost the same title and location
     • Cluster the results together with local services that have almost the same title and location
     • Sort the results by distance to the user location
     • Add the results to the list
     • End

  11. Web search
     • Start
     • Do the search on the server; nW = number of results
     • If nW = 0: End
     • Cluster results with almost the same title and location
     • Cluster the results together with local services and photos that have almost the same title and location
     • Sort the results by distance to the user location
     • Add the results to the list
     • Add the results to the map
     • End

  12. Filtering results: old solution
     Fixed distance d to the user location:
     • Find services where text(S) ≈ keyword AND dist(S, User) < d
     • Find photos where text(P) ≈ keyword AND dist(P, User) < d
     Advantages:
     • Simple
     • Same time for any search
     Disadvantages:
     • Parameter d (the user can choose d, but it is still not automatic)
     • Many searches return no results
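The fixed-distance filter can be sketched in a few lines. The sketch assumes each item already carries its text and a precomputed distance to the user in meters (fields `text` and `dist`, both hypothetical), and it approximates "text ≈ keyword" with a case-insensitive substring test.

```python
def fixed_radius_search(items, keyword, d):
    """Return the items whose text contains the keyword and that lie closer than d meters."""
    needle = keyword.lower()
    return [item for item in items
            if needle in item["text"].lower() and item["dist"] < d]
```

With a small d, the returned list is often empty even when a matching service exists slightly farther away, which is the "no results" drawback listed above.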

  13. Current solution: binary search for the K nearest services
     • Show all the results within 10 km
     • If the number of results is less than K, double the distance (until it covers the whole Earth); once the number of results exceeds K, bisect the distance
     Example with k = 5:
     • Within distance d: n = 1 < k
     • Double the distance: within 2d, n = 2 < k
     • Within 4d, n = 8 > k
     • Now bisect the distance inside the colored area (between 2d and 4d): within 3d, n = 4 < k
     • Within 3.5d, n = 5 (= k)
     So we have the 5 results nearest to the user location, within distance x (here x = 3.5d).
     [Figure labels: user location; circles of radius d, 2d, 4d and x; each dot is a photo or service with the required keyword]

  14. Algorithm
     d = 10000: initial distance
     K = 10: number of required results
     delta_dist: minimum distance for dividing
     ns: number of resulting services (res_S)
     np: number of resulting photos (res_P)
     Main:
         res_S = services where text(S) ≈ keyword
         res_P = photos where text(P) ≈ keyword
         if (ns + np > K)
             (res_S, res_P, dist) = extend_distance()
             (res_S, res_P, dist) = contract_distance(dist, K)
         display (ns + np) services and photos
     extend_distance()
         dist = d
         ns = 0; np = 0
         while (ns + np < K AND dist < earth_r*pi)
             res_S = services where text(S) ≈ keyword AND dist(S, User) < dist
             res_P = photos where text(P) ≈ keyword AND dist(P, User) < dist
             dist = dist*2
         dist = dist/2
     [Figure labels: d, 2d, 4d, Δ]

  15. Algorithm (cont.)
     contract_distance(dist, K)
         d1 = dist/2
         d2 = dist
         dist = (d1 + d2)/2
         delta = dist - d1
         ns = np = 0
         while (ns + np != K AND delta > delta_dist AND dist > d)
             res_S = services where text(S) ≈ keyword AND dist(S, User) < dist
             res_P = photos where text(P) ≈ keyword AND dist(P, User) < dist
             if (ns + np > K)
                 d2 = dist
             else
                 d1 = dist
             dist = (d1 + d2)/2
             delta = dist - d1
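The extend-then-contract search can be sketched as one runnable function. This is a simplified variant under stated assumptions, not the slides' exact code: services and photos are merged into a single list of precomputed distances (in meters), the Earth radius constant and the default `delta_dist` of 100 m are assumed values, and the function always returns a radius that was actually queried.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def adaptive_radius(distances, k=10, d=10_000, delta_dist=100):
    """Return (radius, hits), where hits are the distances below radius.

    Extension doubles the radius until at least k hits are found; contraction
    then bisects between the last too-small radius and the current one.
    """
    def within(radius):
        return [x for x in distances if x < radius]

    # Extension: double the radius until k hits or the whole Earth is covered.
    dist = d
    max_dist = EARTH_RADIUS_M * math.pi  # half the Earth's circumference
    hits = within(dist)
    while len(hits) < k and dist < max_dist:
        dist *= 2
        hits = within(dist)
    if len(hits) <= k:
        return dist, hits  # exactly k hits, or fewer than k exist anywhere

    # Contraction: bisect [lo, hi]; hi always has at least k hits.
    lo, hi = dist / 2, dist
    while len(hits) != k and hi - lo > delta_dist and hi > d:
        mid = (lo + hi) / 2
        mid_hits = within(mid)
        if len(mid_hits) >= k:
            hi, hits = mid, mid_hits
        else:
            lo = mid
    return hi, hits
```

Run on the k = 5 example from the previous slide (with d = 10 km), the radius goes 10 km, 20 km, 40 km, then 30 km and 35 km, and stops at 35 km with five hits. If the first 10 km already holds more than k results, all of them are returned, matching the "show all the results within 10 km" rule.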

  16. Simplifying distance calculation
     Since there is no spatial distance function in MySQL, the query for points with distance < d from the user location (lat1, lon1) is simplified to a bounding box:
     |lat - lat1| < Δlat AND |lon - lon1| < Δlon
     The points (lat1 + Δlat, lon1) and (lat1, lon1 + Δlon) both lie at distance d (in meters) from the user location.
     How do we compute Δlat and Δlon?

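  17. Δlat and Δlon?
     The distance d (in meters) between two points (lat1, lon1) and (lat2, lon2) is the haversine distance, with D the Earth's diameter (in meters):
     d = D · arcsin(√(sin²((lat2 - lat1)/2) + cos(lat1) · cos(lat2) · sin²((lon2 - lon1)/2)))
     • For (lat1, lon1) and (lat1, lon1 + Δlon): Δlat = 0, which gives Δlon
     • For (lat1, lon1) and (lat1 + Δlat, lon1): Δlon = 0, which gives Δlat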
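A minimal sketch of both calculations follows. It uses the standard haversine formula and solves it for the two special cases: with Δlon = 0 the distance reduces to R·Δlat, and with Δlat = 0 it reduces to sin(d/2R) = cos(lat1)·sin(Δlon/2). The function names and the Earth radius constant are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bounding_deltas(lat1, d):
    """Return (delta_lat, delta_lon) in degrees for a distance of d meters.

    Valid away from the poles and for d well below a quarter of the Earth's
    circumference; otherwise the asin argument leaves [-1, 1].
    """
    delta_lat = math.degrees(d / EARTH_RADIUS_M)
    ratio = math.sin(d / (2 * EARTH_RADIUS_M)) / math.cos(math.radians(lat1))
    delta_lon = math.degrees(2 * math.asin(ratio))
    return delta_lat, delta_lon
```

The resulting box contains the whole circle of radius d, so it can serve as a cheap prefilter in SQL; an exact distance check can then be applied to the few rows that pass it.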

  18. Location-based web data mining How to find location-information in web-pages?

  19. Mopsi web search Web mining

  20. Geo-referencing
     A geographic reference is an information entity that is discovered from the context and can be mapped to a geographic location.
     Strategies for geographic reference extraction:
     • Gazetteer-based text matching
     • Rule-based linguistic analysis
     • Regular-expression-based text matching
     • Using host location
     • Geographic meta-tags
     Source: Hu, Y. H., Lim, S., & Rizos, C. Georeferencing of Web Pages based on Context-Aware Conceptual Relationship Analysis. 2006

  21. Ad-hoc georeferencing
     The problem is how to extract and validate location data from semi-structured text. A postal address is the most common location data found. Our goal is to give geographical coordinates to services mentioned in web pages. We call this method ad-hoc georeferencing.
     [Figure: explicit geo meta-tags (below) vs. a postal address on a web page]
     <HTML>
     <HEAD profile="http://geotags.com/geo">
     <META name="geo.position" content="62.35;29.44">
     <META name="geo.region" content="FI">
     <META name="geo.placename" content="Joensuu">
     <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
     <link rel="stylesheet" href="http://www.joensuu.fi/tkt/sivutyyli.css" type="text/css">
     <TITLE>Pages of Pasi Fränti</TITLE>
     </HEAD>

  22. Location information in web pages
     • Site hosting information (owner address, server address etc.)
     • HTML tags (geo-tags, address tags, vCards for Google Maps etc.)
     • Natural language descriptions
     • Addresses, postal codes, phone numbers

  23. Site hosting information
     domain:   uef.fi
     descr:    ITÄ-SUOMEN YLIOPISTO (UNIV OF EASTERN FINLAND)
     descr:    22857339
     address:  TIETOTEKNIIKKAKESKUS (IT-CENTRE)/Jarno Huuskonen
     address:  PL 1627
     address:  70211
     address:  KUOPIO FINLAND
     phone:    +358 44 7162810
     status:   Granted
     created:  26.5.2010
     modified: 19.8.2011
     expires:  26.5.2015
     nserver:  ns-secondary.funet.fi [Ok]
     nserver:  ns1.uef.fi [Ok]
     nserver:  ns2.uef.fi [Ok]
     dnssec:   no

  24. HTML tags
     Geo-tags, address tags, vCards for Google Maps etc.
     <HTML>
     <HEAD profile="http://geotags.com/geo">
     <META name="geo.position" content="62.35;29.44">
     <META name="geo.region" content="FI">
     <META name="geo.placename" content="Joensuu">
     <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
     <link rel="stylesheet" href="http://www.joensuu.fi/tkt/sivutyyli.css" type="text/css">
     <TITLE>Pages of Pasi Fränti</TITLE>
     </HEAD>
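Reading such geo meta-tags takes only the standard library. The sketch below uses Python's `html.parser.HTMLParser`; the class name `GeoMetaParser` and the helper `extract_geo` are hypothetical, and the code assumes `geo.position` holds "lat;lon" as in the example (a comma separator is tolerated as well).

```python
from html.parser import HTMLParser


class GeoMetaParser(HTMLParser):
    """Collect <meta name="geo.*" content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.geo = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name.startswith("geo."):
            self.geo[name] = attr.get("content")


def extract_geo(html):
    """Return ((lat, lon) or None, dict of all geo.* meta values)."""
    parser = GeoMetaParser()
    parser.feed(html)
    position = parser.geo.get("geo.position")
    lat_lon = None
    if position:
        try:
            lat, lon = (float(v) for v in position.replace(",", ";").split(";"))
            lat_lon = (lat, lon)
        except ValueError:
            pass  # malformed value: leave the position unset
    return lat_lon, parser.geo
```

Pages rarely carry these tags, which is why the rest of the deck concentrates on detecting postal addresses in the page text.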

  25. Natural language descriptions

  26. Postal addresses

  27. Mopsi search
     Input:
     • user location (lat, lon)
     • keywords
     Output: list of services containing:
     • name/title
     • website
     • address (street, number, city)
     • location (lat, lon)
     • image
     • other info (opening hours, telephone etc.)
     Main idea:
     • preprocess the search results of an external search engine (Google, Yahoo, Bing etc.) by detecting postal addresses in order to find the location

  28. Problems
     • How to evaluate relevance?
     • Mixed keyword meanings
     • No relation between keywords and addresses

  29. Mopsi web search workflow
     [Diagram: the mobile application and the web user interface each send a keyword and coordinates to the geo-referencing module and receive search results; the geo-referencing module sends addresses to the geocoded street-name database and receives coordinates.]

  30. Georeferencing module
     [Diagram]
     • Components: relevant municipalities detector, page parser, address and description detector, address validator
     • Data sources: geocoded database, municipalities list
     • Data passed between components: keyword, coordinates, municipalities, <keyword, municipality> query, result links, word list, addresses, results list (keyword, address, coordinates), sorted results list

  31. Proposed steps
     • Geocoding step: convert the user location (lat, lon) into a user address
     • Web page retrieval step: search with the query "keyword+city" using an external search engine API and download the first k results (web pages)
     • Data mining step: detect addresses and additional information in the downloaded web pages
     • Ranking step: rank the results (distance, relevance etc.)
     • Display the search results to the user
     [Diagram: User (lat, lon, keywords) → 1. Geocoder → 2. Web page retrieval (web pages) → 3. Data mining (result list) → 4. Result ranking → 5. ranked result list]

  32. 1. Geocoding
     Convert the user location (lat, lon) into a user address using: [service logos not preserved in the transcript]

  33. 2. Web page retrieval
     • Download k web pages for the query <keyword, city> using the API of: [search engine logos not preserved in the transcript]

  34. 3. Data mining
     Main idea: find location information in HTML pages by detecting postal addresses
     Steps:
     • Parse and segment the HTML page
     • Identify addresses and locations
     • Identify the services the addresses point to (name/title) and retrieve extra information (photos, opening hours, telephone etc.)

  35. 3.1 Parsing HTML pages
     • The current solution extracts an array of text from HTML pages
     • We do not yet exploit the fact that the data comes from web pages (their HTML structure)
     • Proposed future solution:
       • Segmentation of web pages using DOM trees
       • Detection of the address block
       • Nearest-neighbor search considering text and visual characteristics
     Example of extracted text (Finnish, with glosses in parentheses):
     Joen Pizza Special
     Y-tunnus (business ID) 2129577-6
     Käyntiosoite (visiting address) Koskikatu 17 80100 JOENSUU
     Postiosoite (postal address) Koskikatu 17 80100 JOENSUU
     Puhelin (phone): 013-220246
     Virallinen toimiala (official line of business) Kahvila-ravintolat (café-restaurants)

  36. Web page example - Homepage

  37. DOM tree
     [Figure: DOM tree visualization with color-coded nodes]
     • blue: links (the A tag)
     • red: tables (TABLE, TR and TD tags)
     • green: dividers (DIV tag)
     • violet: images (the IMG tag)
     • yellow: forms (FORM, INPUT, TEXTAREA, SELECT and OPTION tags)
     • orange: line breaks and blockquotes (BR, P and BLOCKQUOTE tags)
     • black: the HTML tag, the root node
     • gray: all other tags

  38. DOM subtree
     [Diagram: DOM subtree made of html, body, table, tr, td and div nodes, whose leaves are the text nodes "PizzaPojat Niinivaara", "Niinivaarantie 19", "80200 Joensuu", <br/> and "013 - 137 017"]
     <table align="center">
       <tr>
         <td>
           <div id="footerleft">
             <h3>PizzaPojat Niinivaara</h3>
             <p>Niinivaarantie 19</p>
             <p>80200 Joensuu</p>
             <br />
             <p>013 - 137 017</p>
           </div>
         </td>
       </tr>
     </table>

  39. Web page example - Catalog
     [Screenshot labels: Miami, Bosbor kebab, Fiesta]

  40. Proposed implementation
     • Convert HTML pages to xHTML in order to use xQuery
     • Detect addresses and postal codes
     • Break the DOM tree into subtrees
     • Use heuristics and regular expressions to detect extra information in the subtree (service name, telephone, opening hours etc.)
     [Diagram: the DOM subtree containing "PizzaPojat Niinivaara", "Niinivaarantie 19", "80200 Joensuu" and "013 - 137 017"]
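The subtree idea can be sketched with Python's `xml.etree.ElementTree` standing in for xQuery, which is a substitution made for this illustration. The code assumes well-formed xHTML input, treats a five-digit number as a Finnish postal code, and takes the parent of the element holding the postal code as the address block; all three are simplifying assumptions, and `address_subtrees` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

POSTCODE = re.compile(r"\b\d{5}\b")  # five-digit postal code (assumed pattern)


def address_subtrees(xhtml):
    """Return the text lines of every subtree that contains a postal code."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    blocks = []
    for node in root.iter():
        if node.text and POSTCODE.search(node.text):
            block = parent.get(node, node)  # the enclosing element, e.g. the <div>
            lines = [t.strip() for t in block.itertext() if t.strip()]
            blocks.append(lines)
    return blocks
```

On the footer markup from the DOM subtree slide, the block is the `<div>`, so the service name and the phone number come out together with the address. Heuristics or regular expressions can then label each line.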

  41. 3.2 Postal address detection
     • Rule-based pattern matching algorithm
     • Starting point: the detection of street names
     • Prefix trees are used for fast text matching of street names
     • An address-block candidate is constructed by detecting:
       • street names and numbers
       • postal codes
       • municipal names
     • We will use the OpenStreetMap database for global detection
     [Figure labels: street names, street numbers, city names, telephone numbers]

  42. 3.2 Postal address detection
     AddressDetection(words)
         i = 0
         while i < count(words)
             set street, number, postcode, city as empty
             if words[i] is streetName
                 street = words[i]
                 i++
                 for j = i to i+5
                     if words[j] is number
                         number = words[j]
                         break
                 for k = j+1 to j+5
                     if words[k] is postcode
                         postcode = words[k]
                         j = k
                         break
                 for k = j+1 to j+5
                     if words[k] is city
                         city = words[k]
                         i = k+1
                         break
             else
                 i++
             if street is not empty AND number is not empty AND city is not empty
                 candidate = (street, number, postcode, city)
     Example input:
     Joen Pizza Special
     Y-tunnus: 2129577-6
     Käyntiosoite: Koskikatu (streetName) 17 (number) 80100 (postcode) JOENSUU (city)
     Puhelin: 013-220246
     Virallinen toimiala: Kahvila-ravintolat
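A runnable sketch of the same scan is given below. It makes several assumptions that are not in the slides: street and city names are single words looked up in plain sets (the deck uses prefix trees for this), a street number is one to four digits with an optional letter, a postal code is exactly five digits, and the helper names are hypothetical.

```python
import re

NUMBER = re.compile(r"\d{1,4}[A-Za-z]?")  # street number such as "17" or "4B" (assumed pattern)
POSTCODE = re.compile(r"\d{5}")           # five-digit postal code (assumed pattern)


def find_next(words, start, window, predicate):
    """Index of the first word in words[start:start+window] accepted by predicate, else None."""
    for idx in range(start, min(start + window, len(words))):
        if predicate(words[idx]):
            return idx
    return None


def detect_addresses(words, street_names, city_names, window=5):
    """Return (street, number, postcode, city) candidates; postcode may be None."""
    candidates = []
    i = 0
    while i < len(words):
        if words[i].lower() not in street_names:
            i += 1
            continue
        street = words[i]
        i += 1  # always advance, so the loop terminates even without a match
        n = find_next(words, i, window, NUMBER.fullmatch)
        if n is None:
            continue
        p = find_next(words, n + 1, window, POSTCODE.fullmatch)
        after = (p if p is not None else n) + 1
        c = find_next(words, after, window, lambda w: w.lower() in city_names)
        if c is None:
            continue
        postcode = words[p] if p is not None else None
        candidates.append((street, words[n], postcode, words[c]))
        i = c + 1  # resume scanning after the detected address
    return candidates
```

As in the pseudocode, the postal code is optional, while the street, the number and the city are required for a candidate.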

  43. Prefix trees
     • Invented by Fredkin (1960)
     • The prefix tree (or trie) is a fast ordered tree data structure used for retrieval
     • The root is associated with the empty string
     • All descendants of a node share the string associated with that node as a common prefix
     • Some nodes have associated values (usually they mark the end of a word)
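A minimal trie sketch matching the properties listed above: the root stands for the empty string, each child edge adds one character, and a node's `value` is set only where a stored word ends. The class and method names are illustrative choices.

```python
class Trie:
    """Prefix tree node; the root represents the empty string."""

    def __init__(self):
        self.children = {}  # character -> child Trie node
        self.value = None   # set only on nodes that end a stored word

    def insert(self, word, value=True):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def _walk(self, text):
        """Follow text from this node; return the node reached, or None."""
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def find(self, word):
        """Value stored for word, or None if word was never inserted."""
        node = self._walk(word)
        return node.value if node is not None else None

    def has_prefix(self, prefix):
        """True if at least one stored word starts with prefix."""
        return self._walk(prefix) is not None
```

A lookup costs time proportional to the length of the word, independent of how many street names are stored, which is what makes the structure attractive for scanning page text.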

  44. Street-name prefix trees
     • Our solution is to detect street names using prefix trees constructed from the gazetteer
     • A street-name prefix tree is built for each municipality used in the search
     • The user’s location and area of interest are known, therefore the prefix trees can be limited to the relevant municipalities

  45. 3.3 Retrieve extra information
     • Title detection (or company detection) is a Named Entity Recognition problem
     • Usually, the text before the address holds relevant information
     • There are other methods to investigate, such as using classifiers or using the web page structure
     Example (the words before the address, followed by the address):
     Joen Pizza Special
     Y-tunnus: 2129577-6
     Käyntiosoite: Koskikatu 17 80100 JOENSUU
     Postiosoite: Koskikatu 17 80100 JOENSUU
     Puhelin: 013-220246
     Virallinen toimiala: Kahvila-ravintolat

  46. 4. Ranking
     • Main criterion: distance from the user’s location
     • Future idea: relevance to the user’s profile and history

  47. Future ideas recap
     • Use freely available geographical sources to extend the prototype to other regions
     • Use the geographical scope of a web page to improve address detection and disambiguation
     • Use the structure of the HTML page and DOM tree semantic analysis for better data extraction
     • Gather and tag a testing dataset for better evaluation of the algorithms
