1 / 100

From Syntactic Search to Semantic Search

From Syntactic Search to Semantic Search. מחיפוש תחבירי לחיפוש משמעותי. אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן ariel@cs.biu.ac.il. Contents. Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!.

chandler
Télécharger la présentation

From Syntactic Search to Semantic Search

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן ariel@cs.biu.ac.il A. Frank

  2. Contents • Web 1.0, Web 2.0, Web 3.0, Web X.0… • Syntactic vs. Semantic Search • Semantic Search Engines • Spectrum of Semantic Search Problems?! A. Frank

  3. “The good, the bad and the …” A. Frank

  4. Semantics in Web 2.0?! , Open Gardens blog, AjitJaokar http://opengardensblog.futuretext.com/archives/2005/12/mobile_Web_20_w.html A. Frank

  5. Web 1.0, Web 2.0, Web 3.0, Web X.0… A. Frank

  6. Where are we headed?! A. Frank

  7. Contents • Web 1.0, Web 2.0, Web 3.0, Web X.0… • Syntactic vs. Semantic Search • Semantic Search Engines • Spectrum of Semantic Search Problems?! A. Frank

  8. Googlism  A. Frank

  9. Google • Presents us with a search box and allows us to enter free form queries. • Basically retrieves documents based on keyword searches. • The search algorithms are based on string/pattern matching, statistical frequencies and Page rank (based on hyperlink popularity analysis). • There is some light-weight semantics – when you search for known objects or public data, Google knows about these types of entities (OneBox) and provides factual answers. A. Frank

  10. Actual Answers to Factual Questions A. Frank

  11. Find Public Data A. Frank

  12. Compare Public Data A. Frank

  13. Current State • The current generation of search engines is severely limited in its understanding of the user's intent and the Web's content. • Semantic search has attracted a lot of attention in the past year, largely due to the growth of the Semantic Web as a whole. • The term “Semantic Search” itself is popular enough to be considered overused/overloaded. • But in general it refers to methods of searching documents beyond the syntactic level of matching keywords. A. Frank

  14. Syntactic vs. Semantic Search • Syntactic search – can match the query against • index of the textual content of the resources • URIs (URLs, URNs) in the system • literals in the RDF metadata • or a combination of these, possibly using: • Exact, prefix or substring match, stemming, minimal edit distance • Semantic search – in addition to syntactic search, can use • index of the meaning of sentences in each resource • semantic knowledge structures and analysis • the graph structure of RDF metadata • or a combination of these, possibly using: • query expansion, classification/categorization, tagging, graph traversal, microformats, RDF & OWL inferencing and reasoning A. Frank

  15. Can Semantic Search answer this :-?) A. Frank

  16. Semantics & NLP • Semantics is a subfield of linguistics that is devoted to the study of meaning, as expressed by words, phrases, sentences, and even larger units of text. • Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that studies the problems of automated generation and understanding of natural human languages by computers. • Semantic NLP integrates these to provide meaning and understanding within context of computer applications. A. Frank

  17. Semantic Search in General • Collective methods that look at information available beyond the level of individual words or phrases. • Semantic search methods are diverse and cover a range of disciplines • Information Retrieval, Natural Language Processing, Semantic Analysis, the Semantic Web, Databases, Information Extraction, Information Visualization, etc. • An understanding of content at the semantic level also makes it possible to aggregate, compare and reason with content as structured data. • Enables people to type in arbitrarily complex questions and then interpret these queries and execute them. A. Frank

  18. Contents • Web 1.0, Web 2.0, Web 3.0, Web X.0… • Syntactic vs. Semantic Search • Semantic Search Engines • Spectrum of Semantic Search Problems?! A. Frank

  19. Types of “Semantic Search” Engines • Semantic Web SearchEngines • search on the Semantic Web data (RDF, OWL, etc). • Semantically-enhanced Search Engines • mostly augment and refine the displayed results. • NLP-based Search Engines • mainly work on the indexing and query side of the search. • Semantic-NLP-based Search Engines • employ semantic knowledge structures and analysis. • Computational-NLP-based Search Engines • employ an inference engine that enables reasoning and computation. A. Frank

  20. Sample of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank

  21. Sample queries used here • Query involves extracting concepts • Artificial Intelligence • Query involves ambiguous term • Jaguar, Cycle • Query involves aggregating data • What did Albert Einstein do • Query involves inference • When was California’s governor born • Query involves computation • What is the distance from San Francisco to Michigan A. Frank

  22. Results of sample queries used here A. Frank

  23. Types of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank

  24. Google Orion • Technology that can better understand associations and concepts related to search. • It finds relationships between queries and related concepts and presents those as search refinements: • Pages are scanned in “real-time” by Google after a query is entered. • Conceptually and contextually related sites/pages are then identified and expressed in the form of the improved refinements. • Also provides longer “snippets” (text extracts containing the keywords) when users input queries of three words or more. A. Frank

  25. Orion Search Refinements A. Frank

  26. Yahoo! SearchMonkey • The SearchMonkey platform is mainly based on the Semantic Web approaches. • Supports the ability to modify the presentation of search results through plug-ins to the search interface. • Content providers or third-party developers can create custom data services to extract information from the Web page. • After defining the extracted properties per entry, an enhanced display can be provided. A. Frank

  27. SearchMonkey Enhanced Display A. Frank

  28. Types of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank

  29. MetaWeb Freebase • Open, shared database of the world’s knowledge that collects data from the Web to build a massive, collaboratively-edited database of cross-linked data. • Free for anyone to query, contribute to, build applications on top of, or integrate into their Web sites. • Focus is on organizing and managing complex data structures by use of Semantic Web technologies. • Enables extraction of ordered knowledge out of the information chaos that is the current Web. A. Frank

  30. Freebase Interface A. Frank

  31. Freebase Repository • Covers millions of topics in hundreds of categories. • Freebase contains structured information on many popular topics, like movies, music, people and locations – all reconciled and freely available. • A base is a new kind of Website that anyone can build using Freebase; It is a place to organize and share collections of information about a particular subject. • Freebase has two search interfaces: • Free form queries; a text search box. • MQL (MetaWeb Query Language) queries. A. Frank

  32. Freebase Structure • Freebase spans domains, but requires that a particular topic exist only once, even if it might normally be found in multiple bases. • For example, Arnold Schwarzenegger would appear in a movie base as an actor, a political base as a governor and a bodybuilder base as a Mr. Universe. • In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together. A. Frank

  33. Freebase vs. Wikipedia • The difference lies in the way they store information. • Wikipedia arranges information in the form of articles. • Freebase lists facts and statistics. • Its list form is good not only for people who like to glance at facts, but also for people who want to use the data to build other Web sites/software. • Topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia.   A. Frank

  34. Query involves extracting concepts A. Frank

  35. Query involves ambiguous term A. Frank

  36. Query involves aggregating data (1) A. Frank

  37. Query involves aggregating data (2) A. Frank

  38. Query involves inference A. Frank

  39. Query involves computation A. Frank

  40. Microsoft Powerset • Provides a NLP-based search engine. • Powerset reads and understands every sentence on a Web page and allows asking questions in plain English. • Powerset search and discovery experience is currently based on Wikipedia and Freebase. • In large, Powerset is trying to match the meaning of a query with the meaning of sentences in Wikipedia or facts in Freebase. A. Frank

  41. Powerset Interface A. Frank

  42. Powerset Services • In the search box, you can express yourself in keywords, phrases, or simple questions. • On the search results page, Powerset gives more accurate results, often answering questions directly, and aggregating information from across multiple articles. • Powerset's technology follows you into enhanced Wikipedia articles, giving you a better way to quickly digest and navigate content. A. Frank

  43. Powerset Factz • Factz are concise representations of information extracted from sentences. • They are represented in three parts: the subject, relation and object (e.g., Oswald shot JFK). • Factz do not always represent truth, but rather propositions that are asserted in the text of Wikipedia. • On the search results page, you will often see Factz for general topic queries; these Factz are collected from pages across Wikipedia. • On a particular topic page, you can see the Factz extracted from the given page in the outline. A. Frank

  44. Snippet of a Powerset Result Page A. Frank

  45. Query involves extracting concepts A. Frank

  46. Query involves ambiguous term A. Frank

  47. Query involves aggregating data A. Frank

  48. Query involves inference A. Frank

  49. Query involves computation A. Frank

  50. Types of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank

More Related