
Social Search and Discovery Using a Unified Approach

Einat Amitay et al., IBM Research Lab in Haifa, Israel (HT’09). Presented by Kangpyo Lee, IDB Tagging Team, School of CSE, SNU, at the IDB Lab Seminar, 18 March 2011.


Presentation Transcript


  1. Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar IDB Tagging Team, School of CSE, SNU Presented by Kangpyo Lee

  2. A Variety of Web Search Types Social Search Personalized Search Exploratory Search Unified Search Universal Search Multi-entity Search Vertical Search Faceted Search Multi-faceted Search

  3. Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary

  4. Introduction • Recent Web 2.0 applications (e.g., weblogs, collaborative bookmarking systems, and social networks) introduce new entities & relations in addition to regular web pages • Web 2.0 entities relate to each other in several ways • Documents may relate to other documents by referencing each other • A user may relate to a document as its author, as a tagger, or by being mentioned in the page’s content • A user may relate to other users through social relations • A tag relates to the bookmark it is associated with, and also to the tagger • These entities & relations may prove valuable in enhancing the search experience • By serving as potential search results • By influencing ranking algorithms

  5. Introduction • We present and evaluate novel methods for leveraging social information to enhance search results and discover relations between Web 2.0 applications • Our approach leverages a unified representation of the entities and their relations • We then use this intricate heterogeneous collection to establish an all-encompassing social search solution

  6. Introduction • Social search solution • Allows users to query for specific entities and retrieve results of all relevant types • The system returns, in addition to standard search results, users related to the query, as well as tags that are associated with relevant documents • These tags can be further used to categorize the search results and to better refine the searcher’s information need • We use the term social search engine to describe this multi-entity search system based on “social” data • Our social search system is the only one that provides a unified approach for searching and retrieving entities of all types

  7. Introduction- Unified Approach • Our social data include records of users’ public activity with documents • such as bookmarking, tagging, rating, or comments made to other public Web 2.0 entities • Our system allows the search for any object type (e.g., documents, people, or tags) and the retrieval of all entity types • The system supports • Standard textual queries • Entity queries • Any combination of the two

  8. Introduction- Unified Approach • The social search engine is based on the unified search approach • Unified search • A.k.a. heterogeneous interrelated entity search • An emerging paradigm within IR • The search space is expanded to represent heterogeneous information about objects that may relate to each other in several ways • Direct relations • Indirect relations • The system must be scalable, responsive, and reflect the rapid update patterns typical in Web 2.0 systems

  9. Introduction- Unified Approach • We present a novel realization of the unified search paradigm based on multifaceted search • Represents each of the system’s entities by a retrievable document • Direct relations between entities are represented by marking one of the elements as a “facet” of its counterpart • The strength of the relationship between the two objects is represented by the strength of the document-facet relationship • If A and B are directly related: A is one of B’s facets, and B is one of A’s facets

  10. Introduction- Unified Approach • An efficient mechanism for updating relations between objects as well as efficient search over the heterogeneous data • Only direct relations between objects need to be updated when new entities are added • Indirect relations are dynamically induced from the direct relations and computed on-the-fly during query execution time • Directly-related objects are retrieved and scored during run-time using the search engine’s regular scoring mechanisms • Indirectly-related entities are retrieved and scored using an implementation of faceted search
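To make slides 9 and 10 concrete, here is a minimal in-memory sketch, assuming a plain Python dict stands in for the search index (the class and method names are hypothetical, not the system's API). Each entity holds a weighted facet map; adding a direct relation writes the facet on both sides, and order-two (indirect) relations are derived at query time instead of being stored.

```python
from collections import defaultdict


class UnifiedIndex:
    """Hypothetical in-memory stand-in for the facet-based unified index."""

    def __init__(self) -> None:
        # entity id -> {related entity id -> relationship strength}
        self.facets: dict[str, dict[str, float]] = defaultdict(dict)

    def add_relation(self, a: str, b: str, weight: float = 1.0) -> None:
        # A direct relation is stored symmetrically: a is a facet of b, b is a facet of a.
        # Only these two direct entries change; nothing indirect is precomputed.
        self.facets[a][b] = self.facets[a].get(b, 0.0) + weight
        self.facets[b][a] = self.facets[b].get(a, 0.0) + weight

    def indirect(self, a: str) -> dict[str, float]:
        # Order-two relations induced on the fly: sum over intermediate objects m
        # of w(a, m) * w(m, b), skipping a itself.
        scores: dict[str, float] = defaultdict(float)
        for m, w_am in self.facets.get(a, {}).items():
            for b, w_mb in self.facets.get(m, {}).items():
                if b != a:
                    scores[b] += w_am * w_mb
        return dict(scores)


idx = UnifiedIndex()
idx.add_relation("user:alice", "page:p1")
idx.add_relation("tag:search", "page:p1")
print(idx.indirect("user:alice"))  # {'tag:search': 1.0}
```

The design choice mirrors the slide: inserting a bookmark costs two dictionary writes, while the more expensive indirect computation is paid only when a query asks for it.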

  11. Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary

  12. Related Work • Social search • The set of annotations provided by the public can be used to enrich the page content • The # of annotations of a web page can be used as additional evidence of document quality for improved ranking of search results • Social data enables users to search for other people with whom they maintain relationships in the network • Social ranking • Ranking all entities retrieved by the social search engine • FolkRank and SocialPageRank • Applying PageRank-like computation depends heavily on the graph size and is expected to be very slow • Different entity types provide different retrieval values for the searcher, hence they should be ranked according to their own characteristics

  13. Related Work- Multi-Entity Search • Multi-entity search • Extending basic search functionality by answering user queries with many types of entities • Usually based on analysis of the relationship between entities and documents relevant to the query • Searching over a multi-entity graph • Nodes are entities (terms, documents, persons, annotations) • Edges are the relations between the entities • SimFusion uses a Unified Relationship Matrix (URM) to represent the multi-entity graph

  14. Related Work- Multi-Entity Search • Unified Relationship Matrix (URM) • Relations between two object types are represented via a relationship matrix M_ij • The (k, l) entry of matrix M_ij represents the strength of the relation between the object pair (o_k, o_l) of types O_i and O_j, respectively • The URM matrix U • Encapsulates all matrices to provide a unified representation of the unified search space • Provides the relationship strength between any two directly related entities, along with a theoretically elegant way to calculate indirect relations through matrix multiplication
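A small pure-Python sketch of the URM idea with toy weights. It assumes a block layout (users, then pages, then tags) in which each M_ij sits at block (i, j) and its transpose at (j, i), with all other blocks zero; the helper names are hypothetical. Order-two relations then fall out of a single matrix product U·U.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple-loop product; adequate for a toy example.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def build_urm(sizes: list[int], blocks: dict[tuple[int, int], Matrix]) -> Matrix:
    """Place each M_ij at block (i, j) and its transpose at (j, i); other blocks stay zero."""
    offsets = [sum(sizes[:t]) for t in range(len(sizes))]
    n = sum(sizes)
    u = [[0.0] * n for _ in range(n)]
    for (i, j), m in blocks.items():
        for row_idx, row in enumerate(m):
            for col_idx, w in enumerate(row):
                u[offsets[i] + row_idx][offsets[j] + col_idx] = w
                u[offsets[j] + col_idx][offsets[i] + row_idx] = w  # symmetric relations
    return u


sizes = [2, 2, 1]  # 2 users, 2 pages, 1 tag -> global indices 0-1, 2-3, 4
blocks = {
    (0, 1): [[1.0, 0.0], [1.0, 1.0]],  # M_01: user x page (bookmarked)
    (1, 2): [[1.0], [0.0]],            # M_12: page x tag (tagged with)
}
u = build_urm(sizes, blocks)
u2 = matmul(u, u)  # order-two (indirect) relation strengths
print(u2[0][4])  # user 0 -> tag 0 via page 0: 1.0
```

For real collections U is large and sparse, which is why the slides later replace explicit multiplication with faceted search at query time.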

  15. Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary

  16. Implementation • Our solution to unified search represents each object in the system in two ways • (1) as a retrievable document • (2) as a facet (category) of all the objects to which it relates • A unified representation of a collaborative bookmarking system • Three object types – web pages, users, and tags • Each object type is associated with a corresponding document – a web page document, a user document, and a tag document • Three relationship types • A user-type facet between a user & the tagged web page • A tag-type facet between a tag & the associated web page • A user-type facet between a user & a tag used for bookmarking
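The three relationship types above can be derived from raw bookmark records. This is a minimal sketch, assuming each bookmark is a (user, page, tags) tuple and that relationship strength is a simple co-occurrence count; both the record format and the counting scheme are assumptions, not the paper's weighting.

```python
from collections import Counter
from typing import Iterable


def facet_edges(bookmarks: Iterable[tuple[str, str, list[str]]]) -> Counter:
    """Count weighted (facet type, object, related object) edges from bookmarks."""
    edges: Counter = Counter()
    for user, page, tags in bookmarks:
        edges[("user", user, page)] += 1      # user-type facet: user <-> tagged page
        for tag in tags:
            edges[("tag", tag, page)] += 1    # tag-type facet: tag <-> page
            edges[("user", user, tag)] += 1   # user-type facet: user <-> tag used
    return edges


edges = facet_edges([
    ("alice", "page:p1", ["tag:ir", "tag:social"]),
    ("bob", "page:p1", ["tag:ir"]),
])
print(edges[("tag", "tag:ir", "page:p1")])  # 2
```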

  17. Implementation- Scoring Indirectly Related Objects • The strength of the indirect relation between objects o1 & o2 • U(o, o’) – the corresponding entry in the URM matrix • Equivalent to squaring the URM matrix • Provides the relationship strength of order two between any two objects • Eq. 1 can be generalized to score objects based on their indirect relations with any query • The score vector s0(q) provides the direct scores of all N objects in the system for the query • The score vector s1(q) provides the indirect scores of all objects
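The slide's equations are not reproduced in this transcript. Written out from the surrounding definitions (squaring the URM, and the query score vectors s0(q) and s1(q)), they presumably take the following form; the symbol r(o_1, o_2) for the indirect strength is introduced here only for illustration. The second line agrees with the aggregation described on slide 20, where w(o, o_i) = U(o, o_i).

```latex
r(o_1, o_2) = \sum_{o} U(o_1, o)\, U(o, o_2) = (U^2)(o_1, o_2) \qquad \text{(Eq. 1)}

s_1(q) = U\, s_0(q), \qquad s_1(q)[o] = \sum_{o'} U(o, o')\, s_0(q)[o'] \qquad \text{(Eq. 2)}
```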

  18. Implementation- Scoring Indirectly Related Objects • In addition, objects can be scored according to their relative popularity, or authority • FolkRank or SocialPageRank can be used • Inverse entity frequency (ief) score • N – the # of all objects in the system • No – the # of objects directly related to o • Penalizes objects that are related to many objects in general • The final score of object o for a query q
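The ief formula itself is not shown in the transcript. By analogy with idf, a natural reading is ief(o) = log(N / N_o), and the sketch below uses that form. How ief is combined with the indirect score is a hypothetical choice (a plain product), since the slide's final-score equation is missing.

```python
import math


def ief(n_total: int, n_related: int) -> float:
    """Inverse entity frequency, assumed by analogy with idf: log(N / N_o)."""
    if n_related <= 0:
        raise ValueError("object must have at least one direct relation")
    return math.log(n_total / n_related)


def final_score(indirect_score: float, n_total: int, n_related: int) -> float:
    # Hypothetical combination; the transcript omits the actual final-score formula.
    return indirect_score * ief(n_total, n_related)


# A tag attached to 10 of 1,000 objects outranks one attached to 500,
# even when both have the same indirect score.
print(final_score(1.0, 1000, 10) > final_score(1.0, 1000, 500))  # True
```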

  19. Implementation- Multifaceted Search • Multifaceted search aims to combine the two main search approaches: • Direct search • Navigational search – offering navigational refinement on the results by categorizing the search results into predefined facets along with the counts of results per facet • Multifaceted search has become the prevailing user interaction mechanism in e-commerce sites • Now being extended to deal with semi-structured data, continuous dimensions, and folksonomies
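The navigational half of multifaceted search amounts to grouping a result set by facet and reporting counts per facet value. A small sketch, assuming each result carries a hypothetical `facets` mapping from facet type to a list of values:

```python
from collections import Counter, defaultdict


def facet_counts(results: list[dict]) -> dict[str, Counter]:
    """Count, per facet type, how many results carry each facet value."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for doc in results:
        for facet_type, values in doc.get("facets", {}).items():
            counts[facet_type].update(set(values))  # count each value once per document
    return dict(counts)


results = [
    {"id": "p1", "facets": {"tag": ["ir", "social"], "user": ["alice"]}},
    {"id": "p2", "facets": {"tag": ["ir"], "user": ["bob", "alice"]}},
]
print(facet_counts(results)["tag"].most_common(1))  # [('ir', 2)]
```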

  20. Implementation- Multifaceted Search • The scores of directly related objects are equivalent to the scores as represented by s0(q) • The score of an indirectly related object, o, is computed by aggregating its relationship strength with all matching documents, multiplied by their direct score • w(o, oi) – the relationship strength between the document oi & its facet o • Equivalent to Eq. 2 since w(o, oi) = U(o, oi) • Indirectly related objects are represented by accumulating all facets of the same type
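The aggregation on this slide can be sketched directly: every matching document o_i contributes w(o, o_i) · s0(q)[o_i] to each of its facets o, and facets are accumulated per type. The data layout (a per-document facet map keyed by (type, id)) is an assumption for illustration.

```python
from collections import defaultdict


def indirect_scores(
    direct: dict[str, float],                         # s0(q): matching doc -> direct score
    facets: dict[str, dict[tuple[str, str], float]],  # doc -> {(type, facet id): w(o, o_i)}
) -> dict[str, dict[str, float]]:
    """s1(q)[o] = sum_i w(o, o_i) * s0(q)[o_i], grouped by facet type."""
    by_type: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for doc, s0 in direct.items():
        for (ftype, fid), w in facets.get(doc, {}).items():
            by_type[ftype][fid] += w * s0
    return {t: dict(scores) for t, scores in by_type.items()}


direct = {"p1": 2.0, "p2": 1.0}
facets = {
    "p1": {("user", "alice"): 1.0, ("tag", "ir"): 2.0},
    "p2": {("user", "alice"): 1.0, ("user", "bob"): 3.0},
}
print(indirect_scores(direct, facets))
# {'user': {'alice': 3.0, 'bob': 3.0}, 'tag': {'ir': 4.0}}
```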

  21. Implementation- Efficiency Factors • Two issues regarding use of the URM matrix for social search • 1) the need for efficient computation of indirect relations • 2) efficient dynamic updates • The universal query (q = ‘*’), which retrieves all objects indexed by the system as well as all objects related to them, runs in less than four seconds • Dynamic updates are handled by a mechanism that stores the changes in an external database
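The slide only says that changes are stored in an external database. One hypothetical way to realize that is a delta overlay: new direct relations go to a side store (a dict stands in for the database here) and are merged with the indexed facets at query time, so the main index need not be rebuilt on every bookmark. All names are illustrative.

```python
class FacetStore:
    """Hypothetical overlay: a static facet index plus pending updates."""

    def __init__(self, indexed: dict[str, dict[str, float]]) -> None:
        self.indexed = indexed
        self.pending: dict[str, dict[str, float]] = {}  # stands in for the external DB

    def add_relation(self, a: str, b: str, weight: float = 1.0) -> None:
        for x, y in ((a, b), (b, a)):  # direct relations are symmetric
            row = self.pending.setdefault(x, {})
            row[y] = row.get(y, 0.0) + weight

    def facets(self, obj: str) -> dict[str, float]:
        # Merge the indexed facets with pending deltas at query time.
        merged = dict(self.indexed.get(obj, {}))
        for other, w in self.pending.get(obj, {}).items():
            merged[other] = merged.get(other, 0.0) + w
        return merged

    def flush(self) -> None:
        # Periodically fold pending changes into the main index.
        for obj in list(self.pending):
            self.indexed[obj] = self.facets(obj)
        self.pending.clear()


store = FacetStore({"page:p1": {"tag:ir": 1.0}, "tag:ir": {"page:p1": 1.0}})
store.add_relation("page:p1", "user:alice")
print(store.facets("page:p1"))  # {'tag:ir': 1.0, 'user:alice': 1.0}
```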

  22. Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary

  23. Social Search within the Enterprise Textual Query Entity Query

  24. Social Search within the Enterprise- Social Data & Social Search Application • Web 2.0 services of IBM • Dogear – a collaborative bookmarking service (373,821 bookmarks, 234,856 web pages) • BlogCentral – a central blog service (77,930 blog threads) • BluePages – the enterprise directory and employee profile application (15,779 IBMers) • About 700,000 unique entities • Cow Search – the social search application available to all users of IBM’s intranet

  25. Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary

  26. User Study • Our goal was to measure the quality of both the returned document set and the related users and tags • The evaluation methodologies for documents are well known and have standard measures • There are no standard ways of measuring the quality of related users or tags • A user study was thus used • The retrieved documents were examined and marked with three relevance levels (0-not relevant, 1-marginally relevant, 2-highly relevant) • The quality of search results was measured by the normalized discounted cumulative gain (NDCG) measure • To evaluate the effectiveness of the related people, we emailed 612 randomly selected users and asked them to rate the related people on a Likert scale of 1 to 5
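The slide names NDCG over three relevance grades (0, 1, 2) but not the exact gain or discount. The sketch below uses the common form DCG@k = Σ rel_i / log2(i + 1); some evaluations use 2^rel - 1 as the gain instead, so treat the variant here as an assumption. It also assumes the ideal ranking is taken from the same judged list.

```python
import math


def dcg(grades: list[int], k: int) -> float:
    # Rank i (1-based) is discounted by log2(i + 1), so rank 1 is undiscounted.
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades[:k], start=1))


def ndcg(grades: list[int], k: int) -> float:
    """NDCG@k for graded judgments (0 = not, 1 = marginally, 2 = highly relevant)."""
    ideal = dcg(sorted(grades, reverse=True), k)  # ideal ordering of the judged list
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


print(round(ndcg([0, 1, 2], k=3), 3))  # 0.62
```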

  27. User Study- Results • Social data contribution to enterprise search • We measure the quality of search results using manual assessments of the top-k search results for the 50 chosen queries

  28. User Study- Results • Related users • Related tags

  29. Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary

  30. Summary • Social data is valuable • 1. The high precision of top retrieved documents demonstrates that user feedback identifies high-quality content in the corpus • 2. User comments and tags are highly beneficial in general and augment the description of system entities, while providing additional evidence for object popularity • Future research • Exploiting personal social networks for search personalization • Document or tag recommendation • Quantifying the contribution of social objects to the effectiveness of the search system

  31. Thank You! Any Questions or Comments?
