
Social Search

2011-05-11. Chapter 10. Social Search. Borim Ryu. Contents. What Is Social Search?. Ⅰ. User Tags and Manual Indexing. Ⅱ. Searching with Communities. Ⅲ. Filtering and Recommending. Ⅳ. Peer-to-Peer and Metasearch. Ⅴ. 10.1 What Is Social Search?. Social search Definitions:


Presentation Transcript


  1. 2011-05-11 Chapter 10. Social Search Borim Ryu

  2. Contents What Is Social Search? Ⅰ User Tags and Manual Indexing Ⅱ Searching with Communities Ⅲ Filtering and Recommending Ⅳ Peer-to-Peer and Metasearch Ⅴ

  3. 10.1 What Is Social Search? • Social search • Definitions: • Search within a social environment • Communities of users actively participating in the search process • Stark contrast to standard search interactions • Differences: • Users interact with the system • Users interact with each other online, either implicitly or explicitly

  4. 10.1 What Is Social Search? • Social media sites & Web 2.0 • Web 2.0: In contrast to the classical Web • Interactive HTML documents • Users can tag their own and others’ content • Users can share favorites and tags with others • Twitter (status messages), Flickr (pictures), YouTube (videos), CiteULike (research papers), and other social networking sites (MySpace, Facebook, ...) • Also includes online social interactions such as email, instant messaging, online games, and blogs

  5. 10.1 What Is Social Search? • Social search Topics • User tags • Searching within communities • Adaptive filtering • Recommender systems • Peer-to-peer and metasearch

  6. 10.2 User Tags and Manual Indexing • Example: Library catalogs • Experts generate indexing terms manually • Terms have high quality • Terms chosen from controlled vocabulary • ‘Social tagging’ • Social media sites provide users with the opportunity to manually tag items • Users generate tags • Tags can be noisy or even incorrect • Tags often contain folksonomies Folksonomy * A compound of folk+order+nomous, meaning ‘classification by the people’. * Refers to a new classification scheme that organizes the information and related topics posted on web pages by keyword, rather than dividing them into a classical category-based directory. * These keywords (tags) can be created directly by people or generated automatically by an AI engine. * Folksonomy is expected to play the most fundamental role in forming the network-oriented Web pursued by the Web 2.0 generation.

  7. 10.2 User Tags and Manual Indexing Types of user tags Content-based tags: Tags describing the content of an item. “car”, “woman”, “sky” Context-based tags: Tags describing the context of an item. “New York City”, “Empire State Building” Attribute tags: Tags describing implicit attributes of the item. “Nikon”(type of camera), “Black and White”(name of movie), “Homepage”(type of web page) Subjective tags: Tags that subjectively describe an item. “pretty”, “amazing”, “awesome” Organizational tags: Tags that help organize items. “todo”, “my pictures”, “readme”

  8. 10.2 User Tags and Manual Indexing 10.2.1 Searching Tags Searching user tags is challenging - Most items have only a few tags - Tags are very short - Often little or no overlap between the query terms and the tag terms Vocabulary mismatch problem (between query and tag) There are various ways to overcome this problem. → Tag expansion

  9. 10.2 User Tags and Manual Indexing 10.2.1 Searching Tags Tag expansion: Expanding tag representation with external knowledge - Thesaurus, Web search results, Query logs - After tags have been expanded, standard retrieval models can be used.

  10. 10.2 User Tags and Manual Indexing 10.2.1 Searching Tags Tag expansion Using Search Results

  11. 10.2 User Tags and Manual Indexing • 10.2.1 Searching Tags • Even with tag expansion, searching tags is challenging • Tags can be noisy or incorrect • Many items may not even be tagged • Typically easier to find popular items with many tags than less popular items with few/no tags

  12. 10.2 User Tags and Manual Indexing • 10.2.2 Inferring missing Tags • How can we automatically tag items with few or no tags? • Uses of inferred tags: • - Improved tag search • - Automatic tag suggestion

  13. 10.2 User Tags and Manual Indexing • 10.2.2 Inferring missing Tags • Textual items (books, news articles, research papers, ...) • Simple approach → Compute a weight for every term that occurs in the text and choose the K terms with the highest weight as the inferred tags • 1. TF·IDF-based weight: • Suggest tags that have a high TF·IDF weight in the item; only works for textual items. • $wt(w) = \log(f_{w,D} + 1) \cdot \log(N / df_w)$, where $f_{w,D}$ is the number of times term w occurs in item D, N is the number of items in the collection, and $df_w$ is the number of items that contain w

  14. 10.2 User Tags and Manual Indexing 10.2.2 Inferring missing Tags 2. Classification method: - Train a binary classifier for each tag. - Each classifier takes an item as input and predicts whether the associated tag should be applied to the item. - This approach requires training one classifier for every tag, which can be a cumbersome task and requires a large amount of training data - Performs well for popular tags, but not as well for rare tags.
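
The TF·IDF weighting on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `infer_tags`, the toy document, and the document-frequency table are made up for the example, and unseen terms are assumed to have a document frequency of 1.

```python
import math
from collections import Counter

def infer_tags(doc_tokens, doc_freq, num_docs, k=3):
    """Return the k terms with the highest wt(w) = log(f_wD + 1) * log(N / df_w)."""
    counts = Counter(doc_tokens)
    weights = {}
    for w, f in counts.items():
        df = doc_freq.get(w, 1)  # assumption: terms missing from the table get df = 1
        weights[w] = math.log(f + 1) * math.log(num_docs / df)
    # Sort by descending weight; break ties alphabetically for determinism.
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]

# Toy data (hypothetical): one tokenized item and collection-wide document frequencies.
doc = "social search uses tags and tags help search the web".split()
doc_freq = {"social": 10, "search": 50, "uses": 400, "tags": 5,
            "and": 1000, "help": 300, "the": 1000, "web": 200}
tags = infer_tags(doc, doc_freq, num_docs=1000, k=2)
print(tags)
```

Terms that appear in every item (here "and" and "the") get a weight of zero, so they are never suggested as tags.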

  15. 10.2 User Tags and Manual Indexing • 10.2.2 Inferring missing Tags • 3. Maximal Marginal Relevance(MMR): • Addresses the problem of selecting a diverse set of items. • Finds tags that are relevant to the item and novel with respect to existing tags. • Given an item i and the current set of tags for the item Ti
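
The formula for this slide did not survive in the transcript. MMR is commonly written as $MMR(t; T_i) = \lambda \cdot Sim_{item}(t, i) - (1 - \lambda) \cdot \max_{t' \in T_i} Sim_{tag}(t, t')$, and the Python sketch below assumes that form. The similarity values, the helper names, and the value of λ are all hypothetical.

```python
def mmr_select(candidates, item_sim, tag_sim, existing, k=1, lam=0.5):
    """Greedily pick k tags that are relevant to the item but novel
    with respect to the tags already assigned to it."""
    selected = list(existing)          # tags the item already has (T_i)
    chosen = []
    pool = [t for t in candidates if t not in selected]
    while pool and len(chosen) < k:
        def score(t):
            # Redundancy: similarity to the closest tag already selected.
            redundancy = max((tag_sim(t, s) for s in selected), default=0.0)
            return lam * item_sim[t] - (1 - lam) * redundancy
        best = max(pool, key=score)
        chosen.append(best)
        selected.append(best)
        pool.remove(best)
    return chosen

# Hypothetical similarities for one photo that is already tagged "car".
item_sim = {"car": 0.9, "automobile": 0.85, "road": 0.5}
pair_sim = {frozenset(["car", "automobile"]): 0.95}

def tag_sim(a, b):
    return pair_sim.get(frozenset([a, b]), 0.1)

suggested = mmr_select(["automobile", "road"], item_sim, tag_sim, existing=["car"])
print(suggested)
```

With these numbers "automobile" is more relevant to the item than "road", but it is nearly a synonym of the existing tag "car", so the novelty term pushes "road" ahead.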

  16. 10.2 User Tags and Manual Indexing • 10.2.3 Browsing and Tag Clouds • Tags can be used to help users browse, explore, and discover new items in large collections of items • Browsing is useful for exploring collections of tagged items • Various ways to visualize collections of tags • Tag lists • Tag clouds • Alphabetical order • Grouped by category • Sorted according to popularity

  17. 10.2 User Tags and Manual Indexing 10.2.3 Browsing and Tag Clouds

  18. 10.3 Searching with Communities • 10.3.1 What Is a Community? • Online community – Groups of entities that interact in an online environment and share common goals, traits, or interests • Ex) Baseball communities, DSLR communities • Some communities consist of non-human entities • Ex) A set of web pages can form a web community

  19. 10.3 Searching with Communities • 10.3.2 Finding Communities • Entities within a community are similar to each other • Several algorithms have been developed that can effectively find specific types of communities • Can represent interactions between a set of entities as a graph • - Each entity is a node in the graph, and interactions (relationships) between the entities are denoted by edges

  20. 10.3 Searching with Communities • 10.3.2 Finding Communities • Graphs can be either directed or undirected. • Directed graph • Edges have directional arrows that indicate the source node and destination node of the edge. • Used for representing non-symmetric or causal relationships between two entities. • Undirected graph • Edges do not have arrows; there is no notion of source and destination. • Used for representing symmetric relationships or for simply indicating that two entities are related.

  21. 10.3 Searching with Communities • 10.3.2 Finding Communities • Two criteria for finding communities within the graph • The subset of entities (nodes) must be similar to each other according to some similarity measure. • The subset of entities should interact with each other more than they interact with other entities.

  22. 10.3 Searching with Communities • 10.3.2 Finding Communities • Graph Representation Example of how nodes within a directed graph can be represented as vectors. For a given node p, its vector representation has component q set to 1 if p → q (that is, if there is an edge from p to q).

  23. 10.3 Searching with Communities • 10.3.2 Finding Communities • Hyperlink-induced Topic Search (HITS) algorithm can be used to find communities • Link analysis algorithm, like PageRank • Each entity has a hub and authority score • Candidate entities: a subset of the entities that may possibly be members of the community • Based on a circular assumption • Good hubs point to good authorities • Good authorities are pointed to by good hubs
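
A small Python sketch of this vector representation follows. The graph, the node names, and the use of cosine similarity to compare nodes are assumptions made for the example; the slide only specifies how the vectors are built.

```python
import math

def node_vectors(nodes, edges):
    """Map each node p to a 0/1 vector whose component q is 1 if p -> q."""
    index = {n: i for i, n in enumerate(nodes)}
    vectors = {n: [0] * len(nodes) for n in nodes}
    for p, q in edges:
        vectors[p][index[q]] = 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical directed graph: A and D both point to B and C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("B", "C")]
vectors = node_vectors(nodes, edges)
print(vectors["A"], cosine(vectors["A"], vectors["D"]))
```

Nodes with the same outgoing edges (A and D here) get identical vectors, which is what lets a similarity measure group them into the same community.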

  24. 10.3 Searching with Communities • 10.3.2 Finding Communities • Hyperlink-induced Topic Search (HITS) algorithm • Takes a graph G with node set V and edge set E as input • Vertex set V consists of the candidate entities, edge set E consists of all of the edges between candidate entities • For each of the candidate entities (nodes) p in the graph, HITS computes an authority score A(p) and a hub score H(p) • Iterative algorithm:
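
The iterative algorithm itself was an image and is not in the transcript. The Python sketch below assumes the usual formulation: start all scores at 1, then repeatedly set each authority score to the sum of the hub scores of nodes pointing to it, set each hub score to the sum of the authority scores of nodes it points to, and normalize. The iteration count and the example graph are hypothetical.

```python
def hits(nodes, edges, iterations=20):
    """Return (authority, hub) score dicts for a directed graph."""
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        new_hub = {n: 0.0 for n in nodes}
        for p, q in edges:             # edge p -> q
            new_auth[q] += hub[p]      # q gains authority from hub p
            new_hub[p] += auth[q]      # p gains hub score from authority q
        # Normalize so each set of scores sums to 1 (guard against empty graphs).
        a_total = sum(new_auth.values()) or 1.0
        h_total = sum(new_hub.values()) or 1.0
        auth = {n: v / a_total for n, v in new_auth.items()}
        hub = {n: v / h_total for n, v in new_hub.items()}
    return auth, hub

# Hypothetical candidate set: A, B, and D all point to C; A also points to D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("A", "D")]
auth, hub = hits(nodes, edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
print(top_authority, top_hub)
```

Both updates in an iteration use the scores from the previous iteration; variants that update hubs from the freshly computed authority scores also exist.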

  25. 10.3 Searching with Communities

  26. 10.3 Searching with Communities HITS Example

  27. 10.3 Searching with Communities • 10.3.2 Finding Communities • Hyperlink-induced Topic Search (HITS) algorithm • Nodes with many incoming edges tend to have higher authority scores, and those with more outgoing edges tend to have larger hub scores. • Once the hub and authority scores have been computed, the entities can be ranked according to their authority score. • Entities with large authority scores are likely to be the “leaders” or “core” of the community.

  28. 10.3 Searching with Communities • 10.3.3 Community-Based Question Answering • Some complex information needs can’t be answered by traditional search engines • Information from multiple sources • Human experts • Community-based question answering (CQA) tries to overcome these limitations • Searcher enters questions • Community members answer the question • EX) Yahoo! Answers, Naver Knowledge iN (지식iN) service

  29. 10.3 Searching with Communities 10.3.3 Community-Based Question Answering Example questions:

  30. 10.3 Searching with Communities • 10.3.3 Community-Based Question Answering • Pros • Can find answers to complex questions • Answers are from humans, not computer algorithms • Can search archive of previous Q/As • Cons • Often takes time to get a good response • Some questions never get answered • Answers may be wrong • The quality of answer can be lower than the search results

  31. 10.3 Searching with Communities • 10.3.3 Community-Based Question Answering • How can we effectively search an archive of question and answer pairs? • Can be treated as a translation problem • Translate a question into a related question • Translate a question into an answer • Translation‐based language model: • Enhanced translation model:
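
The two formulas on this slide are missing from the transcript. A basic translation-based language model is usually written as $P(Q|A) = \prod_{w \in Q} \sum_{t \in V} P(w|t) P(t|A)$, and the Python sketch below assumes that form with an unsmoothed maximum-likelihood estimate of $P(t|A)$. The translation table is invented for the example; in practice it would be learned from data.

```python
from collections import Counter

def translation_score(query, answer, trans_prob):
    """P(Q|A) = product over query words w of sum over answer terms t of P(w|t) * P(t|A)."""
    counts = Counter(answer)
    length = len(answer)
    score = 1.0
    for w in query:
        # P(t|A) is estimated as count(t in A) / |A| (assumption: no smoothing).
        score *= sum(trans_prob.get((w, t), 0.0) * c / length
                     for t, c in counts.items())
    return score

# Hypothetical translation probabilities, keyed as (w, t) -> P(w|t).
trans_prob = {
    ("car", "car"): 0.8, ("car", "automobile"): 0.5,
    ("repair", "repair"): 0.7, ("repair", "fix"): 0.4,
}
query = ["car", "repair"]
related = translation_score(query, ["fix", "your", "automobile"], trans_prob)
unrelated = translation_score(query, ["bake", "a", "cake"], trans_prob)
print(related, unrelated)
```

The related answer gets a non-zero score even though it shares no words with the query, which is how the translation model addresses vocabulary mismatch. Without smoothing, any query word that cannot be translated from the answer drives the score to zero, which is one reason enhanced models mix in collection statistics.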

  32. 10.3 Searching with Communities • 10.3.3 Community-Based Question Answering • How can we compute translation probabilities? • In cross-language retrieval, translation probabilities are learned from a parallel corpus • The notion of a parallel corpus becomes hazy when dealing with intra-language translations • A variety of approaches have been used for estimating translation probabilities within the same language • Various tools exist for computing translation probabilities from a parallel corpus

  33. Contents What Is Social Search? Ⅰ User Tags and Manual Indexing Ⅱ Searching with Communities Ⅲ Filtering and Recommending Ⅳ Peer-to-Peer and Metasearch Ⅴ

  34. 10.3 Searching with Communities • 10.3.4 Collaborative Searching • Traditional search assumes single searcher • Collaborative search: A group of users with a common goal searching together in a collaborative setting • EX) Student doing research for a history report • A collaborative search system would allow students to search the Web and other resources together, so that every member of the group could contribute and understand every subtopic of the report • Collecting information about various aspects of a particular project

  35. 10.3 Searching with Communities • 10.3.4 Collaborative Searching • Two common types of collaborative search settings, depending on where the search participants are physically located relative to each other: • Co-located • Participants in the same location • CoSearch system • Remote • Participants in different locations • SearchTogether system • CoSearch system • Developed by Amershi & Morris (2008) • System has a primary display, keyboard, and mouse controlled by a person called the “driver”, who leads the search task • Additional participants, called “observers”, each have a mouse or a Bluetooth-enabled mobile phone • Observers may click on search results, which adds the corresponding page to a shared “page queue”: every participant can recommend which page should be navigated to next • “Query queue”: potentially useful queries

  36. 10.3 Searching with Communities • 10.3.4 Collaborative Searching • Challenges? • How do users interact with system? • How do users interact with other users? • How is data shared? • What data persists across sessions? • Very few commercial collaborative search systems • Likely to see more of this type of system in the future

  37. 10.4 Filtering and Recommending • 10.4.1 Document Filtering • Ad hoc retrieval • Document collections & information needs change with time • Results returned when query is entered • Document filtering • Document collections change with time, but information needs are static (user profile) • Documents entering system that match the profile are delivered to the user via a push mechanism

  38. 10.4 Filtering and Recommending • 10.4.1 Document Filtering • One part of a social search application is representing individual users’ interests and preferences. • Filtering provides a way of personalizing the search experience by maintaining a number of long-term information needs. • Two key components: • The user’s long-term information needs, each represented by a profile • A decision mechanism that, given a document that has just arrived in the system, identifies which profiles the document is relevant to

  39. 10.4 Filtering and Recommending • Profiles • Represent long-term information needs • May be as simple as a Boolean or keyword query • May contain sets of relevant and non-relevant documents • May have relational constraints (“published before 1990”, “price in the $10-$25 range”) • Actual representation usually depends on the underlying filtering model • Static models: User’s profile does not change over time • Adaptive models: User’s profile changes over time

  40. 10.4 Filtering and Recommending • Static filtering models • Given a fixed profile, how can we determine if an incoming document should be delivered? • Treat as an IR problem: • Boolean, Vector space, Language model • When a new document enters the system, the filtering system must decide whether or not it is relevant with respect to each profile. • As new documents arrive, they are compared to each profile. Arrows from a document to a profile indicate that the document was deemed relevant to the profile and returned to the user.

  41. 10.4 Filtering and Recommending • Static filtering with Language model • Assume the profile consists of K pieces of text Ti (queries, relevant documents), each with weight αi; a profile language model P can be estimated from them • Then, given an incoming document, a document language model D must be estimated. • Documents can then be ranked according to the negative KL-divergence between the profile language model (P) and the document language model (D) • If −KL(P||D) ≥ θ, then deliver D to P; the threshold θ can be optimized for some metric

  42. 10.4 Filtering and Recommending • Adaptive filtering models • In adaptive filtering, profiles are dynamic and can be updated • How can profiles change? • The user can explicitly update the profile or provide relevance feedback about the documents delivered to the profile • Updates can also be made automatically based on click or browsing patterns • When a document is delivered to a profile, the user provides feedback about the document, and the profile is then updated and used for matching future incoming documents.
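
The profile language model formula is missing from the transcript. The Python sketch below assumes one common choice: a weighted mixture of the term distributions of the Ti, linearly smoothed with collection statistics, with the document model smoothed the same way. The smoothing parameter `lam`, the toy collection, and the threshold are hypothetical, and every term is assumed to appear in the collection.

```python
import math
from collections import Counter

def smoothed_lm(texts, weights, coll_counts, lam=0.1):
    """Weighted mixture of the texts' term distributions, smoothed with the collection."""
    coll_len = sum(coll_counts.values())
    total_weight = sum(weights)
    counted = [(a, Counter(t), len(t)) for a, t in zip(weights, texts)]
    lm = {}
    for w, c_w in coll_counts.items():
        mix = sum(a * c[w] / n for a, c, n in counted) / total_weight
        lm[w] = (1 - lam) * mix + lam * c_w / coll_len
    return lm

def neg_kl(profile_lm, doc_lm):
    """-KL(P||D) = sum over w of P(w|P) * log(P(w|D) / P(w|P)); always <= 0."""
    return sum(p * math.log(doc_lm[w] / p) for w, p in profile_lm.items())

def should_deliver(profile_lm, doc_lm, theta):
    """Deliver the document to the profile if -KL(P||D) >= theta."""
    return neg_kl(profile_lm, doc_lm) >= theta

# Toy collection statistics (hypothetical).
coll_counts = Counter("tropical fish aquarium care tank stock market news fish tank".split())
profile = smoothed_lm([["tropical", "fish"], ["aquarium", "fish", "care"]], [1.0, 1.0], coll_counts)
on_topic = smoothed_lm([["fish", "tank", "aquarium"]], [1.0], coll_counts)
off_topic = smoothed_lm([["stock", "market", "news"]], [1.0], coll_counts)
print(neg_kl(profile, on_topic), neg_kl(profile, off_topic))
```

Because the score is a negative divergence, a perfect match scores 0 and poorer matches are more negative, so θ is a negative number in practice.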

  43. 10.4 Filtering and Recommending • Adaptive filtering models • Rocchio algorithm: Profiles treated as vectors • Given a profile P, a set of non-relevant feedback documents (Nonrel), a set of relevant feedback documents (Rel), the adapted profile P’ is: • Relevance-based language models: Profiles treated as language models • C is the set of documents in the collection, Rel is the set of documents that have been judged relevant, Di is document i, and P(Di|D) is the probability that document Di is generated from document D’s language model.
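
The Rocchio formula is not in the transcript. It is usually written as $P' = \alpha P + \beta \frac{1}{|Rel|} \sum_{D_i \in Rel} D_i - \gamma \frac{1}{|Nonrel|} \sum_{D_i \in Nonrel} D_i$, and the Python sketch below assumes that form with sparse term-weight dictionaries as vectors. The default values of α, β, and γ and the example vectors are hypothetical.

```python
def rocchio(profile, rel, nonrel, alpha=1.0, beta=0.5, gamma=0.5):
    """Move the profile vector toward the mean relevant document
    and away from the mean non-relevant document."""
    terms = set(profile)
    for doc in rel + nonrel:
        terms.update(doc)
    updated = {}
    for t in terms:
        pos = sum(d.get(t, 0.0) for d in rel) / len(rel) if rel else 0.0
        neg = sum(d.get(t, 0.0) for d in nonrel) / len(nonrel) if nonrel else 0.0
        updated[t] = alpha * profile.get(t, 0.0) + beta * pos - gamma * neg
    return updated

# Hypothetical profile and feedback documents as term -> weight dicts.
profile = {"fish": 1.0}
rel = [{"fish": 1.0, "aquarium": 1.0}]
nonrel = [{"stock": 1.0}]
adapted = rocchio(profile, rel, nonrel)
print(adapted)
```

After one round of feedback the profile has picked up "aquarium" from the relevant document and a negative weight for "stock" from the non-relevant one.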

  44. 10.4 Filtering and Recommending Summary of Filtering models

  45. 10.4 Filtering and Recommending • Fast filtering with millions of profiles • There may be thousands or possibly even millions of profiles that must be matched against incoming documents. • Standard information retrieval indexing and query evaluation strategies can be applied to perform this matching effectively. • How to efficiently filter in such a system? • Most profiles are represented as text or a set of features • Build an index for the profiles • Treat incoming documents as “queries” and run against index
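
The idea of indexing profiles and running documents as queries can be sketched as follows. This is a minimal Python illustration with keyword profiles and a simple term-overlap criterion; the function names, the `min_overlap` parameter, and the sample profiles are assumptions, and a real system would score matches with one of the retrieval models above.

```python
from collections import defaultdict

def build_profile_index(profiles):
    """Inverted index: term -> set of profile ids whose profile contains the term."""
    index = defaultdict(set)
    for pid, terms in profiles.items():
        for t in set(terms):
            index[t].add(pid)
    return index

def match_document(doc_tokens, index, min_overlap=1):
    """Treat the incoming document as a query against the profile index."""
    overlap = defaultdict(int)
    for t in set(doc_tokens):
        for pid in index.get(t, ()):
            overlap[pid] += 1
    return sorted(pid for pid, n in overlap.items() if n >= min_overlap)

# Hypothetical keyword profiles.
profiles = {
    "u1": ["tropical", "fish"],
    "u2": ["stock", "market"],
    "u3": ["fish", "recipes"],
}
index = build_profile_index(profiles)
doc = "new tropical fish species found".split()
print(match_document(doc, index), match_document(doc, index, min_overlap=2))
```

Only profiles that share at least one term with the document are ever touched, so the cost per document depends on the number of matching postings rather than on the total number of profiles.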

  46. 10.4 Filtering and Recommending • Evaluation of Filtering models • Filtering systems do not produce a ranked list of documents for each profile. Instead, relevant documents are simply delivered to the profile as they arrive. • Therefore, measures such as precision at K and MAP are not appropriate for the task. Instead, set-based measures are typically used. • It is possible to define a more general evaluation metric that combines each of the four cells. • α = 2, β = 0, δ = −1, and γ = 0 is widely used.
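
The metric's formula is missing from the transcript. Assuming the four cells are the usual true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), a linear utility takes the form $U = \alpha \cdot TP + \beta \cdot TN + \delta \cdot FP + \gamma \cdot FN$. The Python sketch below uses the widely used setting quoted on the slide as defaults; the counts are made up.

```python
def utility(tp, tn, fp, fn, alpha=2, beta=0, delta=-1, gamma=0):
    """Linear utility over the four contingency-table cells.
    Defaults reward each relevant delivery with 2 and penalize each
    non-relevant delivery with 1."""
    return alpha * tp + beta * tn + delta * fp + gamma * fn

# Hypothetical counts for one profile.
score = utility(tp=10, tn=50, fp=4, fn=3)
print(score)  # 16
```

With these defaults, a filter only gains utility from delivering a document when it is at least one-third likely to be relevant, which is one way to motivate tuning the delivery threshold.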

  47. 10.4 Filtering and Recommending • 10.4.2 Collaborative Filtering • In static and adaptive filtering, users and their profiles are assumed to be independent of each other. • Similar users are likely to have similar preferences. • Collaborative filtering exploits relationships between users to improve how items (documents) are matched to users (profiles). • Collaborative filtering is often used as a component of recommender systems.

  48. 10.4 Filtering and Recommending • 10.4.2 Collaborative Filtering • Recommender systems use collaborative filtering algorithms to recommend items (mainly products, books, or movies) to users. • Many commercial sites, like Amazon.com or Netflix, make heavy use of recommender systems to provide users with a list of products. • Recommender systems are valuable both to end users and to search engine companies.

  49. 10.4 Filtering and Recommending • 10.4.2 Collaborative Filtering • Recommender systems algorithms: • Input • (User, Item, Rating) tuples for items that the user has explicitly rated • Typically represented as a user-item matrix • Output • (User, Item, Rating) tuples for items that the user has not rated • Can be thought of as filling in the missing entries of the user-item matrix • Most algorithms infer missing ratings based on the ratings of similar users.
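
Filling in a missing entry of the user-item matrix from the ratings of similar users can be sketched as below. This Python example assumes one common user-based scheme: the prediction is the user's mean rating plus a similarity-weighted average of how far other users' ratings of the item deviate from their own means, with a correlation-style similarity over co-rated items. The users, items, and ratings are invented.

```python
import math

def mean(ratings):
    return sum(ratings.values()) / len(ratings)

def similarity(a, b):
    """Correlation-style similarity over items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    ma, mb = mean(a), mean(b)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = (math.sqrt(sum((a[i] - ma) ** 2 for i in common))
           * math.sqrt(sum((b[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict a missing (user, item) rating from other users who rated the item."""
    mu = mean(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = similarity(ratings[user], r)
        num += s * (r[item] - mean(r))
        den += abs(s)
    # Fall back to the user's own mean when no similar user rated the item.
    return mu + num / den if den else mu

# Hypothetical user-item matrix as nested dicts; ann has not rated m4.
ratings = {
    "ann": {"m1": 5, "m2": 1, "m3": 4},
    "bob": {"m1": 5, "m2": 1, "m3": 5, "m4": 5},
    "cat": {"m1": 1, "m2": 5, "m4": 1},
}
predicted = predict(ratings, "ann", "m4")
print(round(predicted, 2))
```

Here bob agrees with ann and liked m4, while cat disagrees with ann and disliked it; both signals push ann's predicted rating for m4 above her average.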

  50. 10.4 Filtering and Recommending Recommender Systems
