Knowledge Base Tuning

Explore the key ideas, configuration, and tuning of knowledge bases. Learn about search technologies, data mining approaches, and applications aiding information retrieval. Discover different search architectures, filter-based queries, and index-based searches for efficient data access.


Presentation Transcript


  1. Knowledge Base Tuning September 24, 2008 Zuzana Gedeon – Research Labs

  2. Overview • Key Ideas • KB – what do we/you mean by KB • Searching • Background on Search Engine Technology • KB Configuration & Tuning

  3. KB in general terms • Knowledge base • Information available to the user • Data mining – making KB available • Applications helping user to access this information

  4. Who uses the Knowledge Base? • Sources: Answer database, External documents, Community Forum, Internal database • Users: End users, Managers, CSRs, Marketers, Subject experts

  5. Types of search - architecture • Filter based: direct database query, built into views engine • Product/category filtering • Date, customer email address, … • Most runtime selectable filters in Reports • Text/Index based: “Google style” search • Documents -> index • Boosting and weight calculation • KB Browse • Navigational, exploratory search • No Search !! • Get what you need without need for search

  6. Mashup • Report filters with index based search • Incident search • Answer search pages • Filter > search_thread (search_xxx) • Sort by match_wt!!!

  7. Types of search - architecture • Filter based: direct database query, built into views engine • Product/category filtering • Date, customer email address, … • Most runtime selectable filters in Reports • Text/Index based: “Google style” search • Documents -> index • Boosting and weight calculation • KB Browse • Navigational, exploratory search • No Search !! • Get what you need without need for search

  8. EU Knowledge sources and delivery • Syndication widget • Voice • KB Search • KB Browse • External documents • Answer database • Community Forum • Pro Services integration

  9. No Search !! Fact: A large percentage of user sessions do NOT include a search. Users find what they are looking for without searching when we show them the right content as soon as they access the page.

  10. How do we do that? • Good content • Well-chosen category and product organization • Good descriptive titles • Concise information (generic vs. specific) • Consistency • Administrator • Topic/Add words • User specifiable content tags to start/stop indexing for searching • Answer as a file attachment or URL versus just Q&A pair • SmartGuide to create branching (script-like) Answers • Publish-on and review-on dates • Place on top (“fix on top” really sparingly) • Answer access level conditional sections • Users • Users ranking helpfulness - explicitly • Ants leaving pheromone trail – implicit ranking

  11. Find information where they search • Sitemap: exporting KB to search engines • What are Sitemaps? Sitemaps are an easy way for webmasters to inform search engines such as Google and Yahoo about pages on their sites that are available for crawling. • Sitemap Feature Description: Facilitates spidering of your public RightNow knowledgebase content by Google and other search engines. • Benefits: • Allows you to control how search engine spiders visit and consume your knowledgebase content. • If you desire, this can help your content go to the front of the line for the Google/Yahoo web spiders.
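
As a rough illustration of what a sitemap contains, the sketch below builds a minimal sitemap document in Python using the public sitemaps.org XML format. The answer URLs are hypothetical placeholders, and this is not a description of how the RightNow Sitemap feature generates its output.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per page URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical public answer pages, for illustration only.
sitemap_xml = build_sitemap([
    "https://kb.example.com/answers/1",
    "https://kb.example.com/answers/2",
])
print(sitemap_xml)
```

A crawler that reads this file learns which answer pages exist without having to discover them by following links.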

  12. Information placement Knowledge Syndication Widget with Product filter

  13. How do we do that? • Good content • Well-chosen category and product organization • Good descriptive titles • Concise information (generic vs. specific) • Consistency

  14. How do we do that? Administrator • Topic/Add words • User specifiable content tags to start/stop indexing for searching • Answer as a file attachment or URL versus just Q&A pair • Publish-on and review-on dates • Answer access level conditional sections • Place on top (“fix on top” really sparingly)

  15. Topic Words for Search • Allows the KB administrator to associate either a WWW document or a KB Answer with a single specific search term • The given document appears first in the list of search results • Document can be set to always be shown • Useful for directed information presentation, advertising, notices, announcements, etc.

  16. How do we do that? Administrator • Topic/Add words • User specifiable content tags to start/stop indexing for searching • Answer as a file attachment or URL versus just Q&A pair • Publish-on and review-on dates • Answer access level conditional sections • Place on top (“fix on top” really sparingly)

  17. Stop/start index This text is being indexed <!--stopindex--> this text is not being indexed <!--startindex--> And this text is again indexed
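
The effect of the tags can be sketched in a few lines of Python. This is only an assumption about the behavior described on the slide (text between a stop tag and the next start tag is skipped); the function name is hypothetical and the real indexer may handle edge cases differently.

```python
import re

STOP_TAG = "<!--stopindex-->"
START_TAG = "<!--startindex-->"


def indexable_text(html):
    """Drop everything from a stop tag up to the next start tag (or end of text)."""
    pattern = re.escape(STOP_TAG) + r".*?(?:" + re.escape(START_TAG) + r"|\Z)"
    return re.sub(pattern, "", html, flags=re.DOTALL)


sample = (
    "This text is being indexed <!--stopindex--> this text is not being indexed "
    "<!--startindex--> And this text is again indexed"
)
kept = indexable_text(sample)
print(kept)
```

Only the first and last sentences of the sample survive, which is the text that would be handed to the indexer.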

  18. How do we do that? Administrator • Topic/Add words • User specifiable content tags to start/stop indexing for searching • Answer as a file attachment or URL versus just Q&A pair • Publish-on and review-on dates • Answer access level conditional sections • Place on top (“fix on top” really sparingly)

  19. How do we do that? • Users • Users ranking helpfulness - explicitly • Ants leaving pheromone trail – implicit ranking • AI • aging of the information • agedatabase • Administrator • Promoting new answers

  20. No Search !! Users find what they are looking for without searching when we show them the right content as soon as they access the page.

  21. Users + AI Common-> knowledge base -> Answer search: SA_SOLVED_WEIGH_PREF – long term or short term preference

  22. Smart Assistant

  23. Smart Assistant

  24. Relationships Between Answers • Sibling Answers section must be enabled from workspace property • Can manually relate answers together

  25. Use Smart Assistant • Help in populating KB – respond to customer inquiries – propose new answers • Set up Smart Assistant Rules • Try to answer the question without admin interaction

  26. Smart Assistant tuning • Limit by matching Browse topics • RNT UI → Support → SA_NL_MATCH_THRESHOLD: restricts SmartAssistant suggested answers to answers that have the same or closely matching locations in the browse tree. The accepted values are: 0 - do not restrict, 1 - use answers from any closely matching clusters, and 2 - use only best matching clusters. If SA_DM_FREQ is set to 0, the value of SA_NL_MATCH_THRESHOLD will be forced to 0 regardless of the value set here. Default is 1.
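
The interaction between the two settings can be written out as a small Python sketch. The function name is hypothetical; only the accepted values, the default of 1, and the forcing rule come from the slide.

```python
def effective_match_threshold(sa_dm_freq, sa_nl_match_threshold=1):
    """Return the value SA_NL_MATCH_THRESHOLD effectively takes.

    0 = do not restrict, 1 = any closely matching clusters,
    2 = only best matching clusters.
    """
    if sa_nl_match_threshold not in (0, 1, 2):
        raise ValueError("SA_NL_MATCH_THRESHOLD must be 0, 1, or 2")
    # Per the slide: SA_DM_FREQ = 0 forces the threshold to 0.
    if sa_dm_freq == 0:
        return 0
    return sa_nl_match_threshold


print(effective_match_threshold(sa_dm_freq=0, sa_nl_match_threshold=2))
print(effective_match_threshold(sa_dm_freq=7, sa_nl_match_threshold=2))
```

The first call shows a configured value of 2 being overridden because SA_DM_FREQ is 0; the second shows it being honored.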

  27. Suggested Searches • Using history of end-user searches we use a data-mining technique to establish relationships between similar search phrases • EU_SUGGESTED_SEARCHES_ENABLE • Each search phrase suggested to an end-user must pass these tests • Each word spelled correctly • Positive SmartSense value • No words in blacklist • Be complementary to current search • SEARCH_SUGGESTIONS_DISPLAY • 0 no recommendations • 1 turn on recommended products • 2 turn on recommended categories • 4 turn on recommended Browse topics • MAX_SEARCH_SUGGESTIONS
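
The SEARCH_SUGGESTIONS_DISPLAY values 1, 2, and 4 look like bit flags. Assuming they can be added together (the slide does not say so explicitly), a value could be decoded as in this Python sketch; the helper name is hypothetical.

```python
# Values taken from the slide; combining them by addition is an assumption.
SUGGESTION_FLAGS = {
    1: "products",
    2: "categories",
    4: "browse topics",
}


def enabled_suggestions(value):
    """List the recommendation types switched on by a display value."""
    return [name for bit, name in SUGGESTION_FLAGS.items() if value & bit]


print(enabled_suggestions(0))  # no recommendations
print(enabled_suggestions(5))  # products and browse topics, if flags combine
```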

  28. Web Like Search • Traditional keyword searching on the internet or within an operating system • User’s mental model (Google, Yahoo, MSN) • Attributes of Search • Indexes the ‘entire’ corpus of information • Almost never results in zero matches • User testing in Jan 08 showed that Google-like behavior is expected whenever the term ‘Search’ is placed next to a text box on the web.

  29. Answer Search

  30. External documents search • Web pages • Answers

  31. What’s an Index? • The index is where all the information about what is searchable is stored • Indexes are used to speed finding search results by not requiring each document to be scanned during the search process • Most search engines (including ours) use an ‘inverted index’ which means that they map words to documents, or words to locations within documents - Similar to the index in the back of a book • Vs “find a word with your finger” • Indexes are pre-computed when documents are created/edited

  32. Example of an Index • Source text: “Four score and seven years ago our fathers brought forth on this continent, conceived in Liberty, and …” • Related documents: “Score: A group of 20 items. Hence, four score is 4x20, or 80.” “Liberty: The condition of being free from restriction or control.” “The North American Continent consists of the countries: the United States of America, Canada, Mexico, …” • Index (Word -> Location): liberty, mexico, north, restriction, score, seven, states, united, years
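
A minimal inverted index can be sketched in Python as a mapping from each word to the set of documents containing it. The document ids and the tokenizing rule here are illustrative assumptions, not the product's actual index format.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


docs = {
    1: "Four score and seven years ago",
    2: "Score: a group of 20 items",
    3: "Liberty: the condition of being free from restriction",
}
index = build_inverted_index(docs)
print(sorted(index["score"]))  # → [1, 2]
```

Looking up a word is a single dictionary access, which is why no document has to be scanned at search time.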

  33. Stopwords and Word Stemming • Stopwords are human-language connector words that are not generally useful in information retrieval • a, an, the, or, on, for, … • “To be or not to be” • RightNow Feature: multiple editable stop word lists • Incidents • Answers • Word Stemming • Standard natural language processing technique • Unique stemmer for each language • CONNECT CONNECTED CONNECTING CONNECTION CONNECTIONS => CONNECT • Generalizes searches (exact matches not considered)
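
Both steps can be sketched in Python. The stop word list and the suffix-stripping rule below are deliberately naive stand-ins chosen for illustration; real stemmers are language specific and considerably more careful.

```python
# Illustrative stop word list (not the product's actual list).
STOPWORDS = {"a", "an", "the", "or", "on", "for", "to", "be", "not"}

# Longest suffixes first so "connections" loses "ions" rather than just "s".
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, drop stop words, and stem what remains."""
    return [naive_stem(w) for w in text.lower().split() if w not in STOPWORDS]


print(normalize("CONNECT CONNECTED CONNECTING CONNECTION CONNECTIONS"))
print(normalize("To be or not to be"))
```

The second call returns an empty list, which shows why a query made entirely of stop words is a problem case and why editable stop word lists matter.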

  34. Query Processing and Result Ranking • How does a search query work? • Query is processed via word stemming and removal of stopwords • Aliases are added to the search terms (non stopwords, original form) • Search terms are looked up in the index • The total hits are gathered and sorted by document via weighting formula(s) • The documents’ attributes (title, link, etc.) are fetched and returned to the browser • A postprocessing algorithm may be applied before display
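
The steps above can be sketched end to end in Python. Everything in this example is a toy assumption: the index contents, the alias table, the one-rule stemmer, and the "count matching terms" weighting stand in for the real components, whose formulas the slides do not give.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "or", "on", "for"}
ALIASES = {"whiskey": ["scotch"]}            # original form -> synonyms
INDEX = {                                    # term -> ids of matching documents
    "connect": {1, 2},
    "whiskey": {3},
    "scotch": {2, 3},
}
TITLES = {1: "Connecting a printer", 2: "Scotch tasting notes", 3: "Whiskey FAQ"}


def stem(word):
    """Stand-in stemmer: strip a trailing 'ing' from longer words."""
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


def search(query):
    # 1. Remove stop words, then stem what is left.
    originals = [w for w in query.lower().split() if w not in STOPWORDS]
    terms = [stem(w) for w in originals]
    # 2. Add aliases, looked up on the original (unstemmed) form.
    for word in originals:
        terms.extend(ALIASES.get(word, []))
    # 3. Look each term up in the index and gather hits per document.
    hits = Counter()
    for term in terms:
        for doc_id in INDEX.get(term, ()):
            hits[doc_id] += 1
    # 4. Sort by weight (here: number of matching terms), ties by id.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    # 5. Fetch document attributes for display.
    return [(doc_id, TITLES[doc_id]) for doc_id, _ in ranked]


results = search("connecting the whiskey")
print(results)
```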

  35. Answer Search

  36. Word Bias Configuration • Some words are relatively more important than others based upon location • Words in the Subject & Keywords field are more important than words in the body of a document or the attachments • RightNow Configuration Options (setting = default weight, applies to): • SRCH_KEY_WEIGHT = 50, Keywords • SRCH_PROD_WEIGHT = 50, Product Words • SRCH_CAT_WEIGHT = 50, Category Words • SRCH_SUBJ_WEIGHT = 45, Subject/Title Words • SRCH_DESC_WEIGHT = 30, Question Words • SRCH_BODY_WEIGHT = 4, Answer Words • SRCH_ATTACH_WEIGHT = 4, File-Attach. Words • Set these to be the same across interfaces!
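
To see how location-based weights change a score, here is a small Python sketch. The weights are the values listed on the slide; the scoring rule (weight times number of occurrences, summed over fields) and the field names are assumptions made for illustration, since the actual weighting formula is not given.

```python
# Weights from the slide, keyed by hypothetical field names.
FIELD_WEIGHTS = {
    "keywords": 50,      # SRCH_KEY_WEIGHT
    "product": 50,       # SRCH_PROD_WEIGHT
    "category": 50,      # SRCH_CAT_WEIGHT
    "subject": 45,       # SRCH_SUBJ_WEIGHT
    "description": 30,   # SRCH_DESC_WEIGHT
    "body": 4,           # SRCH_BODY_WEIGHT
    "attachment": 4,     # SRCH_ATTACH_WEIGHT
}


def score(document, term):
    """Sum field weight times occurrences of the term in that field."""
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


doc = {
    "subject": "Reset your password",
    "body": "To reset the password open settings and choose reset",
}
print(score(doc, "reset"))  # → 53 (45 from the subject, 2 x 4 from the body)
```

One occurrence in the title outweighs many in the body, which is why descriptive titles and keywords matter so much for ranking.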

  37. AND vs. OR Query Processing • Do the search results contain ALL words in the search text or just SOME words? • All major Internet search engines use AND • We use OR by default with a heavy multi-word weight bias, giving “AND-like ordering” • Why do we use OR? AND does not work well for small document sets (under 10,000 answers). • Why does AND perform badly on small document sets? It’s too easy for a user to construct a query with no search results.
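
The difference is easy to demonstrate on a tiny index. In this Python sketch the index contents are made up, and the "multi-word bias" is modeled simply as ranking documents by how many distinct query terms they match, which is an assumption rather than the product's formula.

```python
def and_search(index, terms):
    """Return documents containing ALL query terms."""
    sets = [index.get(term, set()) for term in set(terms)]
    return sorted(set.intersection(*sets)) if sets else []


def or_search(index, terms):
    """Return documents containing ANY term, best multi-word matches first."""
    matched = {}
    for term in set(terms):
        for doc_id in index.get(term, ()):
            matched[doc_id] = matched.get(doc_id, 0) + 1
    return sorted(matched, key=lambda doc_id: (-matched[doc_id], doc_id))


# Hypothetical small knowledge base: term -> ids of answers containing it.
index = {"printer": {1, 2}, "offline": {2, 3}, "error": {4}}
query = ["printer", "offline", "error"]

print(and_search(index, query))  # → []
print(or_search(index, query))   # → [2, 1, 3, 4]
```

With AND the user gets nothing; with OR the answer matching two of the three words comes first, followed by the single-word matches.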

  38. Result Focusing and Truncation • Dynamic Truncation Bias (Answers) • Truncate search results to those scoring best • RNT UI: SEARCH_RESULT_LIMITING – natural breaks • RNT UI: ANS_SRCH_THRESHOLD – break by weight • RNT UI: ANS_SRCH_SUB_THRESHOLD – avoid 0 results • Concept-biased Search • Focus search results based upon matching of query to existing KB learned topics • RNT UI: SEARCH_RELEVANCE_FOCUS (Answers) • RNT UI: SA_NL_MATCH_THRESHOLD (SmartAssistant)
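
The slide names the truncation settings but not their exact semantics, so the Python sketch below is only one plausible reading: cut the list at the first large drop in weight ("natural break"), or keep results above a weight threshold and fall back to a lower one so the list is never emptied unnecessarily. The function names, parameters, and the 50% drop ratio are all hypothetical.

```python
def truncate_at_natural_break(results, min_ratio=0.5):
    """Cut a descending (doc_id, weight) list at the first steep drop in weight."""
    for i in range(1, len(results)):
        if results[i][1] < results[i - 1][1] * min_ratio:
            return results[:i]
    return results


def truncate_by_weight(results, threshold, sub_threshold):
    """Keep results at or above threshold; if none qualify, use sub_threshold."""
    kept = [r for r in results if r[1] >= threshold]
    if not kept:
        kept = [r for r in results if r[1] >= sub_threshold]
    return kept


ranked = [(1, 100), (2, 90), (3, 30), (4, 25)]
print(truncate_at_natural_break(ranked))   # → [(1, 100), (2, 90)]
print(truncate_by_weight(ranked, 95, 20))  # → [(1, 100)]
print(truncate_by_weight(ranked, 200, 80)) # → [(1, 100), (2, 90)]
```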

  39. External documents search • Web pages • Answers

  40. External documents and tuning • Not much content control • Spider uses only externally available content • Not much tuning control • Title and body weight • SRCH_KEY_WEIGHT: Meta + products, categories • SRCH_SUBJ_WEIGHT: Title • SRCH_DESC_WEIGHT: Text • HtDig with Clucene • File Attachment Size • FATTACH_MAX_SIZE • Core Engine • Search Pulldowns – Kill them • ANS_SEARCH_BY_ENABLED • ANS_SORT_BY_ENABLED

  41. Important Files in the File Manager

  42. Wizard exclude_answers.txt

  43. Aliases • Establishes a link between two words to treat them as synonyms for every search type • FBI = Federal Bureau of Investigation • Whiskey = Scotch
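
Alias expansion can be sketched in Python as a two-way lookup table. The helper names are hypothetical, and treating every alias as symmetric is an assumption based on the slide calling the two words synonyms.

```python
def build_alias_map(pairs):
    """Link both sides of every alias pair so lookups work in either direction."""
    alias_map = {}
    for left, right in pairs:
        alias_map.setdefault(left.lower(), set()).add(right.lower())
        alias_map.setdefault(right.lower(), set()).add(left.lower())
    return alias_map


def expand(terms, alias_map):
    """Return the query terms with their aliases appended after each term."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(sorted(alias_map.get(term.lower(), ())))
    return expanded


aliases = build_alias_map([
    ("FBI", "Federal Bureau of Investigation"),
    ("Whiskey", "Scotch"),
])
print(expand(["scotch", "prices"], aliases))  # → ['scotch', 'whiskey', 'prices']
```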

  44. Analytics • Keyword Searches report • Frequent searches (important content) • Searches with no answers (missing content) • Searches with too many answers (configuration and tuning needed) • Gap report

  45. Keyword Searches Report

  46. Information Gap Report • Use the Gap Report to identify ‘holes’ in the end-user KB. • Compares recent incidents to existing Answers. • Gap Report Config Options: GAP_FREQUENCY & GAP_TIME_PERIOD – default 7 days for both.

  47. Information Gap Report Screenshot

  48. Other Customization •  EU_BROWSER_SEARCH_PLUGIN - Enables the Answer and External Document search pages to provide an interface for web browsers to query them directly from their built-in search bars, such as those provided by Google or Yahoo!. Default is disabled (No). •  EU_SYNDICATION_ENABLE – widgets •  ANS_SORT_BY_ENABLED Enables the Sort By drop-down menu on the Find Answers page. This setting overrides any view settings. Default is disabled (No). – this is the reason to have limited results set!!!! •  SEARCH_WITH_OPERATORS Enables processing of +, - and ~ operators while searching for answers. Default is enabled (Yes).

  49. Thank You • Questions?