
Advanced Indexing and Searching Techniques in Modern Information Retrieval

This chapter surveys methods for indexing and searching in information retrieval systems. It covers brute-force sequential searching, the Knuth-Morris-Pratt and Boyer-Moore algorithms, and the Shift-Or approach. Key concepts include left-to-right and right-to-left scanning and the bad character and good suffix shift rules. The text presents Boyer-Moore as a sub-linear time method: it speeds up searching by examining fewer than m+n characters.

clovis


Presentation Transcript


  1. Modern Information Retrieval, Chapter 8: Indexing and Searching

  2. Sequential searching • brute force approach
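The brute force approach named on this slide can be sketched as follows. This is a minimal illustration rather than code from the presentation; the function name `brute_force_search` and the choice to return 0-based start positions are assumptions made here.

```python
def brute_force_search(text, pattern):
    """Return every 0-based index at which pattern starts in text."""
    text_len, pat_len = len(text), len(pattern)
    matches = []
    # Try every possible alignment of the pattern against the text.
    for start in range(text_len - pat_len + 1):
        j = 0
        # Compare left to right until a mismatch or the end of the pattern.
        while j < pat_len and text[start + j] == pattern[j]:
            j += 1
        if j == pat_len:
            matches.append(start)
    return matches


print(brute_force_search("abcaaab", "aab"))
```

Every alignment is tried and each may cost up to one comparison per pattern character, which is where the quadratic worst case comes from.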

  3. Knuth-Morris-Pratt approach • Left-to-right scan • Shifting rule • [Figure: the pattern a b a c shifted along a text such as a b a b a b a c]
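A left-to-right scan with a shifting rule can be sketched as below. The slide does not spell out the rule, so this sketch assumes the usual prefix (failure) table formulation of Knuth-Morris-Pratt; the names `failure_table` and `kmp_search` are illustrative only.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return every 0-based index at which pattern starts in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text pointer never moves back.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches


print(failure_table("abac"))
print(kmp_search("abababac", "abac"))
```

Because the text index only moves forward, the scan itself performs a number of comparisons linear in the text length, after a preprocessing pass that is linear in the pattern length.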

  4. Boyer-Moore approach • Right-to-left scan • Bad character shift rule • Good suffix shift rule • Sub-linear time method • Examines fewer than m+n characters

  5. Right-to-left scan • Shift one place when a mismatch occurs • O(nm) • Example: T = xpbctbxabpqx, P = tpabxab

  6. Bad character rule • R(x): right-most position in P of each character x • On a mismatch at position i of P against T(k) = y, look up R(T(k)) = R(y) • If R(y) < i, shift i-R(y) positions

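  7. Bad character rule • Mismatch at position i of P against T(k) = y, with R(T(k)) = R(y) • If R(y) > i, shift 1 position • If R(y) = 0 (y does not occur in P), shift i positions, which is i-R(y) with R(y) = 0 and moves P entirely past T(k)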
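The right-to-left scan combined with the bad character rule can be sketched as follows. This is an illustrative sketch, not the presenter's code: it uses only the bad character rule (no good suffix rule), keeps the slides' 1-based position i inside the pattern, and shifts by max(1, i - R(y)). The names `rightmost_positions` and `bad_character_search` are assumptions.

```python
def rightmost_positions(pattern):
    """R(x): 1-based right-most position of each character x in the pattern.
    Characters that do not occur are simply absent (treated as R = 0)."""
    return {ch: pos for pos, ch in enumerate(pattern, start=1)}


def bad_character_search(text, pattern):
    """Return every 0-based index at which pattern starts in text."""
    pat_len, text_len = len(pattern), len(text)
    if pat_len == 0:
        return []
    R = rightmost_positions(pattern)
    matches = []
    start = 0  # 0-based text index aligned with the first pattern character
    while start <= text_len - pat_len:
        i = pat_len  # 1-based pattern position, scanned right to left
        while i > 0 and pattern[i - 1] == text[start + i - 1]:
            i -= 1
        if i == 0:
            matches.append(start)
            start += 1  # conservative shift after a full match
        else:
            y = text[start + i - 1]  # the mismatching text character T(k)
            start += max(1, i - R.get(y, 0))
    return matches


print(rightmost_positions("tpabxab"))
print(bad_character_search("abcaaab", "aab"))
```

When R(y) > i the difference i - R(y) is negative, so the `max` falls back to a shift of one position; when y is absent from the pattern the shift is the full i positions.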

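  8. The strong good suffix rule • [Figure: a substring t of T, preceded by x, matches a suffix of P, where it is preceded by y; t' is the right-most other copy of t in P, preceded by z with z ≠ y; P is shifted so that t' lies under t]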

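  9. The strong good suffix rule (continued) • [Figure: further alignments of P, whose suffix t is preceded by y, against the substring t of T preceded by x]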
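One common way to precompute strong good suffix shifts is the border-based preprocessing sketched below. The slides show only the figures, so this formulation, and the names `good_suffix_shifts` and `good_suffix_search`, are assumptions chosen for illustration. Here `shift[j + 1]` is the shift applied after a mismatch at 0-based pattern index j, and `shift[0]` is the shift after a full match.

```python
def good_suffix_shifts(pattern):
    """Shift table for the strong good suffix rule (border-based preprocessing)."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pattern[i:]
    i, j = m, m + 1
    border[i] = j
    # Case 1: another copy of the matched suffix exists, preceded by a different character.
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only a prefix of the pattern matches a suffix of the matched part.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def good_suffix_search(text, pattern):
    """Right-to-left scan that shifts using only the strong good suffix rule."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    shift = good_suffix_shifts(pattern)
    matches = []
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[start + j]:
            j -= 1
        if j < 0:
            matches.append(start)
            start += shift[0]
        else:
            start += shift[j + 1]
    return matches


print(good_suffix_shifts("abab"))
print(good_suffix_search("abababab", "abab"))
```

A full Boyer-Moore search would take the larger of this shift and the bad character shift at each mismatch; the sketch keeps the two rules separate so each can be read on its own.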

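  10. Shift-Or approach • An example of the shift-or algorithm for p=aab and s=abcaaab

      Table T (rows: pattern positions; columns: alphabet characters; 0 marks a match):

              a  b  c
          a   0  1  1
          a   0  1  1
          b   1  0  1

      State E after each text character, with bits listed from pattern position 1 to 3. S(E) shifts E down one position and puts a 0 on top; the new state is S(E) OR T[c]:

          char     S(E)   T[c]   E
          (start)                111
          a        011    001    011
          b        001    110    111
          c        011    111    111
          a        011    001    011
          a        001    001    001
          a        000    001    001
          b        000    110    110

      A 0 in the last position of E signals a match, here after the final b.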
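The shift-or example above can be reproduced with integer bit masks. In this sketch bit j of an integer stands for pattern position j+1, so the slide's "shift down" becomes a left shift; the function name `shift_or_search` and the dictionary-based table are assumptions made for illustration.

```python
def shift_or_search(text, pattern):
    """Return every 0-based index at which pattern starts in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # table[c] has bit j cleared exactly when pattern[j] == c.
    table = {}
    for j, ch in enumerate(pattern):
        table[ch] = table.get(ch, all_ones) & ~(1 << j)
    state = all_ones  # no prefix of the pattern is matched yet
    matches = []
    for i, ch in enumerate(text):
        # Shift in a 0 at position 1, then OR with the column for this character.
        state = ((state << 1) | table.get(ch, all_ones)) & all_ones
        # A 0 in the highest bit means the whole pattern ends at index i.
        if state & (1 << (m - 1)) == 0:
            matches.append(i - m + 1)
    return matches


print(shift_or_search("abcaaab", "aab"))
```

Each text character costs one shift and one OR, provided the pattern fits in a machine word; Python integers are unbounded, so the sketch does not enforce that limit.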
