
Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees



Presentation Transcript


  1. Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo S. Srinivasa Rao, IT University of Copenhagen

  2. Background: Succinct Data Structures • What are succinct data structures • Jacobson 1989 • Why succinct data structures • Large data sets in modern applications: textual, genomic, spatial or geometric • An implementation: Delpratt et al. 2006 • Succinct integrated encodings • Main data and auxiliary data structures

  3. Our Problem: Succinct Indexes • Use of the concept in previous work • Compact PAT trees: Clark & Munro 1996 • Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005 • Upper bounds: Sadakane & Grossi 2006 • Definition of succinct indexes in data structure design • ADT: primitive access operators • Succinct index: more powerful operators

  4. Succinct Integrated Encodings • [Diagram: Main Data + Auxiliary Data Structures; Navigational Operations]

  5. Succinct Indexes • [Diagram: Main Data + Succinct Index; Navigational Operations]

  6. Succinct Indexes vs. Integrated Encodings • Maximizing the freedom of the encoding of the main data • Allowing incremental design • Supporting implicit data

  7. Strings: Definitions • Notation • Alphabet: [σ]={1, 2, …, σ} • String: S[1..n] • Operations: • string_access(x): S[x] • string_rank(α, x): number of occurrences of α in S[1..x] • string_select(α, r): position of the rth occurrence of α in S

  8. Strings: An Example • S = a a b a c c c d a d d a b b b c • string_access(8) = d • string_rank(a, 8) = 3 • string_select(b, 3) = 14
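The three operations defined on slide 7 can be checked against this example with a naive reference implementation. This is only a sketch for reading the definitions: it scans the string directly and uses none of the succinct structures discussed in the talk. Positions and ranks are 1-based, as on the slides; returning `None` for a missing occurrence is an assumption of this sketch.

```python
# Naive reference versions of the slide-7 operations (1-based positions).

def string_access(s, x):
    """Return S[x]."""
    return s[x - 1]

def string_rank(s, alpha, x):
    """Number of occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)

def string_select(s, alpha, r):
    """Position of the r-th occurrence of alpha in S, or None if there is none."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == r:
                return pos
    return None

S = "aabacccdaddabbbc"  # the string from the example slide
print(string_access(S, 8), string_rank(S, "a", 8), string_select(S, "b", 3))
# prints: d 3 14
```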

  9. Strings: Previous Results • Succinct Integrated Encodings • Wavelet trees: Grossi et al. 2003 • Space: nH0 + o(n)∙lgσ bits • Time: O(lgσ) time for all three operations • Golynski et al. 2006 • Space: n (lgσ + o(lgσ)) bits • Time: O(lglgσ) time for string_access and string_rank, O(1) time for string_select

  10. Strings: Our Results • Succinct Indexes • ADT • string_access: f(n, σ) time • Space: n∙o(lgσ) bits • Operations • string_rank: O(lglgσ lglglgσ (f(n, σ)+lglgσ)) • string_select: O(lglglgσ (f(n, σ)+lglgσ)) • Other operations: negations

  11. Binary Relations: Definitions • Notation • Binary relation: R ⊆ [n] x [σ] • Number of objects: n; number of labels: σ • Number of object-label pairs: t • Operations • object_access(x, r): rth label associated with x • label_access(x, α): whether x is associated with α • label_rank(α, x): number of objects labeled α up to object x • label_select(α, r): rth object labeled α

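  12. Binary Relations: An Example • Matrix of R, one row per label (σ = 4) and one column per object (n = 5): row 1: 0 1 0 1 0; row 2: 0 0 0 1 0; row 3: 1 0 1 1 0; row 4: 1 1 0 0 1 • object_access(1, 2) = 4 • label_access(2, 3) = false • label_rank(3, 4) = 3 • label_select(4, 3) = 5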
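The four operations from slide 11 can be checked against this example with a naive matrix-based sketch. It assumes the flattened 0/1 digits on the slide form a matrix with one row per label (σ = 4) and one column per object (n = 5), a reading that reproduces all four stated results. Returning `None` when the requested rank does not exist is an assumption of this sketch.

```python
# Rows are labels 1..4, columns are objects 1..5 (assumed reading of the slide).
M = [
    [0, 1, 0, 1, 0],  # label 1
    [0, 0, 0, 1, 0],  # label 2
    [1, 0, 1, 1, 0],  # label 3
    [1, 1, 0, 0, 1],  # label 4
]

def label_access(x, alpha):
    """Whether object x is associated with label alpha."""
    return M[alpha - 1][x - 1] == 1

def object_access(x, r):
    """r-th label associated with object x, or None."""
    labels = [a for a in range(1, len(M) + 1) if label_access(x, a)]
    return labels[r - 1] if r <= len(labels) else None

def label_rank(alpha, x):
    """Number of objects labeled alpha among objects 1..x."""
    return sum(M[alpha - 1][:x])

def label_select(alpha, r):
    """r-th object labeled alpha, or None."""
    objs = [x for x in range(1, len(M[0]) + 1) if label_access(x, alpha)]
    return objs[r - 1] if r <= len(objs) else None

print(object_access(1, 2), label_access(2, 3), label_rank(3, 4), label_select(4, 3))
# prints: 4 False 3 5
```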

  13. Binary Relations: Previous Results • Succinct Integrated Encodings • Barbay et al., 2006 • Space: t (lgσ + o(lgσ)) bits • Time: O(lglgσ) time for object_access, label_rank and label_access, O(1) time for label_select

  14. Binary Relations: Our Results • Succinct Indexes • ADT: • object_access: f(n,σ,t) • Space: • t∙o(lgσ) bits • Time: • label_rank and label_access: O(lglgσ lglglgσ (f(n,σ,t) + lglgσ)) • label_select: O(lglglgσ (f(n,σ,t) + lglgσ))

  15. Multi-labeled Trees: Definitions • Notation • Number of nodes: n • Number of labels: σ • Number of node-label pairs: t • Operations • α-descendant • α-child • α-ancestor

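  16. Multi-labeled Trees: An Example • Tree (nodes numbered in preorder, label sets in braces): root 1 {a, c, d} with children 2 {c, d} and 8 {a, c}; node 2 has children 3 {a}, 4 {a, b} and 7 {b, d}; node 4 has children 5 {b} and 6 {a, b}; node 8 has children 9 {c}, 10 {c, d} and 11 {b, c, d} • Node 2 is a c-ancestor of node 6 • Node 6 is a b-descendant of node 2 • Node 10 is a d-child of node 8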
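The three relations in this example can be checked with a small pointer-based sketch. The tree shape and label assignment below are a reconstruction of the flattened slide figure (node 4 is assumed to be the parent of nodes 5 and 6), and the predicate-style function names are assumptions: the slides name the α-child, α-descendant and α-ancestor operations without giving signatures.

```python
# Reconstructed example tree: node -> children, and node -> label set.
children = {1: [2, 8], 2: [3, 4, 7], 4: [5, 6], 8: [9, 10, 11]}
labels = {1: "acd", 2: "cd", 3: "a", 4: "ab", 5: "b", 6: "ab",
          7: "bd", 8: "ac", 9: "c", 10: "cd", 11: "bcd"}
parent = {c: p for p, cs in children.items() for c in cs}

def is_proper_ancestor(x, y):
    """True if x lies on the path from y's parent up to the root."""
    node = parent.get(y)
    while node is not None:
        if node == x:
            return True
        node = parent.get(node)
    return False

def is_alpha_child(x, y, alpha):
    """x is a child of y and carries label alpha."""
    return parent.get(x) == y and alpha in labels[x]

def is_alpha_descendant(x, y, alpha):
    """x is a descendant of y and carries label alpha."""
    return alpha in labels[x] and is_proper_ancestor(y, x)

def is_alpha_ancestor(x, y, alpha):
    """x is an ancestor of y and carries label alpha."""
    return alpha in labels[x] and is_proper_ancestor(x, y)

print(is_alpha_ancestor(2, 6, "c"), is_alpha_descendant(6, 2, "b"), is_alpha_child(10, 8, "d"))
# prints: True True True
```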

  17. Multi-labeled Trees: Previous Results • Labeled trees • Geary et al. 2004 • Ferragina et al. 2005 • Barbay et al. 2006 • Multi-labeled trees • Barbay et al. 2006

  18. Multi-labeled Trees: Our Approach • Traversal Orders • Preorder • DFUDS order • Ordinal Trees: DFUDS • Benoit et al. 1999 & 2005 • Jansson et al. 2007 • 2 Binary Relations • Nodes in preorder & labels • Nodes in DFUDS order & labels • [Figure: the example tree with each node numbered in preorder and in DFUDS order]
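The two traversal orders can be compared on the example tree with the sketch below. It assumes that DFUDS order numbers the root first and then, visiting nodes in preorder, numbers all children of each visited node consecutively; this reading agrees with the node numbers that survive from the slide figure. The tree shape is the same reconstruction of the example slide (node 4 as the parent of nodes 5 and 6).

```python
# Example tree, nodes named by their preorder numbers.
children = {1: [2, 8], 2: [3, 4, 7], 4: [5, 6], 8: [9, 10, 11]}

def preorder(root):
    """Nodes in preorder, using an explicit stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children.get(node, [])))
    return order

def dfuds_order(root):
    """Root first, then each node's children listed when the node is visited in preorder."""
    order = [root]
    for node in preorder(root):
        order.extend(children.get(node, []))
    return order

print(preorder(1))     # prints: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(dfuds_order(1))  # prints: [1, 2, 8, 3, 4, 7, 5, 6, 9, 10, 11]
```

In DFUDS order the children of a node occupy consecutive positions (for example 3, 4, 7 under node 2), while in preorder the descendants of a node do, which suggests why the approach keeps one binary relation per order.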

  19. Multi-labeled Trees: Our Results • Succinct Indexes • ADT: node_label(x, r) • Supporting α-child/descendant queries: t∙o(lgσ) bits • Supporting α-child/descendant/ancestor queries: t∙(lgρ + o(lgρ) + o(lgσ)) bits (ρ: recursivity) • Supporting α-child/descendant/ancestor queries of node x after another node y

  20. Applications • Compressed Succinct Encodings • Strings • Space: nHk + o(n lgσ) bits • Operations: • string_access: O(1) • string_rank: O((lglgσ)² lglglgσ) • string_select: O(lglgσ lglglgσ) • First high-order entropy-compressed encoding supporting rank/select efficiently • Other Data Structures
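These running times appear to follow from the succinct-index bounds on slide 10 once string_access takes constant time. As a worked check, assuming f(n, σ) = O(1):

```latex
\begin{align*}
\text{string\_rank:}\quad
  & O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr)
    = O\bigl((\lg\lg\sigma)^{2}\,\lg\lg\lg\sigma\bigr), \\
\text{string\_select:}\quad
  & O\bigl(\lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr)
    = O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma\bigr).
\end{align*}
```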

  21. Applications (Continued) • High-order entropy-compressed text indexes for large alphabets • Notation: n: text size, σ: alphabet size, m: pattern length, occ: number of occurrences • Our results • Space: nHk + o(n lgσ) bits • Pattern searching: O(m lglgσ + occ lg^(1+ε) n lglgσ) • Previous results: a lgσ factor instead of lglgσ, or incompressible

  22. Conclusions • We showed the importance of succinct indexes in the design of succinct data structures by designing: • Succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label • First high-order entropy compressed representation of strings supporting rank/select • High-order entropy compressed text indexes for large alphabets

  23. Conclusions (Continued) The concept of succinct indexes is useful in designing succinct data structures … it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.

  24. Thank you!
