1 / 15

Web Data Management

Web Data Management. Path Expressions. In this lecture. Path expressions Regular path expressions Evaluation techniques Resources: Data on the Web Abiteboul, Buneman, Suciu : section 4.1. Path Expressions. Examples: Bib . paper Bib . book . publisher Bib . paper . author . lastname

matteo
Télécharger la présentation

Web Data Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Web Data Management Path Expressions

  2. In this lecture • Path expressions • Regular path expressions • Evaluation techniques Resources: Data on the Web Abiteboul, Buneman, Suciu : section 4.1

  3. Path Expressions Examples: • Bib.paper • Bib.book.publisher • Bib.paper.author.lastname Given an OEM instance, the answer of a path expression p is a set of objects

  4. Path Expressions Bib &o1 Examples: DB = paper paper book references &o12 &o24 &o29 references references author page author year author title http title title publisher author author author &o43 &25 &o44 &o45 &o46 &o52 &96 1997 &o51 &o50 &o49 &o47 &o48 last firstname firstname lastname first lastname &o70 &o71 &243 &206 “Serge” “Abiteboul” “Victor” 122 133 “Vianu” Bib.paper={&o12,&o29} Bib.book.publisher={&o51} Bib.paper.author.lastname={&o71,&206}

  5. Answer of a Path Expression Answer(P, DB) = f(P, root(DB)) Where: f(e, x) = {x} f(L.P, x) = {f(P,y) | (x,L,y) edges(DB)} Simple evaluation algorithms for Answer(P,DB): Runs in PTIME in size(P), size(db): • PTIME complexity

  6. Regular Path Expressions R ::= label | _ | R.R | (R|R) | R* | R+ | R? Examples: • Bib.(paper|book).author • Bib.book.author.lastname? • Bib.book.(references)*.author • Bib.(_)*.zip

  7. Applications of Regular Path Expressions • Navigating uncertain structure: • Bib.book.author.lastname? • Syntactic substitution for inheritance: • Bib.(paper|book).author • Better: Bib.publication.author, but we don’t have inheritance

  8. Applications of Regular Path Expressions • Computing transitive closure: • Bib.(_)*.zip = everything accessible • Bib.book.(references)*.author = everything accessible via references • Some regular expressions of doubtful practical use: • (references.references)* = a path with an even number of references • (_._)* = paths of even length • (_._._.(_)?)* = paths of length (3m + 4n) for some m,n • But make great examples for illustration 

  9. Answer of a Regular Path Expression Recall: • Lang(R) = the set of words P generated by R Answer of regular path expressions: • Answer(R,DB) = {Answer(P,DB) | P  Lang(R)} Need an evaluation algorithm that copes with cycles

  10. Regular Path Expressions Recall: each regular expression  NDFA Example: R = (a.a)*.a.b A = a states(A) = {s1,s2,s3,s4} initial(A) = s1 terminal(A) = {s4} s1 s2 a a b s3 s4

  11. Regular Path Expressions Canonical Evaluation Algorithm • Answer(R,DB): • construct A from R • construct product automaton G = A x DB: • nodes(G) = states(A) x nodes(db) • edges(G) = {((s,x),L,(s’,x’) | (s,L,s’)  edges(A), (x,L,x’)  edges(DB)} • root(G) = (initial(A), root(DB)) • compute Gacc = set of nodes accessible from root(G) • return {x | s  terminal(A) s.t. (s,x)  Gacc}

  12. _ &o1 s1 s2 s3 a a _ &o2 a a &o3 &o4 b Regular Path Expressions Example: R = _.(_._)*.a A = DB = Answer of R on DB = { &o2, &o3}

  13. Compute Product Automaton G _ a _ s3,&o1 s1,&o1 s2,&o1 a a a s3,&o2 s1,&o2 s2,&o2 a a a a a a s3,&o3 s3,&o4 s1,&o3 s1,&o4 s2,&o3 s2,&o4 b b b

  14. Compute Accessible Part Gacc _ a _ s3,&o1 s1,&o1 s2,&o1 a a a s3,&o2 s1,&o2 s2,&o2 a a a a a a s3,&o3 s3,&o4 s1,&o3 s1,&o4 s2,&o3 s2,&o4 b b b Answer(R,DB) = {&o2, &o3}

  15. Complexity of Regular Path Expressions • The evaluation algorithm runs in PTIME in size(R), size(DB) • Even when there are cycles in DB

More Related