150 likes | 256 Vues
Explore path expressions and regular path expressions in web data management. Learn evaluation techniques and examples. Understand the applications and algorithms for managing data paths effectively. Resources from Data on the Web by Abiteboul, Buneman, and Suciu.
E N D
Web Data Management Path Expressions
In this lecture • Path expressions • Regular path expressions • Evaluation techniques Resources: Data on the Web Abiteboul, Buneman, Suciu : section 4.1
Path Expressions Examples: • Bib.paper • Bib.book.publisher • Bib.paper.author.lastname Given an OEM instance, the answer of a path expression p is a set of objects
Path Expressions Bib &o1 Examples: DB = paper paper book references &o12 &o24 &o29 references references author page author year author title http title title publisher author author author &o43 &25 &o44 &o45 &o46 &o52 &96 1997 &o51 &o50 &o49 &o47 &o48 last firstname firstname lastname first lastname &o70 &o71 &243 &206 “Serge” “Abiteboul” “Victor” 122 133 “Vianu” Bib.paper={&o12,&o29} Bib.book.publisher={&o51} Bib.paper.author.lastname={&o71,&206}
Answer of a Path Expression Answer(P, DB) = f(P, root(DB)) Where: f(e, x) = {x} f(L.P, x) = {f(P,y) | (x,L,y) edges(DB)} Simple evaluation algorithms for Answer(P,DB): Runs in PTIME in size(P), size(db): • PTIME complexity
Regular Path Expressions R ::= label | _ | R.R | (R|R) | R* | R+ | R? Examples: • Bib.(paper|book).author • Bib.book.author.lastname? • Bib.book.(references)*.author • Bib.(_)*.zip
Applications of Regular Path Expressions • Navigating uncertain structure: • Bib.book.author.lastname? • Syntactic substitution for inheritance: • Bib.(paper|book).author • Better: Bib.publication.author, but we don’t have inheritance
Applications of Regular Path Expressions • Computing transitive closure: • Bib.(_)*.zip = everything accessible • Bib.book.(references)*.author = everything accessible via references • Some regular expressions of doubtful practical use: • (references.references)* = a path with an even number of references • (_._)* = paths of even length • (_._._.(_)?)* = paths of length (3m + 4n) for some m,n • But make great examples for illustration
Answer of a Regular Path Expression Recall: • Lang(R) = the set of words P generated by R Answer of regular path expressions: • Answer(R,DB) = {Answer(P,DB) | P Lang(R)} Need an evaluation algorithm that copes with cycles
Regular Path Expressions Recall: each regular expression NDFA Example: R = (a.a)*.a.b A = a states(A) = {s1,s2,s3,s4} initial(A) = s1 terminal(A) = {s4} s1 s2 a a b s3 s4
Regular Path Expressions Canonical Evaluation Algorithm • Answer(R,DB): • construct A from R • construct product automaton G = A x DB: • nodes(G) = states(A) x nodes(db) • edges(G) = {((s,x),L,(s’,x’) | (s,L,s’) edges(A), (x,L,x’) edges(DB)} • root(G) = (initial(A), root(DB)) • compute Gacc = set of nodes accessible from root(G) • return {x | s terminal(A) s.t. (s,x) Gacc}
_ &o1 s1 s2 s3 a a _ &o2 a a &o3 &o4 b Regular Path Expressions Example: R = _.(_._)*.a A = DB = Answer of R on DB = { &o2, &o3}
Compute Product Automaton G _ a _ s3,&o1 s1,&o1 s2,&o1 a a a s3,&o2 s1,&o2 s2,&o2 a a a a a a s3,&o3 s3,&o4 s1,&o3 s1,&o4 s2,&o3 s2,&o4 b b b
Compute Accessible Part Gacc _ a _ s3,&o1 s1,&o1 s2,&o1 a a a s3,&o2 s1,&o2 s2,&o2 a a a a a a s3,&o3 s3,&o4 s1,&o3 s1,&o4 s2,&o3 s2,&o4 b b b Answer(R,DB) = {&o2, &o3}
Complexity of Regular Path Expressions • The evaluation algorithm runs in PTIME in size(R), size(DB) • Even when there are cycles in DB