
EXCHANGING INTENSIONAL XML DATA

EXCHANGING INTENSIONAL XML DATA. Tova Milo, INRIA & Tel-Aviv U.; Serge Abiteboul, INRIA; Bernd Amann, Cedric-CNAM; Omar Benjelloun, INRIA; Fred Dang Ngoc, INRIA. H. GÜL ÇALIKLI 2002700743, MURAT KORAŞ 2002700797.


Presentation Transcript


  1. EXCHANGING INTENSIONAL XML DATA • Tova Milo, INRIA & Tel-Aviv U.; Serge Abiteboul, INRIA; Bernd Amann, Cedric-CNAM; Omar Benjelloun, INRIA; Fred Dang Ngoc, INRIA • H. GÜL ÇALIKLI 2002700743, MURAT KORAŞ 2002700797

  2. INTRODUCTION • The emergence of Web services as a standard means of publishing and accessing data on the Web has introduced a new class of XML documents called “intensional documents”. • Intensional documents: XML documents where • some parts of the document are defined explicitly, • other parts are defined by programs that generate data.

  3. INTRODUCTION • Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results. • GOAL of this PAPER: • Study the new issues raised by the exchange of intensional XML documents between applications • Decide which data should be materialised before it is sent and which should not

  4. INTRODUCTION CONSIDERATIONS for MATERIALISATION • Performance: • current system load • cost of communication • Capabilities: • inability to handle the intensional parts of a document • lack of access rights (to a particular service) • Security: • invoking service calls from an untrusted party may cause severe security violations • Functionalities: • confidentiality reasons • calling services may involve fees to be paid.

  5. INTRODUCTION • [Figure: Data exchange scenario for intensional documents. A sender and a receiver, each characterised by its capabilities, ACL, cost, ..., exchange documents whose nodes include the function calls f, g, q and r, according to an agreed data exchange schema.]

  6. THE MODEL and THE PROBLEM • SIMPLE INTENSIONAL XML: • Model intensional XML documents as labelled trees consisting of two types of nodes: • data nodes • function nodes, which correspond to “service calls” • Assume the existence of some disjoint domains: • $\mathcal{N}$: domain of NODES • $\mathcal{L}$: domain of LABELS • $\mathcal{F}$: domain of FUNCTION NAMES • $\mathcal{D}$: domain of DATA VALUES

  7. THE MODEL and THE PROBLEM • SIMPLE INTENSIONAL XML (cont’d) • DEFINITION 1: An intensional document $d$ is an expression $(T, \lambda)$ where: • $T = (N, E, <)$ is an ordered tree: • $N \subset \mathcal{N}$: finite set of nodes • $E \subset N \times N$: edges • $<$: associates with each node in $N$ a total order on its children. • $\lambda: N \to \mathcal{L} \cup \mathcal{F} \cup \mathcal{D}$ is a labeling function for the nodes. NOTE: only leaf nodes may be assigned data values from $\mathcal{D}$.

  8. THE MODEL and THE PROBLEM • SIMPLE INTENSIONAL XML (cont’d) • Nodes with a label in $\mathcal{L} \cup \mathcal{D}$ are called data nodes. • Nodes with a label in $\mathcal{F}$ are called function nodes. • The children subtrees of a function node are the function parameters. • When the function is called: • these subtrees are passed to it, • the return value replaces the function node in the document.
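The call semantics above (parameters are the child subtrees; the returned forest replaces the call) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Node` class, the `invoke` helper, and the `get_temp` stub (standing in for a real Web service, with a made-up return value) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    label: str                       # element label, function name, or data value
    is_function: bool = False        # True for function (service-call) nodes
    children: List["Node"] = field(default_factory=list)

# Hypothetical stand-in for a Web service: it receives the parameter subtrees
# and returns a forest (list of trees) that replaces the call.
Service = Callable[[List[Node]], List[Node]]

def invoke(parent: Node, index: int, services: Dict[str, Service]) -> None:
    """Invoke the function node parent.children[index] and splice in its result."""
    call = parent.children[index]
    assert call.is_function, "only function nodes can be invoked"
    result = services[call.label](call.children)   # parameters = child subtrees
    parent.children[index:index + 1] = result      # return value replaces the call

def get_temp(params: List[Node]) -> List[Node]:
    # Hypothetical stub; a real service would look up params[0] (the city).
    return [Node("temp", children=[Node("16 ºC")])]

doc = Node("newspaper", children=[
    Node("title", children=[Node("The Sun")]),
    Node("date", children=[Node("04/10/2002")]),
    Node("Get_Temp", True, [Node("city", children=[Node("Paris")])]),
    Node("TimeOut", True, [Node("Exhibits")]),
])

invoke(doc, 2, {"Get_Temp": get_temp})            # materialise the Get_Temp call
print([c.label for c in doc.children])            # ['title', 'date', 'temp', 'TimeOut']
```

Materialising a document, in this sketch, is just a sequence of such `invoke` calls; the TimeOut call is deliberately left intensional.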

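  9. THE MODEL and THE PROBLEM • [Figure: an intensional newspaper document. The root newspaper has the children title (“The Sun”), date (“04/10/2002”), the function node Get_Temp with parameter city (“Paris”), and the function node TimeOut with parameter “Exhibits”; the data node temp (“16 ºC”) also appears in the figure.]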

  10. THE MODEL and THE PROBLEM • SIMPLE SCHEMA: • DEFINITION 2: A document schema $s$ is an expression $(L, F, \tau)$ where: • $L \subset \mathcal{L}$: finite set of labels • $F \subset \mathcal{F}$: finite set of function names • $\tau$: function that maps: • each label name $l \in L$ to a regular expression over $L \cup F$ or to the keyword data • each function name $f \in F$ to a pair of expressions called • $\tau_{in}(f)$: input type of $f$ • $\tau_{out}(f)$: output type of $f$

  11. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d) • Example of a Schema: • data: • τ (newspaper) =title.date.(Get_Temp|temp) .(TimeOut|exhibit) • τ (title) = data • τ (date) = data • τ (temp) = data • τ (city) = data • τ (exhibit) = data

  12. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d) • Example of a Schema (cont’d): • functions: • τin(Get_Temp) = city • τout(Get_Temp) = temp • τin(TimeOut) = data • τout(TimeOut) = (exhibit|performance) • τin(Get_Date) = title • τout(Get_Date) = date

  13. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 3: An intensional document $t$ is an instance of a schema $s = (L, F, \tau)$ if: • for each data node $n \in t$ with label $l \in L$, the labels of $n$’s children form a word in $lang(\tau(l))$; • the same holds for each function node $n$ with label $f \in F$, using $lang(\tau_{in}(f))$. • Here $lang(\tau(l))$ denotes the regular language defined by $\tau(l)$.

  14. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 3 (cont’d): Let $f$ be a function name and $t_1, \dots, t_n$ a sequence of intensional trees. IF the labels of the roots of $t_1, \dots, t_n$ form a word in $lang(\tau_{in}(f))$ (resp. $lang(\tau_{out}(f))$) AND all the trees are instances of $s$, THEN $t_1, \dots, t_n$ is an input instance (resp. output instance) of $f$. • Every subtree conforms to the same schema as the whole document.
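A minimal sketch of the instance check of Definition 3, using the newspaper schema from slides 11 and 12. It assumes, for illustration only, that content models are written in the slides' notation (`.` for concatenation) and compiled to Python regular expressions over space-terminated label tokens; the `Node` class, its `kind` field, and the reading of `data` as "exactly one data-value leaf" are our own assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    label: str
    kind: str = "element"            # "element", "function", or "value" (data leaf)
    children: List["Node"] = field(default_factory=list)

# The newspaper schema of slides 11-12; "data" is the data-value keyword.
TAU: Dict[str, str] = {
    "newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit)",
    "title": "data", "date": "data", "temp": "data", "city": "data", "exhibit": "data",
}
TAU_IN: Dict[str, str] = {"Get_Temp": "city", "TimeOut": "data"}

def word_matches(children: List[Node], model: str) -> bool:
    if model == "data":
        # Assumption: "data" means exactly one data-value leaf.
        return len(children) == 1 and children[0].kind == "value"
    # Each label becomes the token "(?:label )"; the "." concatenations are dropped.
    pattern = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)} )", model).replace(".", "")
    word = "".join(c.label + " " for c in children)
    return re.fullmatch(pattern, word) is not None

def is_instance(n: Node) -> bool:
    """Data nodes must match tau(label); function nodes must match tau_in(f)."""
    if n.kind == "value":
        return True
    model: Optional[str] = (TAU_IN if n.kind == "function" else TAU).get(n.label)
    if model is None:
        return False                 # label or function not declared in the schema
    return word_matches(n.children, model) and all(is_instance(c) for c in n.children)

doc = Node("newspaper", children=[
    Node("title", children=[Node("The Sun", "value")]),
    Node("date", children=[Node("04/10/2002", "value")]),
    Node("Get_Temp", "function", [Node("city", children=[Node("Paris", "value")])]),
    Node("TimeOut", "function", [Node("Exhibits", "value")]),
])
print(is_instance(doc))   # True
```

The same `word_matches` check, applied to a sequence of root labels with $\tau_{in}(f)$ or $\tau_{out}(f)$, is what decides input and output instances in this sketch.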

  15. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 4 (rewritings): • $t, t'$: trees • IF $t'$ is obtained from $t$ by • selecting a function node $v$ in $t$ with some label $f$ and • replacing it by an arbitrary output instance of $f$, • THEN we say that $t \xrightarrow{v} t'$.

  16. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 4 (rewritings, cont’d): • IF $t \xrightarrow{v_1} t_1 \xrightarrow{v_2} t_2 \cdots \xrightarrow{v_n} t_n$, THEN • we say that $t \xrightarrow{*} t_n$ ($t$ rewrites into $t_n$) • the nodes $v_1, \dots, v_n$ are called a rewriting sequence • the set of all trees $t'$ such that $t \xrightarrow{*} t'$ is denoted $ext(t)$.

  17. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 5 (rewritings): • Let $t$ be a tree and $s$ a schema. • 1. IF $ext(t)$ contains some instance of $s$, THEN $t$ possibly rewrites into $s$. • 2. IF either $t$ is already an instance of $s$, or there exists some node $v$ in $t$ such that all trees $t'$ with $t \xrightarrow{v} t'$ safely rewrite into $s$, THEN we say that $t$ safely rewrites into $s$.
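Definitions 4 and 5 can be made concrete with a brute-force sketch that works on a single word of labels (tree parameters are ignored). It assumes, purely for illustration, that each function returns one of a finite set of words containing no further calls, so every invocation removes one call and the recursion terminates; `OUTPUTS` and the target patterns below are hypothetical encodings of the newspaper example.

```python
import re
from functools import lru_cache
from typing import Dict, FrozenSet, Iterator, Tuple

Word = Tuple[str, ...]

# Assumption: finite output sets, with no function calls inside the outputs.
OUTPUTS: Dict[str, FrozenSet[Word]] = {
    "Get_Temp": frozenset({("temp",)}),
    "TimeOut": frozenset({("exhibit",), ("performance",)}),
}

def in_target(word: Word, target: str) -> bool:
    # target: a Python regex over space-terminated label tokens
    return re.fullmatch(target, "".join(a + " " for a in word)) is not None

def successors(word: Word, i: int) -> Iterator[Word]:
    """All words t' with t -v-> t', where v is the call at position i."""
    for out in OUTPUTS[word[i]]:
        yield word[:i] + out + word[i + 1:]

@lru_cache(maxsize=None)
def possibly(word: Word, target: str) -> bool:
    # Some rewriting in ext(t) is an instance of the target.
    if in_target(word, target):
        return True
    return any(possibly(w2, target)
               for i, a in enumerate(word) if a in OUTPUTS
               for w2 in successors(word, i))

@lru_cache(maxsize=None)
def safely(word: Word, target: str) -> bool:
    # Already an instance, or some node v such that every t -v-> t' is safe.
    if in_target(word, target):
        return True
    return any(all(safely(w2, target) for w2 in successors(word, i))
               for i, a in enumerate(word) if a in OUTPUTS)

W = ("title", "date", "Get_Temp", "TimeOut")
R1 = r"title date temp (?:TimeOut |(?:exhibit )*)"   # title.date.temp.(TimeOut|exhibit*)
R2 = r"title date temp (?:exhibit )*"                # title.date.temp.exhibit*
print(safely(W, R1), safely(W, R2), possibly(W, R2))  # True False True
```

The second target fails safely because TimeOut may return performance, yet succeeds possibly because it may return exhibit; this is the same contrast the later safe and possible rewriting examples draw.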

  18. THE MODEL and THE PROBLEM • SIMPLE SCHEMA (cont’d): • DEFINITION 6: • Let $s$ and $s'$ be schemas and $r$ a distinguished label called the root label. • IF all the instances $t$ of $s$ with root label $r$ rewrite safely into instances of $s'$, THEN we say that $s$ safely rewrites into $s'$.

  19. THE MODEL and THE PROBLEM • A Richer Data Model : Function Patterns: • The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document. • But sometimes, one does not know in advance which functions will be used at a given place. • A common intensional schema for such documents should not require the use of a particular function, but rather allow for a set of functions, which have a proper signature.

  20. THE MODEL and THE PROBLEM • To specify such a set of functions we use function patterns. • Function pattern: a function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one. • EX: • τname(Forecast) = UDDIF ∧ InACL • τin(Forecast) = city • τout(Forecast) = temp
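A minimal sketch of the pattern test: a function matches when its name satisfies the predicate and its signature equals the required one. The registry and ACL contents, the service names, and the Python predicates standing in for UDDIF and InACL are hypothetical; the slides do not say how those predicates are evaluated.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Signature:
    t_in: str    # input type, e.g. "city"
    t_out: str   # output type, e.g. "temp"

@dataclass(frozen=True)
class FunctionPattern:
    name_predicate: Callable[[str], bool]   # boolean predicate over function names
    signature: Signature                    # required signature

    def matches(self, name: str, signature: Signature) -> bool:
        return self.name_predicate(name) and signature == self.signature

# Hypothetical stand-ins for the UDDIF and InACL predicates.
UDDI_REGISTRY = {"MeteoForecast", "WeatherTemp"}
ACL = {"MeteoForecast"}

def uddi_f(name: str) -> bool:
    return name in UDDI_REGISTRY

def in_acl(name: str) -> bool:
    return name in ACL

# tau_name(Forecast) = UDDIF and InACL; tau_in = city; tau_out = temp
forecast = FunctionPattern(lambda n: uddi_f(n) and in_acl(n), Signature("city", "temp"))
print(forecast.matches("MeteoForecast", Signature("city", "temp")))   # True
print(forecast.matches("WeatherTemp", Signature("city", "temp")))     # False
```

Keeping the name test and the signature test separate mirrors the slide: the predicate decides which services are acceptable, while the signature keeps the schema's type constraints.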

  21. THE MODEL and THE PROBLEM • A Richer Data Model (cont’d): • Restricted Service Invocations: • We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema. • This is not always the case, for reasons such as • security, • cost, • access rights, etc. • THUS, the function names/patterns in the schema can be partitioned into two disjoint groups: invocable and noninvocable ones. • A legal rewriting is then one that invokes only invocable functions.

  22. EXCHANGING INTENSIONAL DATA • Rewriting Process: 1. Safe Rewriting: • check whether t safely rewrites into s • if so, find a rewriting sequence. • rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure • preferred rewriting sequence: the shortest/cheapest one

  23. EXCHANGING INTENSIONAL DATA • Rewriting Process (cont’d): 2. Possible Rewriting: • IF a safe rewriting does not exist, • check whether t may at least rewrite into s. • IF it is acceptable to do so (the sender accepts that the rewriting may fail), • try to find a successful rewriting sequence, if one exists • preferred rewriting sequence: the one with the least cost.
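Searching for the least-cost possible rewriting, restricted to invocable functions, can be sketched as a uniform-cost (Dijkstra-style) search over words. This is our illustrative choice of search technique, not necessarily the paper's; the finite output sets, the per-call costs in `COST`, and the `INVOCABLE` set are all assumptions.

```python
import heapq
import re
from typing import Dict, FrozenSet, List, Optional, Tuple

Word = Tuple[str, ...]

# Illustrative assumptions: finite output sets, a cost per call, and the
# set of invocable functions (all others must never be called).
OUTPUTS: Dict[str, FrozenSet[Word]] = {
    "Get_Temp": frozenset({("temp",)}),
    "TimeOut": frozenset({("exhibit",), ("performance",)}),
}
COST: Dict[str, float] = {"Get_Temp": 1.0, "TimeOut": 5.0}
INVOCABLE = {"Get_Temp", "TimeOut"}

def in_target(word: Word, target: str) -> bool:
    return re.fullmatch(target, "".join(a + " " for a in word)) is not None

def cheapest_possible(word: Word, target: str) -> Optional[Tuple[float, List[str]]]:
    """Cheapest legal call sequence that *may* turn `word` into the target
    (each call is assumed to return the hoped-for output, so the rewriting
    can still fail at run time)."""
    frontier: List[Tuple[float, Word, List[str]]] = [(0.0, word, [])]
    settled = set()
    while frontier:
        cost, w, calls = heapq.heappop(frontier)
        if in_target(w, target):
            return cost, calls
        if w in settled:
            continue
        settled.add(w)
        for i, a in enumerate(w):
            if a in INVOCABLE:       # a legal rewriting calls invocable functions only
                for out in OUTPUTS[a]:
                    heapq.heappush(frontier, (cost + COST[a], w[:i] + out + w[i + 1:], calls + [a]))
    return None

W = ("title", "date", "Get_Temp", "TimeOut")
print(cheapest_possible(W, r"title date temp (?:TimeOut |(?:exhibit )*)"))  # (1.0, ['Get_Temp'])
print(cheapest_possible(W, r"title date temp (?:exhibit )*"))
```

Removing TimeOut from `INVOCABLE` makes the second target unreachable, which is how a restricted invocation turns a possible rewriting into no rewriting at all.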

  24. EXCHANGING INTENSIONAL DATA • Rewriting Process (cont’d): 3. Mixed Approach: in the mixed approach, one could • first invoke some function calls, • then attempt from there to find safe rewritings.

  25. EXCHANGING INTENSIONAL DATA • Rewriting Process (cont’d): • DEFINITION 7: • For a rewriting sequence $t \xrightarrow{v_1} t_1 \cdots \xrightarrow{v_n} t_n$ (with $t_0 = t$): • IF $v_j \in t_i$ but $v_j \notin t_{i-1}$, • THEN we say that function node $v_j$ depends on function node $v_i$. • IF the dependency graph among the nodes contains no paths of length greater than $k$, • THEN we say that the rewriting sequence is of depth $k$.
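A small sketch of computing the depth of a rewriting sequence. Since each function node is introduced by exactly one invocation, the dependency graph is a forest, and its longest path can be found in one pass. The node ids (`g1`, `t1`, and the returned call `d1`) are hypothetical, and we assume path length counts nodes, so a rewriting that only invokes calls already present in $t$ has depth 1 (matching the depth-1 automaton used later).

```python
from typing import Dict, Iterable, List, Tuple

def rewriting_depth(original_calls: Iterable[str],
                    steps: List[Tuple[str, List[str]]]) -> int:
    """Depth of the sequence v1..vn.

    original_calls: ids of the function nodes already present in t.
    steps: (invoked node v_i, function nodes new in t_i, i.e. returned by v_i).
    A node returned by v_i depends on v_i; the depth is the number of nodes
    on the longest dependency chain among invoked nodes (assumption).
    """
    depth: Dict[str, int] = {v: 1 for v in original_calls}
    deepest = 0
    for invoked, introduced in steps:
        deepest = max(deepest, depth[invoked])
        for v in introduced:
            depth[v] = depth[invoked] + 1    # v_j depends on v_i
    return deepest

# Get_Temp (g1) and TimeOut (t1) are in t; suppose t1 returns a new call d1.
print(rewriting_depth(["g1", "t1"], [("g1", []), ("t1", ["d1"]), ("d1", [])]))  # 2
```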

  26. EXCHANGING INTENSIONAL DATA RESTRICTION: “Consider only k-depth, left-to-right rewritings.”

  27. SAFE REWRITING • Algorithm for k-depth left-to-right safe rewriting • The algorithm is decomposed into three parts: • 1. Rewriting function parameters: • to invoke a function, • its parameters should be of the right type; • if not, • they should be rewritten to fit that type. • When rewriting the parameters, • the functions in them can be invoked ONLY IF their own parameters can be rewritten into the expected input type.

  28. SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 1. Rewriting function parameters (cont’d) • For the deepest functions: • verify that their parameters are instances of the corresponding input types; • if not, rewriting fails. • Move upward (until all functions in the tree (forest) are done): • for each function f, try to safely rewrite f’s own parameters into the required structure; • if this is not possible, rewriting fails.

  29. SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 2. Top-down traversal: • in each iteration of the recursive procedure “Rewriting Function Parameters”, the parameters of the outermost functions of the tree (forest) are handled. • In this part, safely rewrite the tree (forest) by invoking only these outermost functions. • THUS: • traverse the tree (forest) top-down; • at each step, treat a single node and its children.

  30. SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 2. Top-down traversal (cont’d) • Let node n have children whose labels form a word w. • The subtree rooted at node n can be safely rewritten into the target schema $s = (L, F, \tau)$ IF and ONLY IF: • 1. w can be safely rewritten into a word in $lang(\tau(label(n)))$ AND • 2. each of n’s children can be safely rewritten into an instance of s.

  31. SAFE REWRITING • The algorithm is decomposed into three parts (cont’d) • 3. Rewriting the children of a node n: • Given: • w: a word (the sequence of labels of n’s children) • Goal: • rewrite w so that it becomes a word in the regular language $R = \tau(label(n))$ • The rewriting process involves: • choosing some functions in w and replacing them by a possible output, • then choosing some other functions (which might have been returned by previous calls) and replacing them by their output, • and so on, up to depth k.

  32. SAFE REWRITING • Safe Rewriting Algorithm: • Given: • a word w • the output types $R_{f_1}, \dots, R_{f_n}$ of the available functions • a target regular language R • Purpose of the algorithm: • to test whether w can be safely rewritten into a word in R • if so, to find a safe rewriting sequence

  33. SAFE REWRITING • Safe Rewriting Algorithm: • Note: for illustration purposes we use the newspaper document: • w = title.date.Get_Temp.TimeOut (the word formed by the labels of the children) • R = title.date.temp.(TimeOut|exhibit*) (goal: safely rewrite w into a word in R) • The Algorithm: • 1) Build finite state automata for the following regular languages: • 1.1) an automaton $A_w$ accepting w as a single word.

  34. SAFE REWRITING • The Algorithm (cont’d) • 1.2) Build automata $A_{f_i}$, $i = 1, \dots, n$, each accepting the regular language $R_{f_i}$ • 1.3) Build an automaton $\bar{A}$ accepting the complement of the regular language R. The automaton should be deterministic and complete.
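Steps 1.1 and 1.3 can be sketched with a small DFA class: the single-word automaton $A_w$ is a chain, and the complement $\bar{A}$ is obtained by first completing a deterministic automaton for R with a sink state and then swapping accepting and non-accepting states. The `DFA` class, the alphabet, and the state names are illustrative assumptions; completion before flipping is what makes the complement correct.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple

@dataclass
class DFA:
    alphabet: FrozenSet[str]
    delta: Dict[Tuple[str, str], str]        # (state, symbol) -> state; may be partial
    start: str
    finals: Set[str]

    def states(self) -> Set[str]:
        result = {self.start} | set(self.finals)
        for (p, _), q in self.delta.items():
            result |= {p, q}
        return result

    def accepts(self, word: Iterable[str]) -> bool:
        p = self.start
        for a in word:
            if (p, a) not in self.delta:
                return False                 # no move defined: reject
            p = self.delta[(p, a)]
        return p in self.finals

def word_automaton(word: Tuple[str, ...], alphabet: FrozenSet[str]) -> DFA:
    """Step 1.1: the chain automaton A_w accepting exactly `word`."""
    delta = {(f"q{i}", a): f"q{i + 1}" for i, a in enumerate(word)}
    return DFA(alphabet, delta, "q0", {f"q{len(word)}"})

def complement(d: DFA, sink: str = "sink") -> DFA:
    """Step 1.3: complete a *deterministic* d with a sink state, then swap
    accepting and non-accepting states."""
    states = d.states() | {sink}
    delta = dict(d.delta)
    for p in states:
        for a in d.alphabet:
            delta.setdefault((p, a), sink)
    return DFA(d.alphabet, delta, d.start, states - set(d.finals))

SIGMA = frozenset({"title", "date", "temp", "Get_Temp", "TimeOut", "exhibit", "performance"})
# R = title.date.temp.(TimeOut|exhibit*) as a partial DFA; state names are ours.
R = DFA(SIGMA, {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
                ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"},
        "p0", {"p3", "p4", "p5"})
A_w = word_automaton(("title", "date", "Get_Temp", "TimeOut"), SIGMA)
A_bar = complement(R)
w = ["title", "date", "Get_Temp", "TimeOut"]
print(A_w.accepts(w), A_bar.accepts(w))   # True True: w itself is not yet in R
```

Step 1.2 needs no extra code in this sketch: each $A_{f_i}$ can be built the same way from its output type.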

  35. SAFE REWRITING • [Figure: the complement automaton $\bar{A}$ for the schema τ’(newspaper) = title.date.temp.(TimeOut|exhibit*); states p0 to p6, with edges labelled title, date, temp, TimeOut, exhibit, and * for all other symbols.]

  36. SAFE REWRITING • The Algorithm (cont’d) • 2) Let $A_w^k := A_w$. • 3) For $j = 1, \dots, k$: • consider all the edges $e = (v, u)$ in $A_w^k$ that are labelled by a function name $f_i$ and were not treated in previous iterations; • 3.1) extend $A_w^k$ by attaching a copy of the automaton $A_{f_i}$, with its initial and final states linked to v and u respectively by ε-moves; • 3.2) denote v as a fork node (for the edge e); • 3.3) the two fork options of v are e itself and the new outgoing ε-edge.
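One iteration of step 3 (the case $j = 1$ used in the example) can be sketched as follows, with the automaton stored as an edge list in which the label `None` stands for an ε-move. The encoding of the output automata and the state-renaming scheme (`"Get_Temp@2:s0"`) are our own assumptions; repeating the loop on the newly attached function edges would give higher depths.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, Optional[str], str]           # label None stands for an epsilon move
FnAutomaton = Tuple[List[Edge], str, Set[str]]  # (edges, initial state, final states)

def extend_once(word: List[str], fn: Dict[str, FnAutomaton]):
    """Start from the chain automaton A_w and, for every edge e = (v, u) labelled
    by a function f, attach a fresh copy of A_f linked to v and u by epsilon
    moves. Returns the edges and, for each fork node v, its two fork options
    (e itself, the new epsilon edge)."""
    edges: List[Edge] = [(f"q{i}", a, f"q{i + 1}") for i, a in enumerate(word)]
    forks: Dict[str, Tuple[Edge, Edge]] = {}
    for i, a in enumerate(word):
        if a not in fn:
            continue                             # data label: nothing to attach
        f_edges, f_init, f_finals = fn[a]
        prefix = f"{a}@{i}:"                     # rename the states of this copy
        v, u = f"q{i}", f"q{i + 1}"
        enter: Edge = (v, None, prefix + f_init)
        edges += [(prefix + p, lbl, prefix + q) for p, lbl, q in f_edges]
        edges.append(enter)
        edges += [(prefix + s, None, u) for s in sorted(f_finals)]
        forks[v] = ((v, a, u), enter)            # keep the call / invoke it
    return edges, forks

FN: Dict[str, FnAutomaton] = {
    "Get_Temp": ([("s0", "temp", "s1")], "s0", {"s1"}),
    "TimeOut": ([("s0", "exhibit", "s1"), ("s0", "performance", "s1")], "s0", {"s1"}),
}
edges, forks = extend_once(["title", "date", "Get_Temp", "TimeOut"], FN)
print(sorted(forks))   # ['q2', 'q3']
```

As in the slide 37 figure, the fork nodes are the states just before Get_Temp and TimeOut.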

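  37. SAFE REWRITING • [Figure: the depth-1 automaton $A_w^1$ for the word w = title.date.Get_Temp.TimeOut, with states q0 to q7. The main path reads title, date, Get_Temp, TimeOut; ε-moves attach a copy of the output automaton of Get_Temp (temp) and of TimeOut (exhibit, performance). The states before Get_Temp and TimeOut are fork nodes: the function edge represents the choice of not invoking the function, and the ε-edge represents the choice of invoking it.]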

  38. SAFE REWRITING • The Algorithm (cont’d) • 4) Construct the cartesian product automaton $A_\times = A_w^k \times \bar{A}$. The fork nodes and fork options in $A_\times$ reflect those of $A_w^k$: • 4.1) the fork nodes $[q, p] \in A_\times$ are the nodes where q was a fork node in $A_w^k$; • 4.2) a fork option in $A_\times$ consists of all edges originating from one fork option edge in $A_w^k$.

  39. SAFE REWRITING • [Figure 6: the cartesian product automaton $A_\times = A_w^1 \times \bar{A}$. Its states are pairs such as [q0, p0], [q1, p1], [q2, p2], [q3, p3], [q5, p2] and [q6, p3], with edges labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]

  40. SAFE REWRITING • The Algorithm (cont’d): • 5) Mark nodes in $A_\times$: • 5.1) mark the states that are accepting states in both $A_w^k$ and $\bar{A}$; • 5.2) iteratively mark • non-fork (regular) nodes, IF one of their outgoing edges points to a marked node; • fork nodes, IF both of their fork options (for some $f_i$) contain an edge that points to a marked node.

  41. SAFE REWRITING • [Figure 6, repeated: the cartesian product automaton $A_\times = A_w^1 \times \bar{A}$.]

  42. SAFE REWRITING • The Algorithm (cont’d): • 6) Try to obtain a SAFE REWRITING. • “A safe rewriting exists IFF the initial state is not marked.” • 6.1) Follow a non-marked path (corresponding to w) from the initial state of $A_\times$ to a state $[q, p]$ where q is an accepting state of $A_w^k$: • 6.1.1) the non-marked fork options on the path determine the rewriting choices (i.e. which functions to call); • 6.1.2) when a function is invoked, we continue the path with the new rewritten word rather than the word w.

  43. SAFE REWRITING • The Algorithm (cont’d): • 6.2) To minimize the rewriting cost, choose a path with minimal number/cost of function invocations. • EXIT % End of the algorithm
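For depth 1, the marking of step 5 and the decision of step 6 can be sketched without building $A_\times$ explicitly: a pair (position in w, state of the completed target DFA) plays the role of a product state, a state is "bad" when it is accepting in the complement, regular nodes are marked if any successor is, and fork nodes are marked only if both options are. This dynamic-programming formulation is our own equivalent reading of the slides, not their literal construction; `FN`, the state names, and the DFA encodings are assumptions.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Delta = Dict[Tuple[str, str], str]
# Output automata A_f as (edges, initial state, final states); illustrative encoding.
FN: Dict[str, Tuple[List[Tuple[str, str, str]], str, Set[str]]] = {
    "Get_Temp": ([("s0", "temp", "s1")], "s0", {"s1"}),
    "TimeOut": ([("s0", "exhibit", "s1"), ("s0", "performance", "s1")], "s0", {"s1"}),
}

def safe_plan(word: Tuple[str, ...], delta: Delta, finals: Set[str],
              start: str = "p0") -> Optional[Dict[Tuple[int, str], bool]]:
    """Depth-1, left-to-right safe rewriting into the language of the DFA
    (delta, finals). None if the initial state is marked, else a plan
    {(position, target state): invoke?} for the reachable fork nodes."""

    def step(p: str, a: str) -> str:
        return delta.get((p, a), "sink")            # missing moves go to a sink (complete DFA)

    def after_call(p: str, f: str) -> FrozenSet[str]:
        # Target states reachable from p by reading some output of f (product A_f x DFA).
        edges, init, f_finals = FN[f]
        seen, todo, out = {(init, p)}, [(init, p)], set()
        while todo:
            s, q = todo.pop()
            if s in f_finals:
                out.add(q)
            for s1, a, s2 in edges:
                nxt = (s2, step(q, a))
                if s1 == s and nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return frozenset(out)

    @lru_cache(maxsize=None)
    def marked(i: int, p: str) -> bool:
        # Marked: the function outputs can force a final word outside the target.
        if i == len(word):
            return p not in finals                  # accepting in the complement automaton
        keep = marked(i + 1, step(p, word[i]))      # not invoking (or a plain data label)
        if word[i] not in FN:
            return keep                             # regular node
        invoke = any(marked(i + 1, q) for q in after_call(p, word[i]))
        return keep and invoke                      # fork node: marked iff both options are

    if marked(0, start):
        return None                                 # no safe rewriting exists
    plan: Dict[Tuple[int, str], bool] = {}
    todo = [(0, start)]
    while todo:                                     # follow non-marked options, preferring no call
        i, p = todo.pop()
        if i == len(word) or (i, p) in plan:
            continue
        a = word[i]
        if a in FN:
            plan[(i, p)] = marked(i + 1, step(p, a))  # invoke only if keeping the call is unsafe
            if plan[(i, p)]:
                todo.extend((i + 1, q) for q in after_call(p, a))
                continue
        todo.append((i + 1, step(p, a)))
    return plan

W = ("title", "date", "Get_Temp", "TimeOut")
R1 = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
      ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"}
R2 = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
      ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"}
print(safe_plan(W, R1, {"p3", "p4", "p5"}))  # {(2, 'p2'): True, (3, 'p3'): False}
print(safe_plan(W, R2, {"p3", "p5"}))        # None
```

For R1 = title.date.temp.(TimeOut|exhibit*) the plan calls Get_Temp and keeps TimeOut; for R2 = title.date.temp.exhibit* the initial state is marked, because TimeOut may return performance.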

  44. SAFE REWRITING • [Figure 7: the complement automaton $\bar{A}_1$ for the schema τ’(newspaper) = title.date.temp.exhibit*, with edges labelled title, date, temp, exhibit, and * for all other symbols.]

  45. SAFE REWRITING • [Figure 8: the cartesian product automaton $A_\times = A_w^1 \times \bar{A}_1$, with product states such as [q0, p0], [q1, p1], [q2, p2] and [q3, p3], and edges labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]

  46. SAFE REWRITING • Complexity of the Algorithm: • $s_0$: schema of the sender • $s$: agreed data exchange schema • The complexity is determined by the size of the cartesian product automaton: • 1. construct the cartesian product • 2. traverse and mark the nodes of the resulting product • THUS the complexity is bounded by $O(|A_\times|^2) = O\big((|A_w^k| \times |\bar{A}|)^2\big)$.

  47. SAFE REWRITING • Complexity of the Algorithm (cont’d): • $O(|A_\times|^2) = O\big((|A_w^k| \times |\bar{A}|)^2\big)$ • Maximum size of $A_w^k$: $O\big((|s_0| + |w|)^k\big)$ • The complexity is polynomial in the size of the schemas $s$ and $s_0$ (with the exponent determined by $k$).

  48. POSSIBLE REWRITING • The Algorithm • 1. Build finite state automata for the following languages: • 1.1. an automaton $A_w^k$ • 1.2. an automaton $A$ accepting the regular language R

  49. POSSIBLE REWRITING • [Figure 10: an automaton A for the schema τ’’(newspaper) = title.date.temp.exhibit*; states p0 to p4, with edges labelled title, date, temp and exhibit.]

  50. POSSIBLE REWRITING • The Algorithm (cont’d) • 2. Construct the cartesian product automaton $A_\times = A_w^k \times A$. • [Figure 11: the product automaton, with states such as [q0, p0], [q1, p1], [q2, p2], [q5, p2], [q6, p3], [q3, p3] and [q4, p4], and edges labelled title, date, temp, exhibit and ε.]
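The transcript stops after step 2, so the following is only a sketch under an explicit assumption: for possible rewriting no adversary is involved, so we take the test to be plain reachability in $A_w^1 \times A$ of a state $[q, p]$ where q is the final state of $A_w^1$ and p is accepting in A. The encodings of the output automata and of the target DFAs are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, Optional[str], str]        # label None = epsilon move

# Output automata A_f: (edges, initial, finals); illustrative encoding.
FN = {
    "Get_Temp": ([("s0", "temp", "s1")], "s0", {"s1"}),
    "TimeOut": ([("s0", "exhibit", "s1"), ("s0", "performance", "s1")], "s0", {"s1"}),
}

def depth1_automaton(word: List[str]) -> Tuple[List[Edge], str]:
    """A_w^1: chain for `word` plus an epsilon-linked copy of A_f for each call f."""
    edges: List[Edge] = [(f"q{i}", a, f"q{i + 1}") for i, a in enumerate(word)]
    for i, a in enumerate(word):
        if a in FN:
            f_edges, init, finals = FN[a]
            pre = f"{a}@{i}:"
            edges += [(pre + s, lbl, pre + t) for s, lbl, t in f_edges]
            edges.append((f"q{i}", None, pre + init))
            edges += [(pre + s, None, f"q{i + 1}") for s in finals]
    return edges, f"q{len(word)}"

def possible_rewriting(word: List[str], delta: Dict[Tuple[str, str], str],
                       finals: Set[str], start: str = "p0") -> bool:
    """Breadth-first search in the product A_w^1 x A (A = DFA of the target):
    assumed test = some [q_final, p] with p accepting is reachable."""
    edges, q_final = depth1_automaton(word)
    seen = {("q0", start)}
    queue = deque(seen)
    while queue:
        q, p = queue.popleft()
        if q == q_final and p in finals:
            return True
        for s, a, t in edges:
            if s != q:
                continue
            if a is None:
                nxt = (t, p)                 # epsilon move: the target DFA stays put
            elif (p, a) in delta:
                nxt = (t, delta[(p, a)])
            else:
                continue                     # the target DFA has no move: prune
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

W = ["title", "date", "Get_Temp", "TimeOut"]
# tau''(newspaper) = title.date.temp.exhibit*
R2 = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
      ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"}
print(possible_rewriting(W, R2, {"p3", "p5"}))   # True
```

A reachable accepting pair only shows that some outputs lead to the target; unlike the safe case, the rewriting can still fail if TimeOut returns performance.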
