1 / 37

E123 Presentation Title

E123 Presentation Title. Joe Celko Data Junction PDS 71062.1056@compuserve.com. Joe Celko - Articles. Member of ANSI X3H2 since 1987 SQL for Smarties - DBMS Magazine Celko on SQL - DBP&D SQL Puzzle - Boxes & Arrows DBMS/Report - Systems Integration WATCOM SQL Column - PBDJ

teo
Télécharger la présentation

E123 Presentation Title

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. E123Presentation Title • Joe Celko • Data Junction • PDS • 71062.1056@compuserve.com

  2. Joe Celko - Articles • Member of ANSI X3H2 since 1987 • SQL for Smarties - DBMS Magazine • Celko on SQL - DBP&D • SQL Puzzle - Boxes & Arrows • DBMS/Report - Systems Integration • WATCOM SQL Column - PBDJ • Celko on Software - COMPUTING(UK) • Celko - Intelligent Enterprise Magazine • SELECT FROM Austin - DB/M (Netherlands)

  3. Joe Celko - Books • JOE CELKO’S SQL FOR SMARTIES - 1995, 1999 Morgan-Kaufmann • INSTANT SQL - 1995, Wrox Press • JOE CELKO’S SQL PUZZLES & ANSWERS - 1997, Morgan-Kaufmann • DATA & DATABASES - 2000, Morgan-Kaufmann

  4. Trees in SQL • Trees are graph structures used to represent • Hierarchies • Parts explosions • Organizational charts • Three methods in SQL • Adjacency list model • Nested set model • Path enumeration

  5. Trees in SQL -2 • Trees are not hierarchies • Hierarchies have subordination • Kill your captain, you still have to take orders from your general • Break an edge in a tree, and you have two or more disjoint trees. • This means an adjacency list model is a tree, but not a hierarchy

  6. Tree Terminology

  7. Tree as Graph

  8. Trees as Nested Sets root A0 B0 A1 A2

  9. Graphs as Tables • Nodes and edges are not the same kind of things • Organizational chart & Personnel file • You should use separate tables for the structure and the elements • The structure table will be small (two integers and a key) • You can put more than one structure table on the same elements

  10. Adjacency List Model node parent cost =============== Root NULL 2.50 A0 Root 1.75 A1 A0 2.00 A2 A0 3.50 B0 Root 4.00 • Cost really should not be in the table, but most adjacency list tables mix nodes and edges (see Oracle’s sample database) • Most common method in use.

  11. Path Enumeration Model Tree node cost path ============== Root 2.50 ‘Root’ A0 1.75 ‘Root,A0’ A1 2.00 ‘Root,A0,A1’ A2 3.50 ‘Root,A0,A2’ B0 4.00 ‘Root,B0’ • Cost really should not be in the table, but most path enumeration tables mix nodes and edges. • Paths are search with path LIKE ‘Root,%’predicates

  12. Graph with Traversal

  13. Nested Sets with Numbers 1 2 3 4 5 6 7 8 9 10 A0 B0 Root A1 A2

  14. Nested Sets as Numbers • Split nodes and edges into two tables. • You can join them back together later • This could be personnel and Org chart • Tree.node would be job titles • Nodes would need job titles and the person holding it Tree Nodes node lft rgt node cost ============ ======= Root 1 10 Root 2.50 A0 2 7 A0 1.75 A1 3 4 A1 2.00 A2 5 6 A2 3.50 B0 8 9 B0 4.00

  15. Problems with Adjacency list • You have to use cursors or self-joins to traverse the tree • Cursors are not a table -- their order has meaning -- Closure violation! • Cursors take MUCH longer than queries • Ten level self-joins are worse than cursors

  16. Problems with Path Enumeration • Path can get long in a deep tree • Great for searching down the tree, but not up the tree • SELECT * FROM Tree WHERE path LIKE ‘Root,%’; • SELECT * FROM Tree WHERE path LIKE ‘%,B0’; • Inserting and deleting nodes is complicated • Requires string manipulation to change all the paths beneath the insertion or deletion point

  17. Tree Aggregates • Give me the total cost for all subtrees • (root, 13.75) -- sum of every node in tree • (A0, 7.25) -- sum of “A0” subtree • (A1, 2.00) • (A2, 3.50) • Dropping A2 would reduce all superior rows by 3.50,but would not change A1

  18. Find Root of Tree • SELECT * FROM Tree WHERE lft = 1; • It helps to have an index the lft column • The rgt value will be twice the number of nodes in the tree. • General rule: The number of nodes in any subtree ((rgt -lft) + 1 )/ 2

  19. Find All Leaf Nodes • SELECT * FROM Tree WHERE lft = rgt -1; • An index on lft will help • A covering index on (lft, rgt) is even better

  20. Find Superiors of X • SELECT Super.* FROM Tree AS T1, Tree AS Sup WHERE T1.node = ‘X’ AND T1.lft BETWEEN Sup.lft AND Sup.rgt; • This is the most important trick in this method • The BETWEEN predicates preserve subordination in the hierarchy • One query for any depth tree

  21. Find Subordinates of X • SELECT Sub.* FROM Tree AS T1, Tree AS Sub WHERE T1.node = ‘X’ AND Sub.lft BETWEEN T1.lft ANDT1.rgt; • This is the same pattern as finding superiors

  22. Find Depth of Tree • SELECT T1.node, COUNT(T2.node) AS level FROM Tree AS T1, Tree AS T2 WHERE T1.lft BETWEEN T2.lft AND T2.rgt GROUP BY T1.node; • Count the containing nested sets for levels • The closer to the root a node is, the greater the value of (rgt - lft)

  23. Totals by Level in Tree • SELECT T1.node, SUM(T2.cost) AS total_level_cost FROM Tree AS T1, Tree AS T2 WHERE T2.lft BETWEEN T1.lft AND T1.rgt GROUP BY T1.node; • Uses any aggregate function the same way

  24. Delete a Subtree • Remove subtree rooted at :my_node • DELETE FROM Tree WHERE lft BETWEEN (SELECT T1.lft FROM Tree AS T1 WHERE T1.node = :my_node) AND (SELECT T2.rgt FROM Tree AS T2 WHERE T2.node = :my_node)

  25. Delete a Single Node • Method one - promote a child to the parent’s prior position in the tree. Oldest son inherits family business • Method two- subordinate the entire subtree to the grandparent. Orphans go live with grandmother.

  26. Delete & Promote Oldest - 1 • Delete A0 node

  27. Delete & Promote Oldest - 2

  28. Delete & Promote Subtree - 1 • Delete A0 node

  29. Delete & Promote Subtree - 2

  30. Closing gaps in nested set model • Deleted nodes leave gaps in numbering of lft and rgt nodes. • Fill in gaps by sliding everyone over to the lft until there are no gaps. • First build a table of all lft and rgt values. LftRgt( seq) • UPDATE Tree SET lft = (SELECT COUNT(*) FROM LftRgt WHERE LftRgt.seq < =Tree.lft), rgt = (SELECT COUNT(*) FROM LftRgt WHERE LftRgt.seq < =Tree.rgt);

  31. Converting Nested Sets to Adjacency • SELECT B.emp AS boss, P.emp • FROM OrgChart AS P • LEFT OUTER JOIN • OrgChart AS B • ON B.lft • = (SELECT MAX(lft) • FROM OrgChart AS S • WHERE P.lft > S.lft • AND P.lft < S.rgt);

  32. Inserting into a Tree • The real trick is numbering the subtree correctly before inserting it. • Basic idea is to spread the nested set numbers apart to make a gap, the size of the subtree then you add the subtree. • The position of the subtree within the siblings of the new parent in the tree is another decision.

  33. Inserting into a Tree • insert a new node, G1, under part G. We can insert one node at a time like this: • BEGIN ATOMIC • DECLARE right_most_sibling INTEGER; • SET right_most_sibling • = (SELECT rgt FROM Frammis WHERE part = 'G'); • UPDATE Frammis • SET lft = CASE WHEN lft > right_most_sibling • THEN lft + 2 ELSE lft END, • rgt = CASE WHEN rgt >= right_most_sibling • THEN rgt + 2 ELSE rgt END • WHERE rgt >= right_most_sibling; • INSERT INTO Frammis (part, qty, wgt, lft, rgt) VALUES ('G1', 3, 4, parent, (parent + 1)); • END;

  34. Creating a Tree • If you want to have all the constraints for a proper hierarchy, then it is complicated. • CREATE TABLE Tree (node_id INTEGER NOT NULL REFERENCES Nodes(node_id), lft INTEGER NOT NULL UNIQUE CHECK(lft > 0), rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1), UNIQUE (lft, rgt), CHECK (lft < rgt), CHECK( EXISTS (SELECT * FROM Tree AS T1 HAVING MAX(rgt) = 2 * COUNT(*)) CHECK (NOT EXISTS (SELECT * FROM Tree AS T1 WHERE Tree.lft BETWEEN T1.lft AND T1.rgt ))); • You can also declare node_id to be the PRIMARY KEY, but then one person cannot hold two jobs.

  35. Converting an Adjacency Model into a Nested Set Model • Current best method is to load nodes into a tree in a host language, then do a recursive pre-order tree traversal to get the lft and rgt traversal numbers. • Adjacency list method does not order siblings; nested set model does automatically • Classic push down stack algorithm works • You can keep both models in one table with a column for the immediate superior

  36. Structure versus Contents • Nested set model allows the structure of trees to be compared. • For each tree find the lft value of the root node of each tree • Make a canonical form and UNION ALL them • EXISTS ( SELECT * FROM ( SELECT (lft - lftmost), (rgt - lftmost) FROM Tree1 UNION ALL SELECT (lft - lftmost), (rgt - lftmost) FROM Tree2) AS Both (lft, rgt) GROUP BY Both.lft, Both.rgt HAVING COUNT (*) =1 )

  37. Questions & Answers ?

More Related