
CSE 373 Data Structures and Algorithms






Presentation Transcript


  1. CSE 373 Data Structures and Algorithms Lecture 25: B-Trees

  2. Assumptions of Algorithm Analysis • Big-Oh assumes all operations take the same amount of time • Is this really true?

  3. Disk Based Data Structures • All data structures we have examined are limited to main memory • Have been assuming data fits into main memory • Counter-example: Transaction data of a bank > 1 GB per day • Uses secondary storage (e.g., hard disks) • Operations: insert, delete, searches

  4. Cycles* to access: • CPU registers: 1 • Cache: tens • Main memory: hundreds • Disk: millions • *cycle: the time it takes to execute an instruction

  5. Hard Disks • Large amount of storage but slow access • Identifying a particular block takes a long time • Read or write data in chunks (“a page”) of 0.5 – 8 KB in size • (Newer technology) Solid-state drives are 50 – 100 times faster • Still “slow”

  6. Algorithm Analysis • Running time of disk-based data structures is measured in terms of • Computing time (CPU) • Number of disk accesses • Regular main-memory algorithms that work one data element at a time cannot be "ported" to secondary storage in a straightforward way

  7. Principles • Almost all of our data structure is on disk. • Every time we access a node in the tree it amounts to a random disk access. • How can we address this problem?

  8. M-ary Search Tree • Suppose we devised a search tree with branching factor M: • M - 1 keys needed to decide branch to take • Complete tree has height: Θ(logM n) • # Nodes accessed for search: Θ(logM n)
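The height claim is easy to check numerically; a minimal sketch (the function name is mine):

```python
import math

def nodes_accessed(n, m):
    """Approximate number of nodes visited on a root-to-leaf search
    in a complete search tree with branching factor m and n items."""
    return math.ceil(math.log(n, m))

# Raising the branching factor shrinks the search path: for a billion
# items, m = 128 needs about 5 node visits, while m = 2 needs about 30.
```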

  9. B-Trees • Internal nodes store (up to) M - 1 keys • Order property: subtree between two keys x and y contains leaves with values v such that x ≤ v < y • Note the "≤" • Leaf nodes contain up to L sorted values/data items ("records") • Example with M = 7: an internal node with keys 3, 7, 12, 21 separates the ranges x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, and 21 ≤ x

  10. B-Tree Structure Properties • Root (special case) • Has between 2 and M children (or could be a leaf) • Internal nodes • Store up to M - 1 keys • Have between ceiling(M/2) and M children • Leaf nodes • Where data is stored • Contain between ceiling(L/2) and L data items • Nodes are at least ½ full • Leaves are at least ½ full • The tree is perfectly balanced!
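These occupancy rules can be written down as a small invariant checker; a sketch assuming a hypothetical Node layout (keys/children for internal nodes, data for leaves; none of these names come from the slides):

```python
import math

class Node:
    def __init__(self, keys=None, children=None, data=None):
        self.keys = keys or []          # internal: up to M - 1 sorted keys
        self.children = children or []  # internal: child pointers
        self.data = data or []          # leaf: up to L sorted data items

def check_node(node, M, L, is_root=False):
    """Check the occupancy rules from the slide for a single node."""
    if node.data:                                    # leaf node
        lo = 1 if is_root else math.ceil(L / 2)      # root may be a small leaf
        return lo <= len(node.data) <= L
    c = len(node.children)                           # internal node
    lo = 2 if is_root else math.ceil(M / 2)
    return lo <= c <= M and len(node.keys) == c - 1
```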

  11. B-Tree: Example • B-Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf) • Leaves also hold the data objects (e.g., "1, AB..", "2, GH.."), which we'll ignore in slides • All leaves at the same depth • Definition for later: "neighbor" is the next sibling to the left or right • [Figure: example tree]

  12. Disk Friendliness • Many keys stored in a node • Each node is one disk page/block. • All brought to memory/cache in one disk access. • Internal nodes contain only keys; only leaf nodes contain actual data • What is limiting you from increasing the number of keys stored in each node? • The page size

  13. Exercise: Computing M and L • Exercise: If a disk block is 4000 bytes, key size is 20 bytes, pointer size is 4 bytes, and data/value size is 200 bytes, what should M (# of branches) and L (# of data items per leaf) be for our B-Tree? • Solve for M: (M - 1) keys + M pointers take 20(M - 1) + 4M = 24M - 20 bytes; 24M - 20 ≤ 4000 gives M = 167 • Solve for L: L = 4000 / 200 = 20
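The exercise's arithmetic generalizes to a two-line formula; a sketch (function and parameter names are mine):

```python
def compute_M_L(block, key_size, pointer_size, record_size):
    """Largest branching factor M and leaf capacity L such that one
    node fits in a single disk block.
    Internal node: M pointers + (M - 1) keys must fit in the block.
    Leaf: L records must fit in the block."""
    # From (M - 1) * key_size + M * pointer_size <= block:
    M = (block + key_size) // (key_size + pointer_size)
    L = block // record_size
    return M, L

# The slide's numbers: a 4000-byte block, 20-byte keys, 4-byte pointers,
# and 200-byte records give M = 167 and L = 20.
```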

  14. B-Trees vs. AVL Trees • Suppose we have n = 10^9 (a billion) data items: • Depth of AVL tree: log2 10^9 ≈ 30 • Depth of B-Tree with M = 256, L = 256: ≈ logM/2 10^9 = log128 10^9 ≈ 4.3 (M/2 because nodes may be only half full of keys)
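The depth comparison on this slide can be reproduced directly:

```python
import math

n = 10**9
avl_depth = math.log2(n)             # binary branching: about 30 levels
btree_depth = math.log(n, 256 // 2)  # M = 256, but assume internal nodes
                                     # only half full, so branching ~128
print(f"AVL: {avl_depth:.1f} levels, B-Tree: {btree_depth:.1f} levels")
```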

  15. Building a B-Tree with Insertions • M = 3, L = 3 • Insert(3), Insert(18), Insert(14) into the empty B-Tree • [Figure: a single leaf grows from {3} to {3, 18} to {3, 14, 18}]

  16. M = 3, L = 3 • Insert(30): the leaf {3, 14, 18} would hold four items, so it overflows and splits into {3, 14} and {18, 30}, and a new root with key 18 is created • [Figure: trees before and after the split]

  17. M = 3, L = 3 • Insert(32): goes into the leaf {18, 30} • Insert(36): the leaf {18, 30, 32, 36} overflows and splits into {18, 30} and {32, 36}; the root now has keys 18 and 32 • Insert(15): goes into the leaf {3, 14}, giving {3, 14, 15} • [Figure: tree after each insert]

  18. Insert(16): the leaf {3, 14, 15, 16} overflows and splits into {3, 14} and {15, 16}; the parent now has four children, so it overflows too and splits, creating a new root with key 18 over internal nodes with keys 15 and 32 • [Figure: the split propagating to the root]

  19. M = 3, L = 3 • Insert(12, 40, 45, 38) • [Figure: resulting tree: root key 18; internal nodes with keys {15} and {32, 40}; leaves {3, 12, 14}, {15, 16}, {18, 30}, {32, 36, 38}, {40, 45}]

  20. Insertion Algorithm: The Overflow Step • [Figure, M = 5: a node that is "too big" (M + 1 children) splits into two half-full nodes, and the parent gains a child]

  21. Insertion Algorithm • Insert the key in its leaf in sorted order • If the leaf ends up with L+1 items, overflow! • Split the leaf into two nodes, each with half of the data • Add the new leaf to the parent • If the parent ends up with M+1 children, overflow!

  22. Insertion Algorithm 3. If an internal node ends up with M+1 children, overflow! • Split the internal node into two nodes, each with half the keys • Add the new child to the parent • If the parent ends up with M+1 children, overflow! • If the root ends up with M+1 children, split it in two, and create a new root with two children • This makes the tree deeper!
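The leaf half of this algorithm can be sketched with a sorted list standing in for a leaf (a toy sketch, not the full recursive version; names are mine):

```python
import bisect

def insert_into_leaf(leaf, key, L):
    """Insert key into a sorted leaf; on overflow (L+1 items), split
    the leaf in two and return the key the parent should add."""
    bisect.insort(leaf, key)
    if len(leaf) <= L:
        return None, None              # no overflow
    mid = len(leaf) // 2
    right = leaf[mid:]                 # new sibling takes the upper half
    del leaf[mid:]
    return right[0], right             # separating key + new leaf

# Slide 16's Insert(30): {3, 14, 18} overflows with 30 and splits.
leaf = [3, 14, 18]
sep, new_leaf = insert_into_leaf(leaf, 30, L=3)
# leaf is now [3, 14], new_leaf is [18, 30], and the parent gets key 18
```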

  23. And Now for Deletion… • M = 3, L = 3 • Delete(32): 32 is removed from the leaf {32, 36, 38}; the remaining leaf {36, 38} is still at least half full, and the parent's separating key 32 becomes 36 • [Figure: trees before and after]

  24. M = 3, L = 3 • Delete(15): the leaf {15, 16} becomes {16}. Leaf not half full! • Are we okay? Fix it by borrowing from a neighbor: "Are you using that 14? Can I borrow it?" • [Figure: the underfull leaf and its neighbor {3, 12, 14}]

  25. M = 3, L = 3 • Borrowing: 14 moves out of the neighbor {3, 12, 14} into the underfull leaf, giving leaves {3, 12} and {14, 16}; the separating key in the parent becomes 14 • [Figure: tree after the rotation]

  26. M = 3, L = 3 • Delete(16): the leaf {14, 16} becomes {14}, which is underfull again • "Are you using that 14?" This time the neighbor {3, 12} has nothing to spare • [Figure: trees before and after]

  27. M = 3, L = 3 • Merging the leaves fixes the leaf underflow, but now their parent has too few children: an internal underflow • Fix it by borrowing from the other internal node: "Are you using the node 18/30?" • [Figure: trees before and after]

  28. M = 3, L = 3 • [Figure: after the internal rotation, 36 becomes the root key; the left internal node has key 18 with leaves {3, 12, 14} and {18, 30}; the right internal node has key 40 with leaves {36, 38} and {40, 45}]

  29. Delete(14) • M = 3, L = 3 • 14 is removed from the leaf {3, 12, 14}; the remaining leaf {3, 12} is still half full, so no fixing is needed • [Figure: trees before and after]

  30. Delete(18) • M = 3, L = 3 • 18 is removed from the leaf {18, 30}, leaving {30}: underfull • [Figure: trees before and after]

  31. M = 3, L = 3 • The neighbor {3, 12} is already at minimum, so the underfull leaf merges with it into {3, 12, 30}; the parent is left with only one child • [Figure: trees before and after the merge]

  32. M = 3, L = 3 • The underfull internal node can't borrow either (its neighbor is also at minimum), so the two internal nodes merge, leaving the root with a single child • [Figure: trees before and after]

  33. M = 3, L = 3 • The old root is removed and its only child becomes the new root: keys 36 and 40 over leaves {3, 12, 30}, {36, 38}, and {40, 45}. The tree got shorter • [Figure: final tree]

  34. Deletion Algorithm: Rotation Step • [Figure, M = 5: a "too small" node borrows an entry from a neighbor through the parent] • This is a left rotation; a right rotation works similarly

  35. Deletion Algorithm: Merging Step • [Figure, M = 5: a node is too small, its neighbor can't get smaller, so the two nodes merge]

  36. Deletion Algorithm • Remove the key from its leaf • If the leaf ends up with fewer than ceiling(L/2) items, underflow! • Try a left rotation • If not, try a right rotation • If not, merge, then check the parent node for underflow

  37. Deletion Slide Two • If an internal node ends up with fewer than ceiling(M/2) children, underflow! • Try a left rotation • If not, try a right rotation • If not, merge, then check the parent node for underflow • If the root ends up with only one child, make the child the new root of the tree • This reduces the height of the tree!
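The leaf-level underflow handling can be sketched with plain sorted lists standing in for leaves (a simplification of the slides' algorithm: the parent-key updates that accompany each rotation and merge are omitted, and all names are mine):

```python
import math

def fix_underflow(leaf, left, right, L):
    """Repair an underfull leaf (fewer than ceiling(L/2) items):
    borrow from a neighbor that can spare an item, else merge.
    Returns a string naming the action taken."""
    minimum = math.ceil(L / 2)
    if len(leaf) >= minimum:
        return "ok"
    if left is not None and len(left) > minimum:
        leaf.insert(0, left.pop())       # borrow left neighbor's largest
        return "borrow-from-left"
    if right is not None and len(right) > minimum:
        leaf.append(right.pop(0))        # borrow right neighbor's smallest
        return "borrow-from-right"
    if left is not None:                 # neither can spare: merge
        left.extend(leaf)
    else:
        leaf.extend(right)
        right.clear()
    return "merge"

# Slide 24/25's Delete(15): the leaf {16} borrows 14 from neighbor {3, 12, 14}
```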
