Data Placement Problems in Database Applications

Presentation Transcript


  1. Data Placement Problems in Database Applications An Zhu Stanford University

  2. Data Placement • Data objects • Multiple disks • Assignment of objects to disks • Optimize performance • Optimize I/O • Handle dynamic situations AZ

  3. Outline • Multimedia Systems [GKKTZ 00] • Maximize the total clients served • Relational Database Layout [AFMPZ 03] • Minimize the combined I/O access time • Load Rebalancing Problem [AMZ 03] • Minimize the makespan within allowed moves

  5. Multimedia Storage Systems • Movie objects • Clients/subscribers • Parallel disks • Limited storage: at most N_j movies on disk j • Limited bandwidth: at most C_j clients on disk j • Homogeneous system: N_j=k, C_j=L, ∀ j • Uniform ratio: C_j/N_j=r, ∀ j

  6. An Example • Disk loads: 000/600, 000/600, 000/600 • Unplaced movies (clients each): 100 ×10, 400 ×2 • Total Storage: 12, Total Capacity: 1800

  7. An Example • Disk loads: 400/600, 000/600, 000/600 • Unplaced movies (clients each): 100 ×6, 400 ×2 • Total Storage: 12, Total Capacity: 1800

  8. An Example • Disk loads: 400/600, 400/600, 000/600 • Unplaced movies (clients each): 100 ×2, 400 ×2 • Total Storage: 12, Total Capacity: 1800

  9. Not All Clients Can be Satisfied • Disk loads: 400/600, 400/600, 600/600 • Unserved clients: 400 • Total Satisfied Clients: 1400/1800=7/9

  10. Sliding Window Algorithm • Consider one disk at a time • Maintain an ordered list of movies • Find the first window of k consecutive movies (or fewer) with at least L combined clients • Assign the first L clients in the window to the disk and return the leftover clients to the list
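The steps on this slide can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function name `sliding_window`, the ascending sort order, the choice to split the last movie in the window, and the fallback of storing the k largest remaining movies when no window reaches L are all assumptions made for illustration. The input reproduces the running example (ten movies with 100 clients, two with 400, three disks, k=4, L=600).

```python
import bisect

def sliding_window(demands, num_disks, k, L):
    """Return (per-disk lists of assigned client counts, unserved client counts)."""
    remaining = sorted(demands)            # ascending list of unserved client counts
    disks = []
    for _ in range(num_disks):
        lo, total, hit = 0, 0, None
        for hi, d in enumerate(remaining):
            total += d
            if hi - lo + 1 > k:            # keep the window at most k movies wide
                total -= remaining[lo]
                lo += 1
            if total >= L:                 # first window that can saturate the disk
                hit = hi
                break
        if hit is None:
            # No window reaches L (assumed fallback): store the k largest movies in full.
            cut = max(0, len(remaining) - k)
            disks.append(remaining[cut:])
            del remaining[cut:]
            continue
        window = remaining[lo:hit + 1]
        del remaining[lo:hit + 1]
        excess = total - L
        if excess > 0:
            window[-1] -= excess               # serve only part of the last movie
            bisect.insort(remaining, excess)   # its leftover clients rejoin the list
        disks.append(window)
    return disks, remaining

disks, unserved = sliding_window([100] * 10 + [400] * 2, num_disks=3, k=4, L=600)
print(sum(map(sum, disks)), sum(unserved))   # prints: 1600 200
```

On the example this serves 1600 of 1800 clients, matching the 8/9 shown on slide 19. Each disk costs one pass over the list here; a balanced search structure would be needed to approach the running time quoted in the recap.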

  11. An Example • Window total: 100 • Disk loads: 000/600, 000/600, 000/600 • Movies (clients each): 100 ×10, 400 ×2 • Max window size k=4

  12. An Example • Window total: 200 • Disk loads: 000/600, 000/600, 000/600 • Movies (clients each): 100 ×10, 400 ×2 • Max window size k=4

  13. An Example • Window total: 400 • Disk loads: 000/600, 000/600, 000/600 • Movies (clients each): 100 ×10, 400 ×2 • Max window size k=4

  14. An Example • Window total: 400 • Disk loads: 000/600, 000/600, 000/600 • Movies (clients each): 100 ×10, 400 ×2 • Max window size k=4

  15. An Example • Window total: 700 • Disk loads: 000/600, 000/600, 000/600 • Movies (clients each): 100 ×10, 400 ×2 • Max window size k=4

  16. An Example • Disk loads: 600/600, 000/600, 000/600 • Movies (clients each): 100 ×7, 0, 0, 0, 100, 400 • Max window size k=4

  17. An Example • Disk loads: 600/600, 000/600, 000/600 • Movies (clients each): 100 ×8, 400 • Max window size k=4

  18. An Example • Disk loads: 600/600, 600/600, 000/600 • Movies (clients each): 100 ×6 • Window total: 400 • Max window size k=4

  19. An Example • Disk loads: 600/600, 600/600, 400/600 • Unserved clients: 100, 100 • Total Satisfied Clients: 1600/1800=8/9

  20. Theoretical Bounds • Satisfies at least a 1 − 1/(1+√k)² fraction of total clients • In the worst case, no algorithm can satisfy more clients • Translates to a (1+√k)²/((1+√k)²−1)-approximation • PTAS: (1+ε)-approximation, ε>0
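As a quick check of the guarantee against the running example, the snippet below evaluates the bound for a given k. It assumes the fraction meant here is the 1 − 1/(1+√k)² guarantee reported in [GKKTZ 00]; the function name `guaranteed_fraction` is made up for this illustration.

```python
import math

def guaranteed_fraction(k):
    """Assumed worst-case fraction of clients served with k movies per disk."""
    return 1 - 1 / (1 + math.sqrt(k)) ** 2

for k in (1, 4, 9, 16):
    print(k, round(guaranteed_fraction(k), 4))
```

For k=4 the value is 8/9, which is the fraction the sliding window example reaches (1600/1800).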

  22. Proof Sketch • Load-saturated vs. storage-saturated disks: M_L, M_S • Load of the least loaded disk: cL • M_L+M_S=M, 0<c<1 • All remaining movies each have no more than cL/k clients • Initial instance is feasible (w.l.o.g.)

  23. An Example • Disk loads: 600/600, 600/600, 400/600 • Unserved clients: 100, 100 • M_L=2, M_S=1, c=400/600, cL/k=100 • Total Satisfied Clients: 1600/1800=8/9

  24. Proof Outline • If there is a load-saturated disk with fewer than k movies • All clients are satisfied • Otherwise • At most M_L movies are left • Satisfy at least a 1 − 1/(1+√k)² fraction of the clients

  25. Lemma • If any load-saturated disk has fewer than k movies • Any k−1 remaining movies in the list have L clients or more

  26. Lemma • The remaining disks are all load saturated • So, all clients are satisfied • [Figure: two groups of remaining movies, each labeled "At least L"]

  27. Otherwise… • Each disk has exactly k movies • Total assigned movies: M·k • Initial movies: N ≤ M·k • “New” movies generated: ≤ M_L • # of movies left: ≤ M_L • # of clients/remaining movie: ≤ cL/k • Total # of remaining clients: ≤ cL·M_L/k

  28. Otherwise… • Total clients: ≤ M·L • Assigned clients: ≥ M_L·L + M_S·cL • Total # of remaining clients: ≤ M_S·(1−c)L • Final bound: at least a 1 − 1/(1+√k)² fraction of the clients is satisfied

  29. Simulation Results • M=5 • L=100 • N=M·k • Zipf with parameter 0.0 (popularity of movie i ∝ 1/i)

  30. Recap • The problem is NP-complete • PTAS: best possible approximation bound • 1 − 1/(1+√k)²: best possible absolute bound • Sliding window algorithm: practical, with O((M+N)log(M+N)) running time

  31. Outline • Multimedia Systems [GKKTZ 00] • Maximize the total clients served • Relational Database Layout [AFMPZ 03] • Minimize the total I/O access time • Load Rebalancing Problem [AMZ 03] • Minimize the makespan within allowed moves

  32. Relational Databases • Objects: indexes, tables, views • Multiple disks • Minimize the total I/O access time

  33. Past Work • Full striping • Split uniformly across all available disks • Utilize I/O parallelism • Transfer rate: 0.05s/MB • 200MB object on a single disk: T_t=10s

  34. Past Work • Full striping • Split uniformly across all available disks • Utilize I/O parallelism • Transfer rate: 0.05s/MB • 200MB object on a single disk: T_t=10s • Same object striped as 50MB pieces: T_t=2.5s

  35. Past Work • Co-accessed objects with random I/O • Seek time per block: 0.01s per 0.1MB • Seek rate: 0.1s/MB • Smaller object dominates: T_s=50·2·0.1=10s • [Figure: A striped as 50MB per disk and B as 100MB per disk, across 4 disks]

  36. Past Work • Combined access time • Transfer time: T_t=(50+100)·0.05=7.5s • Seek time: T_s=min(50,100)·2·0.1=10s • Combined time: T_t+T_s=17.5s • [Figure: A striped as 50MB per disk and B as 100MB per disk, across 4 disks]

  37. Past Work • Full striping is no longer optimal [Agrawal, Chaudhuri, Das, Narasayya 03] • Combined time: 200·0.05=10s • [Figure: disks holding 200MB, 200MB, 100MB, 100MB]
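The arithmetic on slides 35 to 37 can be reproduced with a short sketch. The rates (0.05 s/MB transfer, 0.1 s/MB seek) come from the slides; the helper names `disk_time` and `layout_time`, the restriction to at most two co-accessed objects per disk, and the reading of the slide-37 figure as "B on two disks at 200MB each, A on two disks at 100MB each" are assumptions made for illustration.

```python
TRANSFER_RATE = 0.05   # s/MB, from the slides
SEEK_RATE = 0.1        # s/MB, i.e. 0.01 s per 0.1 MB block

def disk_time(sizes_mb):
    """Combined time on one disk holding one object piece or two co-accessed pieces."""
    transfer = sum(sizes_mb) * TRANSFER_RATE
    # Seeks are charged only when two co-accessed pieces share the disk;
    # the smaller piece bounds the number of back-and-forth switches.
    seek = 2 * SEEK_RATE * min(sizes_mb) if len(sizes_mb) == 2 else 0.0
    return transfer + seek

def layout_time(layout):
    """Disks work in parallel, so the slowest disk determines the access time."""
    return max(disk_time(pieces) for pieces in layout)

striped = [[50, 100]] * 4                   # A (200MB) and B (400MB) on all 4 disks
separated = [[200], [200], [100], [100]]    # assumed reading of the slide-37 figure
print(f"{layout_time(striped):.1f} s vs {layout_time(separated):.1f} s")
# prints: 17.5 s vs 10.0 s
```

Separating the two objects gives up some parallelism but removes every seek between them, which is why it beats full striping in this example.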

  38. Data Layout Problem • Workload (SQL DML) • A set of queries and/or updates • A set of co-accessed objects (pairwise) • Access stats (pairwise) • Minimize the estimated I/O access time

  39. Theoretical Questions • Approximation and its hardness • Transfer time: P • Seek time: Very Hard • Combined time • Hard • Minimizing transfer time alone is a “good” approximation

  40. Transfer Time • Heterogeneous disks • Different transfer rate for each disk j • Storage constraint: c_j • Objects • Different size: s_i • Access frequency for each co-accessed pair (i,i’) • Solvable using Linear Programming (LP)

  41. LP • Variables: amount of object i assigned to disk j • Each object must be completely assigned • Each disk’s storage limit is kept • Transfer time for (i,i’) on disk j • Overall transfer time for (i,i’) • Minimize the total transfer time
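The formulas on this slide did not survive in the transcript; only their annotations remain. One plausible reconstruction consistent with those annotations is given below. The symbols x_{ij} (amount of object i on disk j), \alpha_j (transfer rate of disk j), \phi_{i,i'} (access frequency of the pair) and T_{i,i'} (overall transfer time of the pair) are assumed names, and only s_i and c_j appear in the transcript itself. Because the disks work in parallel, the overall transfer time of a pair is taken as the maximum over disks, which the constraint T_{i,i'} \ge \dots linearizes.

```latex
\begin{align*}
\text{minimize}\quad   & \sum_{(i,i')} \phi_{i,i'}\, T_{i,i'} \\
\text{subject to}\quad & \sum_{j} x_{ij} = s_i                      && \forall i \\
                       & \sum_{i} x_{ij} \le c_j                    && \forall j \\
                       & T_{i,i'} \ge \alpha_j\,(x_{ij} + x_{i'j})  && \forall (i,i'),\ \forall j \\
                       & x_{ij} \ge 0                               && \forall i, j
\end{align*}
```

Every constraint is linear in x_{ij} and T_{i,i'}, which is consistent with the claim that the transfer-time problem is solvable in polynomial time.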

  42. Seek Time • Hard even on disks with no storage constraint • Integral assignment • Each object is assigned to one disk only • Conversion from a fractional assignment with no loss

  43. Conversion  • f( , )=1, f( , )=1, f( , )=0 • Total seek cost: 1002+1002 • Want: each file is spread uniformly across a subset of disks A B C B A C 100MB 150MB 200MB 200MB 100MB 100MB AZ

  44. Conversion  • f( , )=1, f( , )=1, f( , )=0 • Total seek cost: 1002+1002 • New cost: 1002+1252 A B C B A C 125MB 125MB 200MB 200MB 100MB 100MB AZ

  45. Conversion  • f( , )=1, f( , )=1, f( , )=0 • Total seek cost: 1002+1002 • New cost: 1002 A B C B A C 250MB 125MB 125MB 200MB 200MB 100MB 100MB AZ

  46. Conversion  • f( , )=1, f( , )=1, f( , )=0 • Total seek cost: 0 • Each file resides on only one disk A B C B A C 400MB 250MB 250MB 200MB 200MB 200MB 100MB 100MB AZ

  47. Implications • A polynomial time algorithm • Equivalent to Minimum Edge Deletion k-Partition • NP-Hard to approximate: O(n2) • Forces combined time be hard to approximate AZ

  48. Combined Time • Let • Hard to approximate: ·, 1>>0 • Optimize transfer time alone gives 1+ AZ

  49. Outline • Multimedia Systems [GKKTZ 00] • Maximize the total clients served • Relational Database Layout [AFMPZ 03] • Minimize the combined I/O access time • Load Rebalancing Problem [AMZ 03] • Minimize the makespan within allowed moves

  50. Load Rebalancing • Access pattern changes • Initial layout no longer balanced • [Figure: items numbered 1 to 11 stacked on disks, with the MAX LOAD level marked]
