On Managing Continuous Media Data. Edward Chang Hector Garcia-Molina Stanford University. Challenges. Large Volume of Data MPEG2 100 Minute Movie: 3-4 GBytes Large Data Transfer Rate MPEG2: 4 to 6 Mbps HDTV: 19.2 Mbps Just-in-Time Data Requirement Simultaneous Users.
On Managing Continuous Media Data Edward Chang Hector Garcia-Molina Stanford University
Challenges • Large Volume of Data • MPEG2 100 Minute Movie: 3-4 GBytes • Large Data Transfer Rate • MPEG2: 4 to 6 Mbps • HDTV: 19.2 Mbps • Just-in-Time Data Requirement • Simultaneous Users
...Challenges • Traditional Optimization Objectives: • Maximizing Throughput! • Maximizing Throughput!! • Maximizing Throughput!!! • How about Cost? • How about Initial Latency?
Related Work • IBM T.J. Watson Labs. (P. Yu) • USC (S. Ghandeharizadeh) • UCLA (R. Muntz) • UBC (Raymond Ng) • Bell Labs. (B. Ozden) • etc.
Outline • Server (Single Disk) • Revisiting Conventional Wisdom • Minimizing Cost • Minimizing Initial Latency • Server (Parallel Disks) • Balancing Workload • Minimizing Cost & Initial Latency • Client • Handling VBR • Supporting VCR-like Functions
Conventional Wisdom(for Single Disk) • Reducing Disk Latency leads to Better Disk Utilization • Reducing Disk Latency leads to Higher Throughput • Increasing Disk Utilization leads to Improved Cost Effectiveness
Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? • Does Reducing Disk Latency lead to Higher Throughput? • Does Increasing Disk Utilization lead to Improved Cost Effectiveness?
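[Figure: Memory Use vs. Time for one request over a service cycle T; the buffer fills at rate TR - DR during a transfer and drains at rate DR, peaking at S] • Tseek: Disk Latency • TR: Disk Transfer Rate • DR: Display Rate • S: Segment Size (Peak Memory Use per Request) • T: Service Cycle Time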
[Figure: Memory Use vs. Time for one request over a service cycle T] • $S = DR \times T$ • $T = N \times (T_{seek} + S/TR)$
Disk Utilization • $S = \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}$ • S is directly proportional to Tseek • $D_{util} = \frac{S/TR}{S/TR + T_{seek}}$ • Dutil is Constant!
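The claim that Dutil is constant can be checked numerically. The sketch below evaluates the two slide formulas for several seek times; substituting S into Dutil reduces it to N × DR / TR, which has no Tseek term. The function names and the sample values for TR and N are illustrative assumptions, not figures from the talk (DR = 4 Mbps is the lower MPEG2 rate quoted earlier).

```python
def segment_size(n, tr, dr, tseek):
    """Segment size S = N*TR*DR*Tseek / (TR - N*DR), in the units of tr * tseek."""
    if n * dr >= tr:
        raise ValueError("N * DR must stay below TR, or the disk cannot keep up")
    return n * tr * dr * tseek / (tr - n * dr)


def disk_utilization(n, tr, dr, tseek):
    """Dutil = (S/TR) / (S/TR + Tseek): the share of each IO spent transferring data."""
    transfer_time = segment_size(n, tr, dr, tseek) / tr
    return transfer_time / (transfer_time + tseek)


# Assumed sample values: TR = 80 Mbps, DR = 4 Mbps, N = 10 users.
TR, DR, N = 80.0, 4.0, 10

for tseek in (0.005, 0.010, 0.020):  # seek times in seconds (hypothetical)
    s = segment_size(N, TR, DR, tseek)
    util = disk_utilization(N, TR, DR, tseek)
    print(f"Tseek={tseek:.3f}s  S={s:.3f} Mbit  Dutil={util:.3f}")
```

Doubling Tseek doubles S, but the utilization column does not move: a shorter seek time shrinks the memory needed per request rather than raising disk utilization.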
Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? NO! • Does Reducing Disk Latency lead to Higher Throughput? • Does Increasing Disk Utilization lead to Improved Cost Effectiveness?
What Affects Throughput? [Diagram: Disk Utilization (crossed out), Disk Latency, and Memory Utilization as candidate factors for Throughput]
Memory Requirement • We Examine Two Disk Scheduling Policies’ Memory Requirement • Sweep (Elevator Policy): Enjoys the Minimum Seek Overhead • Fixed-Stretch: Suffers from High Seek Overhead
Per User Peak Memory Use • $S = \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}$
Sweep (Elevator) • Disk Latency: Minimum • IO Time Variability: Very High [Figure: IO order B1, A1, A2, B2]
Sweep (Elevator) • Memory Sharing: Poor • Total Memory Requirement: 2 * N * Ssweep
Fixed-Stretch • Disk Latency: High (because of Stretch) • IO Variability: None (because of Fixed) [Figure: IO order a, b, a, b, a]
Fixed-Stretch • Memory Sharing: Good • Total Memory Requirement: 1/2 * N * Sfs
Throughput • Sweep: 2 * N * Ssweep, Available Memory = 40 MBytes, N = 40 • Fixed-Stretch: 1/2 * N * Sfs, Available Memory = 40 MBytes, N = 42 (Higher Throughput) • * Based on a Realistic Case Study Using Seagate Disks
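To see how a policy with a longer seek time can still admit more users, the sketch below finds the largest N whose total memory requirement fits a budget, using 2 * N * S for Sweep and 1/2 * N * S for Fixed-Stretch. All numeric parameters (TR, DR, the two seek times, the budget) are hypothetical and are not the Seagate case-study values, which the slides do not give; `max_users` is an illustrative helper name.

```python
def segment_size(n, tr, dr, tseek):
    """S = N*TR*DR*Tseek / (TR - N*DR), valid only while N*DR < TR."""
    return n * tr * dr * tseek / (tr - n * dr)


def max_users(memory_budget, memory_factor, tr, dr, tseek):
    """Largest N such that memory_factor * N * S(N) fits in memory_budget.

    memory_factor is 2 for Sweep and 0.5 for Fixed-Stretch, per the slides.
    """
    n = 0
    while (n + 1) * dr < tr:  # the disk bandwidth bound must also hold
        need = memory_factor * (n + 1) * segment_size(n + 1, tr, dr, tseek)
        if need > memory_budget:
            break
        n += 1
    return n


# Hypothetical setup: TR = 80 Mbps, DR = 4 Mbps, 80 Mbit (10 MBytes) of memory.
TR, DR, BUDGET = 80.0, 4.0, 80.0
sweep_n = max_users(BUDGET, 2.0, TR, DR, tseek=0.005)          # short seeks, poor sharing
fixed_stretch_n = max_users(BUDGET, 0.5, TR, DR, tseek=0.010)  # long seeks, good sharing
print("Sweep:", sweep_n, "Fixed-Stretch:", fixed_stretch_n)
```

The comparison depends entirely on the assumed seek times: the factor of four in memory sharing only pays off while the Fixed-Stretch seek overhead stays under four times that of Sweep.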
What Affects Throughput? [Diagram: Disk Utilization (crossed out), Disk Latency, and Memory Utilization as candidate factors for Throughput]
Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? NO! • Does Reducing Disk Latency lead to Higher Throughput? NO! • Does Increasing Disk Utilization lead to Improved Cost Effectiveness?
Per Stream Cost [Figure: Cost vs. Number of Users, with curves for Per Stream Cost, Memory Cost, and Disk Cost]
Per-Stream Memory Cost • $C_m \times S = \frac{C_m \times N \times TR \times DR \times T_{seek}}{TR - N \times DR}$
Example • Disk Cost: $200 per Unit • Memory Cost: $5 per MByte • Supporting N = 40 Requires 60 MBytes of Memory • $200 + $300 = $500 • Supporting N = 50 Requires 160 MBytes of Memory • $200 + $800 = $1,000 • For the Same $1,000, It Is Better to Buy 2 Disks and 120 MBytes of Memory to Support N = 80 Users! • Memory Use is Critical
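The arithmetic in this example can be reproduced with a few lines. The prices and the three configurations are taken from the slide; the function names are illustrative.

```python
DISK_COST = 200            # dollars per disk (from the slide)
MEMORY_COST_PER_MBYTE = 5  # dollars per MByte (from the slide)


def system_cost(disks, memory_mbytes):
    """Total hardware cost in dollars."""
    return disks * DISK_COST + memory_mbytes * MEMORY_COST_PER_MBYTE


def cost_per_stream(disks, memory_mbytes, users):
    """Hardware cost divided evenly across the supported users."""
    return system_cost(disks, memory_mbytes) / users


# (label, disks, memory in MBytes, users supported), as stated on the slide
configs = [
    ("1 disk, N = 40", 1, 60, 40),
    ("1 disk, N = 50", 1, 160, 50),
    ("2 disks, N = 80", 2, 120, 80),
]

for label, disks, memory, users in configs:
    total = system_cost(disks, memory)
    per_stream = cost_per_stream(disks, memory, users)
    print(f"{label}: total ${total}, per stream ${per_stream:.2f}")
```

Pushing one disk from 40 to 50 users raises the per-stream cost from $12.50 to $20.00, while two disks at 40 users each keep it at $12.50.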
Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? NO! • Does Reducing Disk Latency lead to Higher Throughput? NO! • Does Increasing Disk Utilization lead to Improved Cost Effectiveness? NO!
Outline • Server (Single Disk) • Revisiting Conventional Wisdom • Minimizing Cost • Minimizing Initial Latency • Server (Parallel Disks) • Balancing Workload • Minimizing Cost & Initial Latency • Client • Handling VBR • Supporting VCR-like Functions
Initial Latency • What is it? • The time from when a request arrives at the server until its data is available in the server’s main memory • Where is it important? • Interactive applications (e.g., video games) • Interactive features (e.g., fast-scan)
Fixed-Stretch • Space Out IOs [Figure: Memory use (up to S) vs. Time for requests a, b, c, marking the Transfer, the Seek, and the Playback Point]
Fixed-Stretch [Figure: schedule with requests a, b, c and labels S1, S2, S3]
Fixed-Stretch [Figure: schedule with labels S1, S2, S3]
Our Contribution: BubbleUp • Fixed-Stretch Enjoys Fine Throughput • BubbleUp Remedies Fixed-Stretch to Minimize Initial Latency
Schedule Office Work • 8am: Host a Visitor • 9am: Do Email • 10am: Write Paper • 11am: Write Paper • Noon: Lunch
BubbleUp [Figure: schedule with labels S1, S2, S3]
BubbleUp • Empty Slots are Always Next in Time • No additional Memory Required • Fill the Buffer up to the Segment Size • No additional Disk Bandwidth Required • The Disk Is Idle Otherwise
Evaluation [Figure: Latency in seconds (axis from 1 to 9) vs. N, comparing Sweep and BubbleUp]
Fast-Scan [Figure: segments S1, S2, S3]
Fast-Scan [Figure: segments S1, S2, S3, S4]
Data Placement Policies • Please refer to our publications
[Figure: segments S1, S2, S3]
Chunk Allocation • Allocate Memory in Chunks • A Chunk = k * S • Replicate the Last Segment of a Chunk at the Beginning of the Next Chunk • Example • Chunk 1: s1, s2, s3, s4, s5 • Chunk 2: s5, s6, s7, s8, s9
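The replication rule can be sketched as a small function that splits a run of segments into chunks of k, starting each new chunk with a copy of the previous chunk's last segment. The name `split_into_chunks` and its interface are assumptions made for illustration; the slide only gives the rule and the k = 5 example.

```python
def split_into_chunks(num_segments, k):
    """Split segments s1..sN into chunks of at most k segments.

    Every chunk after the first begins with a replica of the last
    segment of the chunk before it.
    """
    if num_segments < 1:
        raise ValueError("need at least one segment")
    if k < 2:
        raise ValueError("k must be at least 2 so each chunk adds a new segment")
    chunks = []
    start = 1
    while True:
        end = min(start + k - 1, num_segments)
        chunks.append([f"s{i}" for i in range(start, end + 1)])
        if end == num_segments:
            return chunks
        start = end  # the next chunk repeats this chunk's last segment


for number, chunk in enumerate(split_into_chunks(9, 5), start=1):
    print(f"Chunk {number}: {', '.join(chunk)}")
```

With 9 segments and k = 5 this reproduces the slide's example: the second chunk starts at s5 again and runs through s9.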
Chunk Allocation • Largest-Fit First • Best Fit (Last Chunk)
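Segment Placement [Figure: 18 segments to place into free spaces of size 4, 16, and 8]
Largest-Fit First [Figure: s1 through s16 placed in the size-16 space; the size-4 and size-8 spaces remain free]
Best Fit [Figure: the last chunk s16, s17, s18 placed in the size-4 space; s1 through s16 in the size-16 space]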
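The slides name the two policies without spelling out the procedure, so the sketch below is one plausible reading rather than the authors' algorithm: fill the largest free space first, and once the remaining segments (including the replicated one) fit in a single space, put them in the smallest space that holds them. The function name, the list-of-sizes representation of free space, and the tie-breaking are all assumptions.

```python
def place_segments(num_segments, free_spaces):
    """Place segments s1..sN into free spaces given as sizes in segments.

    Returns a list of (space_size, segments_in_that_space) in placement order.
    Each chunk after the first starts with a replica of the previous chunk's
    last segment, following the Chunk Allocation rule.
    """
    free = sorted(free_spaces, reverse=True)
    placements = []
    start = 1
    while True:
        remaining = num_segments - start + 1  # counts the replica when start > 1
        fitting = [size for size in free if size >= remaining]
        if fitting:
            size = min(fitting)  # Best Fit for the last chunk
            end = num_segments
        else:
            if not free:
                raise ValueError("not enough free space for all segments")
            size = free[0]  # Largest-Fit First
            end = start + size - 1
        free.remove(size)
        placements.append((size, [f"s{i}" for i in range(start, end + 1)]))
        if end == num_segments:
            return placements
        start = end  # replicate the last segment in the next chunk


for size, segments in place_segments(18, [4, 16, 8]):
    print(f"space of {size}: {segments[0]} .. {segments[-1]}")
```

For 18 segments and free spaces of 4, 16, and 8, this puts s1 through s16 in the size-16 space and the last chunk (s16, s17, s18) in the size-4 space, leaving the size-8 space untouched.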
Outline • Server (Single Disk) • Revisiting Conventional Wisdom • Minimizing Cost • Minimizing Initial Latency • Server (Parallel Disks) • Balancing Workload • Minimizing Cost & Initial Latency • Client • Handling VBR • Supporting VCR-like Functions
Unbalanced Workload [Figure: one HOT video and two Cold videos]
Balanced Workload [Figure: one HOT video and two Cold videos]
Per Stream Memory Use (Use M Disks Independently) • $S = \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}$ • Total Streams: M × N