On Managing Continuous Media Data

Edward Chang • Hector Garcia-Molina • Stanford University

Challenges • Large Volume of Data • MPEG2 100 Minute Movie: 3-4 GBytes • Large Data Transfer Rate • MPEG2: 4 to 6 Mbps • HDTV: 19.2 Mbps • Just-in-Time Data Requirement • Simultaneous Users

...Challenges • Traditional Optimization Objectives: • Maximizing Throughput! • Maximizing Throughput!! • Maximizing Throughput!!! • How about Cost? • How about Initial Latency?

Related Work • IBM T.J. Watson Labs. (P. Yu) • USC (S. Ghandeharizadeh) • UCLA (R. Muntz) • UBC (Raymond Ng) • Bell Labs. (B. Ozden) • etc.

Outline • Server (Single Disk) • Revisiting Conventional Wisdom • Minimizing Cost • Minimizing Initial Latency • Server (Parallel Disks) • Balancing Workload • Minimizing Cost & Initial Latency • Client • Handling VBR • Supporting VCR-like Functions

Conventional Wisdom (for Single Disk) • Reducing Disk Latency leads to Better Disk Utilization • Reducing Disk Latency leads to Higher Throughput • Increasing Disk Utilization leads to Improved Cost Effectiveness

Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? • Does Reducing Disk Latency lead to Higher Throughput? • Does Increasing Disk Utilization lead to Improved Cost Effectiveness?

[Figure: Memory Use versus Time over one service cycle T, with slopes TR - DR and DR, peak S, and seek intervals Tseek] • Tseek: Disk Latency • TR: Disk Transfer Rate • DR: Display Rate • S: Segment Size (Peak Memory Use per Request) • T: Service Cycle Time

[Figure: Memory Use versus Time over one service cycle T, with slopes TR - DR and DR, peak S, and seek intervals Tseek] • S = DR × T • T = N × (Tseek + S/TR)

Disk Utilization • S = (N × TR × DR × Tseek) / (TR - N × DR) • S is directly proportional to Tseek • Dutil = (S/TR) / (S/TR + Tseek) • Dutil is Constant!
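The claim that Dutil is constant can be checked numerically. The sketch below solves S = DR × T and T = N × (Tseek + S/TR) for S, then evaluates Dutil for several seek times. The parameter values (TR = 80 Mbps, DR = 1.5 Mbps, N = 40) and the function names are illustrative assumptions, not figures from the talk.

```python
def segment_size(n, tr, dr, t_seek):
    """Segment size S that satisfies S = DR*T and T = N*(Tseek + S/TR).

    tr and dr are in Mbit/s and t_seek in seconds, so S is in Mbit.
    """
    if n * dr >= tr:
        raise ValueError("N x DR must stay below TR")
    return n * tr * dr * t_seek / (tr - n * dr)


def disk_utilization(n, tr, dr, t_seek):
    """Fraction of each IO spent transferring data: (S/TR) / (S/TR + Tseek)."""
    transfer_time = segment_size(n, tr, dr, t_seek) / tr
    return transfer_time / (transfer_time + t_seek)


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from the slides).
    N, TR, DR = 40, 80.0, 1.5
    for t_seek in (0.005, 0.010, 0.020):
        s = segment_size(N, TR, DR, t_seek)
        u = disk_utilization(N, TR, DR, t_seek)
        print(f"Tseek={t_seek:.3f}s  S={s:.2f} Mbit  Dutil={u:.3f}")
```

Under these assumptions S grows in proportion to Tseek while Dutil stays at N × DR / TR, which is why a shorter disk latency reduces memory use rather than raising utilization.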

Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? NO! • Does Reducing Disk Latency lead to Higher Throughput? • Does Increasing Disk Utilization lead to Improved Cost Effectiveness?

What Affects Throughput? [Diagram relating Disk Utilization, Disk Latency, and Memory Utilization to Throughput]

Memory Requirement • We Examine Two Disk Scheduling Policies’ Memory Requirement • Sweep (Elevator Policy): Enjoys the Minimum Seek Overhead • Fixed-Stretch: Suffers from High Seek Overhead

Per User Peak Memory Use • S = (N × TR × DR × Tseek) / (TR - N × DR)

Sweep (Elevator) • Disk Latency: Minimum • IO Time Variability: Very High [Figure: IO order B1, A1, A2, B2]

Sweep (Elevator) • Memory Sharing: Poor • Total Memory Requirement: 2 * N * Ssweep

Fixed-Stretch • Disk Latency: High (because of Stretch) • IO Variability: No (because of Fixed) [Figure: IO order a, b, a, b, a]

Fixed-Stretch • Memory Sharing: Good • Total Memory Requirement: 1/2 * N * Sfs

Throughput • Sweep: 2 * N * Ssweep, Available Memory = 40 MBytes, N = 40 • Fixed-Stretch: 1/2 * N * Sfs, Available Memory = 40 MBytes, N = 42 (Higher Throughput) • * Based on a Realistic Case Study Using Seagate Disks
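The comparison can be sketched by finding the largest N whose total memory requirement fits a fixed budget under each policy. The disk parameters and the two seek times below are hypothetical (they are not the Seagate figures from the case study), so the resulting N values differ from the 40 and 42 above; the point is only that a policy with a much higher disk latency can still admit more users when it shares memory better.

```python
def segment_size(n, tr, dr, t_seek):
    """S = (N * TR * DR * Tseek) / (TR - N * DR), in Mbit."""
    return n * tr * dr * t_seek / (tr - n * dr)


def max_users(budget_mbit, memory_factor, tr, dr, t_seek):
    """Largest N with memory_factor * N * S(N) <= budget_mbit.

    memory_factor is 2 for Sweep and 1/2 for Fixed-Stretch, as on the slides.
    """
    best = 0
    n = 1
    while n * dr < tr:  # the disk must be able to sustain N streams
        if memory_factor * n * segment_size(n, tr, dr, t_seek) <= budget_mbit:
            best = n
        n += 1
    return best


if __name__ == "__main__":
    # Assumed values: 40 MBytes = 320 Mbit, TR = 80 Mbps, DR = 1.5 Mbps.
    BUDGET, TR, DR = 320.0, 80.0, 1.5
    SWEEP_TSEEK = 0.008          # hypothetical low latency for Sweep
    FIXED_STRETCH_TSEEK = 0.020  # hypothetical high latency for Fixed-Stretch
    print("Sweep:        ", max_users(BUDGET, 2.0, TR, DR, SWEEP_TSEEK))
    print("Fixed-Stretch:", max_users(BUDGET, 0.5, TR, DR, FIXED_STRETCH_TSEEK))
```

With these assumed numbers Fixed-Stretch admits more users than Sweep even though its seek time is 2.5 times longer.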

What Affects Throughput? [Diagram relating Disk Utilization, Disk Latency, and Memory Utilization to Throughput]

Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? NO! • Does Reducing Disk Latency lead to Higher Throughput? NO! • Does Increasing Disk Utilization lead to Improved Cost Effectiveness?

Per Stream Cost [Figure: Per Stream Cost, Memory Cost, and Disk Cost versus Number of Users]

Per-Stream Memory Cost • Cm × S = (Cm × N × TR × DR × Tseek) / (TR - N × DR)

Example • Disk Cost: $200 per Unit • Memory Cost: $5 per MByte • Supporting N = 40 Requires 60 MBytes of Memory • $200 + $300 = $500 • Supporting N = 50 Requires 160 MBytes of Memory • $200 + $800 = $1,000 • For the Same Cost of $1,000, It’s Better to Buy 2 Disks and 120 MBytes to Support N = 80 Users! • Memory Use is Critical
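The arithmetic in this example can be reproduced with a few lines. The prices and configurations are the ones stated on the slide; the function and variable names are only illustrative.

```python
DISK_COST = 200        # dollars per disk (from the slide)
MEMORY_COST_PER_MB = 5  # dollars per MByte (from the slide)


def system_cost(disks, memory_mb):
    """Total hardware cost in dollars for a given number of disks and memory size."""
    return disks * DISK_COST + memory_mb * MEMORY_COST_PER_MB


def cost_per_stream(disks, memory_mb, users):
    """Hardware cost divided by the number of supported users."""
    return system_cost(disks, memory_mb) / users


if __name__ == "__main__":
    # (label, disks, memory in MBytes, users supported), as given in the example
    configs = [
        ("1 disk,  N = 40", 1, 60, 40),
        ("1 disk,  N = 50", 1, 160, 50),
        ("2 disks, N = 80", 2, 120, 80),
    ]
    for label, disks, memory_mb, users in configs:
        total = system_cost(disks, memory_mb)
        per_stream = cost_per_stream(disks, memory_mb, users)
        print(f"{label}: total ${total}, ${per_stream:.2f} per stream")
```

Pushing a single disk from 40 to 50 users raises the per-stream cost, while adding a second disk at the same total price keeps it at the 40-user level.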

Is Conventional Wisdom Right? • Does Reducing Disk Latency lead to Better Disk Utilization? NO! • Does Reducing Disk Latency lead to Higher Throughput? NO! • Does Increasing Disk Utilization lead to Improved Cost Effectiveness? NO!

So What?

Initial Latency • What is it? • The time from when a request arrives at the server until the data is available in the server’s main memory • Where is it important? • Interactive applications (e.g., video games) • Interactive features (e.g., fast-scan)

Sweep (Elevator)

Fixed-Stretch • Space Out IOs [Figure: Memory use (up to S) versus Time for requests a, b, c, marking the Transfer and Seek intervals and the Playback Point]

Fixed-Stretch [Figure: requests a, b, c and slots S1, S2, S3]

Fixed-Stretch [Figure: slots S1, S2, S3]

Our Contribution: BubbleUp • Fixed-Stretch Enjoys Fine Throughput • BubbleUp Remedies Fixed-Stretch to Minimize Initial Latency

Schedule Office Work • 8am: Host a Visitor • 9am: Do Email • 10am: Write Paper • 11am: Write Paper • Noon: Lunch

BubbleUp [Figure: slots S1, S2, S3]

BubbleUp • Empty Slots are Always Next in Time • No additional Memory Required • Fill the Buffer up to the Segment Size • No additional Disk Bandwidth Required • The Disk Is Idle Otherwise

Evaluation [Figure: Latency (S) versus N for Sweep and BubbleUp; latency axis from 1 to 9]

Fast-Scan [Figure: slots S1, S2, S3]

Fast-Scan [Figure: slots S1, S2, S3, S4]

Data Placement Policies • Please refer to our publications

[Figure: slots S1, S2, S3]

Chunk Allocation • Allocate Memory in Chunks • A Chunk = k * S • Replicate the Last Segment of a Chunk at the Beginning of the Next Chunk • Example • Chunk 1: s1, s2, s3, s4, s5 • Chunk 2: s5, s6, s7, s8, s9
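The replication rule in this example can be sketched as a small function that splits a run of segments into chunks of k segments, starting each chunk after the first with a copy of the previous chunk's last segment. The function name and the list-of-labels representation are assumptions made for illustration.

```python
def split_into_chunks(num_segments, k):
    """Split segments s1..sN into chunks of at most k segments each.

    Every chunk after the first begins with a replica of the last segment
    of the chunk before it, as in the slide's example.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so each chunk adds a new segment")
    if num_segments <= 0:
        return []
    chunks = []
    start = 1
    while True:
        end = min(start + k - 1, num_segments)
        chunks.append([f"s{i}" for i in range(start, end + 1)])
        if end == num_segments:
            return chunks
        start = end  # the next chunk starts with a replica of this segment


if __name__ == "__main__":
    for chunk in split_into_chunks(9, 5):
        print(", ".join(chunk))
```

For nine segments and k = 5 this yields s1 to s5 followed by s5 to s9, matching Chunk 1 and Chunk 2 above.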

Chunk Allocation • Largest-Fit First • Best Fit (Last Chunk)

Segment Placement [Figure: 18 segments to place; free chunks of sizes 4, 16, and 8]

Largest-Fit First [Figure: free chunks of sizes 4, 16, and 8; the 16-slot chunk holds s1 through s15 and v16]

Best Fit [Figure: s16, s17, s18 placed in a separate chunk; the 16-slot chunk holds s1 through s15 and v16]
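One possible reading of the two rules, sketched under stated assumptions: keep taking the largest free chunk while the remaining segments do not fit anywhere, then place the final run in the smallest free chunk that holds it, replicating the boundary segment each time. This is an interpretation of the slides, not the authors' published algorithm, and the function name and tuple format are hypothetical.

```python
def place_segments(num_segments, free_chunks):
    """Place segments 1..num_segments into free chunks (sizes in segments).

    Returns a list of (chunk_size, first_segment, last_segment) tuples.
    Assumed policy: Largest-Fit First, then Best Fit for the last chunk.
    """
    free = list(free_chunks)
    placements = []
    start = 1  # first segment to store in the next chunk
    while True:
        needed = num_segments - start + 1
        fitting = [size for size in free if size >= needed]
        if fitting:
            chunk = min(fitting)  # best fit for the final run of segments
            placements.append((chunk, start, num_segments))
            return placements
        if not free or max(free) < 2:
            raise ValueError("not enough free space for the remaining segments")
        chunk = max(free)  # largest-fit first
        free.remove(chunk)
        end = start + chunk - 1
        placements.append((chunk, start, end))
        start = end  # replicate the boundary segment in the next chunk


if __name__ == "__main__":
    for size, first, last in place_segments(18, [4, 16, 8]):
        print(f"chunk of {size}: s{first}..s{last}")
```

With 18 segments and free chunks of 4, 16, and 8, this sketch fills the 16-slot chunk with s1 to s16 and puts s16, s17, s18 in the 4-slot chunk, leaving the 8-slot chunk free.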

Unbalanced Workload [Figure: one Hot video and two Cold videos]

Balanced Workload [Figure: one Hot video and two Cold videos]

Per Stream Memory Use (Use M Disks Independently) • S = (N × TR × DR × Tseek) / (TR - N × DR) • Total Streams: M × N
