
Measuring a (MapReduce) Data Center

Presentation Transcript


  1. Measuring a (MapReduce) Data Center

  2. Typical Data Center Network
  [Figure: hierarchical topology; servers connect to top-of-rack (ToR) switches, which connect to aggregation switches and then to IP routers]
  • Top-of-rack switch: 24 or 48 ports, 1G to each server, 10 Gbps uplinks, ~$7K
  • Aggregation: modular switch, chassis plus up to 10 blades, >140 10G ports, $150K-$200K
  • Less bandwidth up the hierarchy
  • Clunky routing
  • Many proposals address these issues, e.g., VL2, BCube, FatTree, Portland, DCell

  3. What does traffic in a data center look like?
  Goal:
  • a realistic model of data center traffic
  • a basis for comparing proposals
  How to measure a data center?
  • (Macro) Who talks to whom? Congestion and its impact
  • (Micro) Flow details: sizes, durations, inter-arrivals, flux

  4. How to measure?
  [Figure: servers running MapReduce scripts and a distributed file system connect through ToR and aggregation switches to the router]
  SNMP reports:
  • per port: in/out octets
  • sampled every few minutes
  • already auto-managed, but miss server- and flow-level information
  Packet traces:
  • not native on most switches
  • hard to set up (port spans)
  Sampled NetFlow:
  • trade-off: CPU overhead on the switch for detailed traces
  Approach: use the end hosts to share the measurement load; measured 1,500 servers for several months.
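  As a rough illustration of the SNMP-based approach, the sketch below turns periodic per-port octet counters into link-utilization samples. The CSV layout, file name, and 1 Gbps link capacity are assumptions made for the example, not details from the talk.

```python
# A minimal sketch, assuming a hypothetical CSV dump of per-port SNMP counters
# (port, unix_time, in_octets, out_octets); this is not the paper's pipeline,
# only an illustration of the utilization computation.
import csv

LINK_CAPACITY_BPS = 1e9   # assumed 1 Gbps server-facing ports

def utilization_samples(rows):
    """rows: (port, timestamp, in_octets, out_octets) tuples, sorted by time
    within each port. Yields (port, timestamp, utilization in [0, 1])."""
    prev = {}
    for port, ts, in_oct, out_oct in rows:
        if port in prev:
            p_ts, p_in, p_out = prev[port]
            dt = ts - p_ts
            if dt > 0:
                # Counter deltas over the polling interval, bytes -> bits;
                # report the busier of the two directions.
                in_bps = (in_oct - p_in) * 8 / dt
                out_bps = (out_oct - p_out) * 8 / dt
                yield port, ts, max(in_bps, out_bps) / LINK_CAPACITY_BPS
        prev[port] = (ts, in_oct, out_oct)

if __name__ == "__main__":
    with open("snmp_counters.csv") as f:          # hypothetical input file
        rows = ((p, float(t), int(i), int(o))
                for p, t, i, o in csv.reader(f))
        for port, ts, util in utilization_samples(rows):
            print(f"{port} {ts:.0f} {util:.1%}")
```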

  5. Who Talks To Whom?
  [Figure: server-to-server traffic matrix heat map ("To" server vs. "From" server), color scale from 0 through 0.2 Kbps, 20 Kbps, 3 Mbps, 0.4 Gbps, up to 1 Gbps]
  • Two patterns dominate: Scatter and Gather
  • Most of the communication happens within racks
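  A matrix like the one on this slide can be built directly from flow records collected at the end hosts. The sketch below is illustrative only; the record format, server names, and rack mapping are made up for the example.

```python
# Hedged sketch: aggregate per-flow byte counts into a server-to-server matrix
# and report how much traffic stays within a rack, given a server -> rack map.
from collections import defaultdict

def traffic_matrix(flows):
    """flows: iterable of (src, dst, bytes). Returns {(src, dst): total_bytes}."""
    tm = defaultdict(int)
    for src, dst, nbytes in flows:
        tm[(src, dst)] += nbytes
    return tm

def intra_rack_fraction(tm, rack_of):
    """Fraction of all bytes exchanged between servers in the same rack."""
    total = sum(tm.values())
    intra = sum(b for (s, d), b in tm.items() if rack_of[s] == rack_of[d])
    return intra / total if total else 0.0

# Toy usage with made-up servers and racks
flows = [("s1", "s2", 10_000), ("s1", "s3", 500), ("s2", "s1", 8_000)]
rack_of = {"s1": "r1", "s2": "r1", "s3": "r2"}
print(intra_rack_fraction(traffic_matrix(flows), rack_of))
```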

  6. Flows:
  • are small: 80% of bytes are in flows smaller than 200 MB
  • are short-lived: 50% of bytes are in flows shorter than 25 s
  • turn over quickly: median flow inter-arrival at a ToR is 10^-2 s
  Which leads to:
  • traffic engineering schemes should react faster; there are few elephants
  • traffic is localized → additional bandwidth alleviates hotspots
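  To make these flow-level statistics concrete, here is a small sketch of how the byte-weighted size fraction and the median inter-arrival could be computed from a flow log. The field names and toy numbers are assumptions for illustration, not the measured data.

```python
# Illustrative sketch of the statistics quoted on this slide.
import statistics

def byte_fraction_below(flow_sizes, threshold):
    """Fraction of total bytes carried by flows smaller than `threshold` bytes."""
    total = sum(flow_sizes)
    small = sum(s for s in flow_sizes if s < threshold)
    return small / total if total else 0.0

def median_interarrival(arrival_times):
    """Median gap (seconds) between successive flow arrivals at a switch."""
    ts = sorted(arrival_times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return statistics.median(gaps) if gaps else float("nan")

# Toy example
sizes = [10e3, 50e6, 120e6, 300e6, 2e9]          # bytes per flow
arrivals = [0.000, 0.004, 0.015, 0.021, 0.030]   # flow start times in seconds
print(byte_fraction_below(sizes, 200e6))   # cf. "80% of bytes in flows < 200 MB"
print(median_interarrival(arrivals))       # cf. "median inter-arrival ~ 10 ms"
```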

  7. Congestion, its Impact
  • Are links busy? Often!
  • Who are the culprits?
  • Are apps impacted?
  [Figure: CDF of the contiguous duration (seconds) for which links stay above 70% utilization]
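  The quantity on the figure's x-axis, the contiguous time a link spends above 70% utilization, can be computed from a per-link utilization time series, for example as in the sketch below; the trace and threshold handling are illustrative assumptions.

```python
# Sketch: extract the lengths of contiguous periods during which a link's
# utilization stays above a threshold (70% here, matching the slide).
def high_utilization_episodes(samples, threshold=0.70):
    """samples: list of (timestamp, utilization) sorted by time.
    Returns a list of episode durations in seconds."""
    episodes, start = [], None
    for ts, util in samples:
        if util > threshold and start is None:
            start = ts                       # episode begins
        elif util <= threshold and start is not None:
            episodes.append(ts - start)      # episode ends
            start = None
    if start is not None:                    # still congested at end of trace
        episodes.append(samples[-1][0] - start)
    return episodes

# Toy trace: 10 s samples with one ~30 s busy period
trace = [(0, 0.2), (10, 0.8), (20, 0.9), (30, 0.75), (40, 0.3)]
print(high_utilization_episodes(trace))   # -> [30]
```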

  8. Congestion, its Impact (continued)
  • Are links busy? Often!
  • Who are the culprits? Apps (Extract, Reduce)
  • Are apps impacted? Marginally

  9. Measurement Alternatives
  Tomography: infer the server-to-server traffic matrix from link utilizations (e.g., from SNMP)
  + makes do with easier-to-measure data
  – under-constrained problem → heuristics: gravity

  10. Measurement Alternatives (continued)
  Heuristics: gravity, max-sparse

  11. Measurement Alternatives (continued)
  Heuristics: gravity, max-sparse, tomography + job information
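  For reference, the gravity heuristic named on these slides estimates each server pair's traffic from per-server send and receive totals, which can be read off edge-link counters. The sketch below is the textbook gravity model under assumed inputs, not the exact estimator evaluated in the talk.

```python
# Minimal sketch of the standard gravity-model heuristic: each pair's traffic
# is proportional to (sender's total out) * (receiver's total in) / total.
def gravity_estimate(tx_bytes, rx_bytes):
    """tx_bytes/rx_bytes: dicts of per-server bytes sent / received.
    Returns {(src, dst): estimated_bytes}; the diagonal (src == dst) is dropped."""
    total = sum(tx_bytes.values())
    if total == 0:
        return {}
    return {(s, d): tx_bytes[s] * rx_bytes[d] / total
            for s in tx_bytes for d in rx_bytes if s != d}

# Toy example: three servers with skewed send/receive volumes
tx = {"s1": 900, "s2": 100, "s3": 0}
rx = {"s1": 0, "s2": 500, "s3": 500}
for pair, est in sorted(gravity_estimate(tx, rx).items()):
    print(pair, round(est, 1))
```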

  12. A first look at traffic in a (MapReduce) data center
  Some insights:
  • traffic stays mostly within high-bandwidth regions
  • flows are small, short-lived, and turn over quickly
  • the network is highly utilized often, with moderate impact on apps
  • measuring at end hosts is feasible, necessary (?)
  • → a model for data center traffic
