
Presentation Transcript


  1. PlanetLab: an open community test-bed for Planetary-Scale Services
  a "work in progress" for USITS03
  David Culler* UC Berkeley / Intel Research @ Berkeley
  • with Larry Peterson, Tom Anderson, Mic Bowman, Timothy Roscoe, Brent Chun, Frans Kaashoek, Mike Wawrzoniak, ...

  2. PlanetLab today
  • 121 nodes at 52 sites in 10 countries, 4 continents, ...
  • Universities, Internet2, co-los soon
  • Active and growing research community
  • Just beginning ... on the way to 1,000
  http://www.planet-lab.org

  3. Where did it come from?
  • Sense of wonder
    • what would be the next important thing to do in extreme networked systems: post cluster, post Yahoo, post Inktomi, post Akamai, post Gnutella, post bubble?
  • Sense of angst
    • NRC: "Looking Over the Fence at Networks"
    • ossified internet (intellectually, infrastructure, system)
    • the next internet is likely to emerge as an overlay on the current one (again)
    • it will be defined by its services, not its transport
  • Sense of excitement
    • new class of services & applications that spread over much of the web
    • CDNs and P2Ps are just the tip of the iceberg
    • architectural concepts emerging: scalable translation, distributed storage, distributed events, instrumentation, caching, management

  4. Key missing element – hands-on experience
  • Researchers had no vehicle to try out their next n great ideas in this space
  • Lots of simulations
  • Lots of emulation on large clusters
    • Emulab, Millennium, ModelNet
  • Lots of folks calling their 17 friends before the next deadline
    • RON testbed
  • but not the surprises and frustrations of experience at scale to drive innovation

  5. Guidelines (1)
  • A thousand viewpoints on "the cloud" is what matters
    • not the thousand servers
    • not the routers, per se
    • not the pipes

  6. Guidelines (2)
  • and you must have the vantage points of the crossroads
    • primarily co-location centers

  7. Guidelines (3)
  • Each service needs an overlay covering many points
    • logically isolated
  • Many concurrent services and applications
    • must be able to slice nodes => a VM per service
    • a service has a slice across a large subset
  • Must be able to run each service / app over a long period to build a meaningful workload
    • traffic capture/generator must be part of the facility
  • Consensus on "a node" is more important than "which node"

  8. Guidelines (4)
  • Test-lab as a whole must be up a lot
    • global remote administration and management
    • mission control
    • redundancy within
  • Each service will require its own remote management capability
  • Testlab nodes cannot "bring down" their site
    • generally not on the main forwarding path
    • proxy path
    • must be able to extend the overlay out to user nodes?
  • Relationship to firewalls and proxies is key
  • Management, Management, Management

  9. Guidelines (5)
  • Storage has to be a part of it
    • edge nodes have significant capacity
  • Needs a basic well-managed capability
    • but growing to the seti@home model should be considered at some stage
    • may be essential for some services

  10. Confluence of Technologies
  • Cluster-based scalable distribution, remote execution, management, monitoring tools
    • UCB Millennium, OSCAR, ..., Utah Emulab, ...
  • CDNs and P2Ps
    • Gnutella, Kazaa, ...
  • Proxies routine
  • Virtual machines & sandboxing
    • VMWare, Janos, Denali, ..., web-host slices (Ensim)
  • Overlay networks becoming ubiquitous
    • xBone, RON, Detour, ..., Akamai, Digital Island, ...
  • Service composition frameworks
    • Yahoo, Ninja, .NET, WebSphere, Eliza
  • Established internet "crossroads" – colos
  • Web Services / Utility Computing
  • Authentication infrastructure (grid)
  • Packet processing (layer 7 switches, NATs, firewalls)
  • Internet instrumentation
  The Time is NOW

  11. March 02 "Underground Meeting"
  • Intel Research: David Culler, Timothy Roscoe, Sylvia Ratnasamy, Gaetano Borriello, Satya (CMU Srini), Milan Milenkovic
  • Duke: Amin Vahdat, Jeff Chase
  • Princeton: Larry Peterson, Randy Wang, Vivek Pai
  • Rice: Peter Druschel
  • Utah: Jay Lepreau
  • CMU: Srini Seshan, Hui Zhang
  • UCSD: Stefan Savage
  • Columbia: Andrew Campbell
  • ICIR: Scott Shenker, Eddie Kohler
  • Washington: Tom Anderson, Steven Gribble, David Wetherall
  • MIT: Frans Kaashoek, Hari Balakrishnan, Robert Morris, David Anderson
  • Berkeley: Ion Stoica, Joe Hellerstein, Eric Brewer, Kubi
  see http://www.cs.berkeley.edu/~culler/planetlab

  12. Outcome
  • "Mirror of Dreams" project
  • K.I.S.S.
    • building blocks, not solutions
    • no big standards, nothing OGSA-like, no meta-hyper-supercomputer
  • Compromise
    • a basic working testbed in the hand is much better than "exactly my way" in the bush
    • "just give me a bunch of (virtual) machines spread around the planet ... I'll take it from there"
  • small distributed architecture team, builders, users

  13. Tension of Dual Roles (design, deploy, measure cycle)
  • Research testbed
    • run fixed-scope experiments
    • large set of geographically distributed machines
    • diverse & realistic network conditions
  • Deployment platform for novel services
    • run continuously
    • develop a user community that provides realistic workload

  14. Overlapping Phases, 2003 – 2005 ("YOU ARE HERE")
  • Phase 0: seed – build a working "sandbox" of significant scale quickly to catalyze the community
  • Phase I: get the API & interfaces right
  • Phase II: get the underlying architecture and implementation right

  15. Architecture principles
  • "Slices" as the fundamental resource unit
    • a distributed set of (virtual machine) resources
    • a service runs in a slice
    • resources allocated / limited per slice (proc, bw, namespace)
  • Distributed resource control
    • hosts (which control their nodes), service producers, service consumers
  • Unbundled management
    • provided by basic services (running in slices)
    • instrumentation and monitoring are themselves a fundamental service
  • Application-centric interfaces
    • evolve from what people actually use
  • Self-obsolescence
    • everything we build should eventually be replaced by the community
    • initial centralized services only bootstrap distributed ones

  16. Slice-ability
  • Each service runs in a slice of PlanetLab
    • a distributed set of resources (a network of virtual machines)
    • allows services to run continuously
  • A VM monitor on each node enforces slices
    • limits the fraction of node resources consumed
    • limits the portion of name spaces consumed
  • Challenges
    • global resource discovery
    • allocation and management
    • enforcing virtualization
    • security
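
A minimal sketch of the idea on this slide: a slice as a named set of per-node slivers, each carrying the limits a node's VM monitor would enforce. The class names, fields, and the example slice name are illustrative assumptions, not PlanetLab's actual schema.

```python
# Hypothetical sketch: a "slice" as a distributed set of per-node slivers,
# each carrying the resource limits the node's VM monitor would enforce.
# All names and fields here are illustrative, not PlanetLab's real data model.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Sliver:
    cpu_share: float      # fraction of the node's CPU the service may consume
    max_bw_kbps: int      # outbound bandwidth cap enforced for this slice
    max_ports: int        # portion of the port namespace the sliver may bind

@dataclass
class Slice:
    name: str                                                  # e.g. "example_service"
    slivers: Dict[str, Sliver] = field(default_factory=dict)   # node hostname -> sliver

    def add_node(self, hostname: str, sliver: Sliver) -> None:
        """Extend the slice with a (virtual machine) resource on another node."""
        self.slivers[hostname] = sliver

# A service "runs in a slice": one sliver per participating node.
s = Slice("example_service")
s.add_node("planetlab1.cs.example.edu",
           Sliver(cpu_share=0.1, max_bw_kbps=1500, max_ports=32))
```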

  17. Unbundled Management
  • Partition management into orthogonal services
    • resource discovery
    • monitoring system health
    • topology management
    • managing user accounts and credentials
    • software distribution and updates
  • Approach
    • management services run in their own slices
    • allow competing alternatives
    • engineer for innovation (define minimal interfaces)

  18. Distributed Resource Control
  • At least two interested parties
    • service producers (researchers) decide how their services are deployed over the available nodes
    • service consumers (users) decide what services run on their nodes
  • At least two contributing factors
    • a fair slice allocation policy, with both local and global components (see above)
    • knowledge about node state, which is freshest at the node itself

  19. Application-Centric Interfaces
  • Inherent problems
    • stable platform versus research into platforms
    • writing applications for temporary testbeds
    • integrating testbeds with desktop machines
  • Approach
    • adopt a popular API (Linux) and evolve the implementation
    • eventually separate the isolation and application interfaces
    • provide a generic "shim" library for desktops

  20. Service-Centric Virtualization (diagram)

  21. Changing VM landscape
  • VMs for complete desktop environments are re-emerging
    • e.g., VMware: extremely complete, poor scaling
  • VM sandboxes widely used for web hosting
    • Ensim, BSD Jail, Linux Vservers (glunix, ufo, ...)
    • limited /bin, no /dev, many VMs per machine
    • limit the API for security
  • Scalable isolation kernels (VMMs)
    • host multiple OS's on a cleaner VM
    • Denali, Xen
    • simple enough to make secure; an attack on a hosted OS is isolated
  • Savage/Anderson view: security is the most critical requirement; there has never been a truly secure VM; it can only be secure if it has no bugs ...

  22. How much to virtualize?
  • enough to deploy the next PlanetLab within a slice on the current one ...
  • enough network access to build network gateways for overlays
  • Phase 0: unix process as the VM
    • SILK (Scout in Linux Kernel) to provide resource metering and allocation
  • Phase 1: sandbox
    • evolve a constrained, secure API (a subset)
  • Phase 2: small isolation kernel with a narrow API
    • some services built on it directly
    • host linux / sandbox on top for legacy services

  23. Slivers of a Slice: long-term plan
  (diagram: Services 1..n run over an Application Interface on guest OSes such as Linux, XP, and BSD, which in turn sit on an Isolation Interface exported by an Isolation Kernel – Denali, Xenoserver, VMWare – over the Hardware)

  24. Kickoff to catalyze community
  • Seeded 100 machines in 42 sites, July 02
    • avoid machine configuration issues
    • huge set of administrative concerns
    • Intel Research, Development, and Operations
  • UCB Rootstock build distribution tools
    • boot once from floppy to build a local cluster
    • periodic and manual update with local modification
  • UCB Ganglia remote monitoring facility
    • aggregate stats from each site, pull into a common database
  • 10 slices (accounts) per site on all machines
    • authenticate principals (PIs), delegation of access
    • key pairs stored in PlanetLab Central; PIs control which get pushed out
    • PIs map users to slices
  • Discovery by web pages
  • Basic SSH and scripts ... grad students roll what they need
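
A sketch of the "basic SSH and scripts" workflow the last bullet describes: fan a command out to every node in a node list. The slice login name, the nodes.txt file, and the script itself are assumptions for illustration, not PlanetLab tooling.

```python
#!/usr/bin/env python3
# Illustrative fan-out script: run one command on every node listed in a
# (hypothetical) nodes.txt, one hostname per line, over plain SSH.
# The slice login name and file name are assumptions.
import subprocess
import sys

SLICE_LOGIN = "berkeley_demo"   # hypothetical slice account pushed to each node

def run_everywhere(command: str, nodes_file: str = "nodes.txt") -> None:
    with open(nodes_file) as f:
        nodes = [line.strip() for line in f if line.strip()]
    for node in nodes:
        # BatchMode avoids hanging on password prompts when a node is unreachable.
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", f"{SLICE_LOGIN}@{node}", command],
            capture_output=True, text=True, timeout=30)
        status = "ok" if result.returncode == 0 else f"failed ({result.returncode})"
        print(f"{node}: {status}")

if __name__ == "__main__":
    run_everywhere(" ".join(sys.argv[1:]) or "uptime")
```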

  25. The meta-testbed effect
  • Emulab / Netbed
    • boot-your-own-OS doesn't scale to unaffiliated sites
    • the architecture should permit it virtually: the service lives in a slice and offers its own user management, authentication, ...
    => need to offer a virtual machine with a virtual chroot ASAP
  • RON
    • need access to raw sockets to build gateways => need safe (restricted) access to raw sockets early
    • need mount
  • Hard to put a machine in someone else's site and give out root. Architecturally, we should not need to.
    => pushed the Vserver and SILK agenda, and ... federate without losing identity

  26. Current Approach (on to Phase I)
  (diagram: Services 1..n each run in their own Vserver, over a Combined Isolation and Application Interface: Linux + resource isolation + safe raw sockets + instrumentation, on the Hardware, with Ganglia, InforSpec, and ScoutMonitor alongside)

  27. vServer experience (Brent Chun)
  • New set of scaling issues: disk footprint
    • 1581 directories, 28959 files
    • VM-specific copy-on-write reduced 29 MB/vm
    • copied part: 5.6 MB /etc, 18.6 MB /var
    • 1000 VMs per disk
  • Current
    • 222+ per node
    • 30-40 secs to create, 10 secs to delete
    • developing VM preallocation & caching
    • slice login -> vserver root
  • Limitations
    • common OS for all VMs (few calls for multiple OS's)
    • user-level NFS mount (MIT's on it)
    • incomplete self-virtualization
    • incomplete resource isolation (e.g., buffer cache)
    • imperfect (but unbroken) kernel security
  => raised the bar on isolation kernels
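
A back-of-envelope check of the disk-footprint numbers, reading the slide's "reduced 29 MB/vm" as a roughly 29 MB per-VM footprint after copy-on-write; that reading and the 40 GB disk size are assumptions for illustration.

```python
# Rough arithmetic behind the "1000 VMs per disk" target on the slide.
# Assumptions: the 29 MB figure is the per-VM footprint after copy-on-write,
# and the node disk is about 40 GB (era-typical guess, not from the slide).
per_vm_mb = 29.0          # per-VM footprint with copy-on-write (slide figure, as read above)
copied_mb = 5.6 + 18.6    # the parts that must actually be copied: /etc and /var
disk_gb   = 40            # assumed node disk size

print(f"copied per VM: {copied_mb:.1f} MB of the ~{per_vm_mb:.0f} MB footprint")
print(f"VMs per {disk_gb} GB disk: ~{int(disk_gb * 1024 / per_vm_mb)}")
# -> roughly 1,400 VMs, the same ballpark as the slide's "1000 VMs per disk"
```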

  28. SILK (Princeton)
  • Key elements of the ANets NodeOS in Linux
    • familiar API
  • Safe raw sockets
    • enables network gateways and application overlays
  • Monitoring
    • traffic per slice, per node
    • 5-minute snapshots of bytes sent/recv per slice x node
  • Isolation and limits
    • bandwidth
    • memory soon
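
To make the monitoring bullet concrete, here is a hedged sketch of rolling up the kind of counters described: 5-minute snapshots of bytes sent/received per slice on each node, summed across nodes. The record layout, field names, and the example slice name are assumptions, not SILK's actual format.

```python
# Illustrative aggregation of per-slice traffic snapshots (5-minute windows of
# bytes sent/received per slice on each node). Field names are assumptions.
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

class Snapshot(NamedTuple):
    node: str        # hostname of the reporting node
    slice_name: str  # slice the traffic was attributed to
    bytes_sent: int  # bytes sent during the 5-minute window
    bytes_recv: int  # bytes received during the 5-minute window

def totals_per_slice(snapshots: Iterable[Snapshot]) -> Dict[str, List[int]]:
    """Sum each slice's traffic over all nodes for one reporting interval."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for s in snapshots:
        totals[s.slice_name][0] += s.bytes_sent
        totals[s.slice_name][1] += s.bytes_recv
    return dict(totals)

window = [Snapshot("planetlab1.example.edu", "example_slice", 120_000, 450_000),
          Snapshot("planetlab2.example.edu", "example_slice", 80_000, 300_000)]
print(totals_per_slice(window))   # {'example_slice': [200000, 750000]}
```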

  29. Dynamic Slice Creation
  (diagram: a Service Manager exchanges resource descriptions with a Broker and an Agent to obtain candidate nodes and a ticket, then reserves resources on nodes N1 ... Nm and receives a lease)
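
A hypothetical sketch of the ticket/lease flow the diagram names (description, candidates, acquire ticket, reserve, lease). The slide does not spell out the protocol, so every type, argument, and ordering here is an illustrative assumption.

```python
# Hypothetical rendering of the flow in the diagram: a service manager asks a
# broker for candidate nodes, acquires a ticket from an agent, and redeems the
# ticket on each node for a lease. Not the real PlanetLab protocol.
from dataclasses import dataclass
from typing import List

@dataclass
class Description:        # what the service needs
    num_nodes: int
    cpu_share: float
    duration_s: int

@dataclass
class Ticket:             # a promise of resources on specific candidate nodes
    nodes: List[str]
    desc: Description

@dataclass
class Lease:              # a concrete reservation on one node, with an expiry
    node: str
    expires_s: int

def create_slice(broker, agent, nodes_api, desc: Description) -> List[Lease]:
    """nodes_api maps hostname -> a per-node stub exposing reserve(ticket)."""
    candidates = broker.candidates(desc)             # description -> candidate nodes
    ticket = agent.acquire_ticket(candidates, desc)  # candidates + description -> ticket
    return [nodes_api[n].reserve(ticket) for n in ticket.nodes]  # ticket -> leases
```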

  30. BootCD – enabling growth
  • Constrained Linux booted from CD, with networking
  • Knows how to phone home and get a signed script
    • check the signature and run it
    • install
    • chain boot
    • reboot with a special sshd
  • Register first ...
  • Grow the testbed and use it too
  http://www.planet-lab.org/joining/
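
A sketch of the "phone home, get a signed script, check the signature, run" step. The slide only names the steps; the URLs, file paths, and the use of a detached openssl signature check are assumptions for illustration.

```python
#!/usr/bin/env python3
# Illustrative sketch of the BootCD's fetch-verify-run step. URLs, file names,
# and the openssl-based verification are assumptions, not PlanetLab's boot code.
import subprocess
import urllib.request

BOOT_URL = "https://www.planet-lab.org/boot/script.sh"      # hypothetical
SIG_URL  = "https://www.planet-lab.org/boot/script.sh.sig"   # hypothetical
PUBKEY   = "/usr/boot/planetlab_pub.pem"                     # assumed to be baked onto the CD

def fetch(url: str, dest: str) -> None:
    with urllib.request.urlopen(url) as r, open(dest, "wb") as f:
        f.write(r.read())

def verify_and_run() -> None:
    fetch(BOOT_URL, "/tmp/boot.sh")
    fetch(SIG_URL, "/tmp/boot.sh.sig")
    # Verify the detached signature against the public key shipped on the CD;
    # check=True raises if openssl reports a verification failure.
    subprocess.run(
        ["openssl", "dgst", "-sha1", "-verify", PUBKEY,
         "-signature", "/tmp/boot.sh.sig", "/tmp/boot.sh"],
        check=True)
    subprocess.run(["sh", "/tmp/boot.sh"], check=True)   # only runs after verification

if __name__ == "__main__":
    verify_and_run()
```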

  31. A typical day (1/28)

  32. Run-up to SIGCOMM

  33. A Slice for a Month (Duke)
  (graphs: bytes received per day by nodes; bytes sent per day by nodes)

  34. So what are people doing? Ping!

  35. Really ...
  • Internet instrumentation
  • DHTs – scalable lookup, location
  • Distributed storage
  • User-level multicast
  • Distributed CDNs, search, ...
  • and all of them are doing a lot of pinging, copying, and timing
    • a key aspect of an overlay network is to estimate the performance characteristics of each virtual link
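
A minimal sketch of the simplest such measurement: timing a TCP connect to an overlay neighbor as a rough RTT proxy (ICMP ping would need raw-socket privileges). The peer list and port are assumptions; real overlays use their own probing.

```python
# Minimal sketch of the "lot of pinging and timing" a slice does to estimate
# each virtual link: time a TCP connect as a crude RTT proxy.
# Peer hostnames and the port are illustrative assumptions.
import socket
import time

PEERS = ["planetlab1.cs.example.edu", "planetlab2.example.org"]  # hypothetical neighbors
PORT = 22   # any TCP port the peer is known to listen on

def estimate_rtt(host: str, port: int = PORT, timeout: float = 3.0) -> float:
    """Return one connect-time sample in milliseconds (a rough RTT estimate)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0

if __name__ == "__main__":
    for peer in PEERS:
        try:
            print(f"{peer}: ~{estimate_rtt(peer):.1f} ms")
        except OSError as e:
            print(f"{peer}: unreachable ({e})")
```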

  36. With the internet in the middle
  (figures: scp of 4 MB to MIT, Rice, and CIT confirms Padhye SIGCOMM98 – 83 machines, 11/1/02, Sean Rhea, basis for the DHT comparison; synthetic coordinates for 143 RON+PlanetLab nodes, c/o Frans Kaashoek; i3 weather service on 110 machines, c/o Ion Stoica)

  37. Analysis of Tapestry (Ben Zhao)
  • 98 machines, 6-7 Tapestry nodes per machine, all node pairs
  • Ratio of end-to-end routing latency to shortest ping time between nodes
  • Ratio of object-location time to ping
  • 10,000 objects per node
  (figure annotations: Median = 31.5, 90th percentile = 135; 90th percentile = 158)

  38. Towards an instrumentation service
  • Every overlay, DHT, and multicast service is measuring the internet in the middle
    • they do it in different ways
    • they do different things with the data
  • Can this be abstracted into a customizable instrumentation service?
    • share common underlying measurements
    • reduce ping and scp load
    • grow down into the infrastructure
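
To illustrate the abstraction the slide asks about, here is a hedged sketch of a shared measurement cache: services asking about the same link within a window share one probe instead of each re-pinging it. The interface, freshness window, and probe callback are assumptions, not an existing PlanetLab service.

```python
# Sketch of a shared instrumentation service: cache recent measurements so
# that many overlays asking about the same target do not each probe it again.
# The interface and the 5-minute freshness window are illustrative assumptions.
import time
from typing import Callable, Dict, Tuple

class MeasurementService:
    def __init__(self, max_age_s: float = 300.0):
        self.max_age_s = max_age_s
        # (measurement kind, target) -> (timestamp, value)
        self._cache: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def measure(self, kind: str, target: str, probe: Callable[[str], float]) -> float:
        """Return a cached result if it is fresh enough; otherwise probe and cache."""
        key = (kind, target)
        now = time.monotonic()
        if key in self._cache and now - self._cache[key][0] < self.max_age_s:
            return self._cache[key][1]     # shared result: no new probe traffic
        value = probe(target)              # caller supplies its own probe (ping, scp timing, ...)
        self._cache[key] = (now, value)
        return value

# Two services asking about the same link within 5 minutes share one probe, e.g.
# svc = MeasurementService()
# rtt = svc.measure("rtt", "planetlab1.cs.example.edu", estimate_rtt)  # probe from the earlier sketch
```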

  39. Ossified or fragile?
  • One group forgot to turn off an experiment
    • after 2 weeks of a router being pinged every 2 seconds, the ISP contacted ISI and threatened to shut them down
  • One group failed to initialize destination addresses and ports (and had many virtual nodes on each of many physical nodes)
    • worked OK when tested on a LAN
    • trashed flow caches in routers
    • probably generated a lot of destination-unreachable traffic
    • triggered port-scan alarms at ISPs (port 0)
    • n^2 probe packets trigger other alarms

  40. The Gaetano advice
  • for this to be successful, it will need the support of network and system administrators at all the sites ...
  • it would be good to start by building tools that made their job easier

  41. ScriptRoute (Spring, Wetherall, Anderson)
  • Traceroute provides a way to measure from you outward
  • 100s of traceroute servers have appeared to help debug connectivity problems
    • very limited functionality
  => provide a simple instrumentation sandbox at many sites in the internet
    • TTL, MTU, bandwidth, congestion, reordering
    • safe interpreter + network guardian to limit impact
    • individual and aggregate limits
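
A sketch of the kind of "network guardian" the last bullets describe: each measurement script is rate-limited individually, and an aggregate limit caps the sandbox as a whole. The token-bucket mechanism and the rates are assumptions; this is not ScriptRoute's actual implementation.

```python
# Sketch of a network guardian enforcing individual and aggregate send limits
# via token buckets. Rates and structure are assumptions, not ScriptRoute code.
import time
from typing import Dict

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate, self.capacity = rate_bps, burst_bytes
        self.tokens, self.last = burst_bytes, time.monotonic()

    def allow(self, nbytes: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

class Guardian:
    def __init__(self):
        self.aggregate = TokenBucket(rate_bps=50_000, burst_bytes=10_000)   # whole sandbox
        self.per_script: Dict[str, TokenBucket] = {}

    def permit(self, script_id: str, nbytes: int) -> bool:
        bucket = self.per_script.setdefault(script_id, TokenBucket(5_000, 2_000))
        # A probe goes out only if both the script's own bucket and the
        # aggregate bucket currently have enough tokens.
        return bucket.allow(nbytes) and self.aggregate.allow(nbytes)
```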

  42. Example: reverse trace
  (diagram: reverse trace between UW and Google)
  • underlying debate: open, unauthenticated, community measurement infrastructure vs. closed, engineered service
  • see also Princeton BGP multilateration

  43. Ossified or brittle?
  • Scriptroute set off several alarms
    • low-bandwidth traffic to lots of IP addresses brought routers to a crawl
    • lots of small TTLs, but not exactly traceroute packets ...
    • an ISP installed a filter blocking a subnet at Harvard and sent notice to the network administrator, without human intervention
  • Is innovation still allowed?

  44. NetBait serendipity
  • Brent Chun built a simple http server on port 80 to explain what PlanetLab was about and to direct inquiries to planet-lab.org
    • it also logged requests
  • Sitting just outside the firewalls of ~40 universities ... the world's largest honey pot
    • the number of worm probes from compromised machines was shocking
    • imagine the epidemiology
  • see netbait.planet-lab.org

  45. One example
  • The monthly Code Red cycle in the large?
  • What happened a little over a week ago?

  46. No, not Iraq
  • A new, voracious worm appeared and displaced the older Code Red

  47. Netbait view of March

  48. DHT Bakeoff
  • The proliferation of distributed hash tables, content-addressable networks, and distributed object location was a primary driver for PlanetLab
    • Chord, CAN, Pastry, Tapestry, Kademlia, Viceroy, ...
    • map a large identifier (160 bits) to an object by routing (in the overlay) to the node responsible for that key
    • in the presence of concurrent inserts, joins, fails, leaves, ...
  • Natural for the community to try to resolve the many proposals
    • a common API to allow for benchmarking (Dabek et al., IPTPS)
    • analytical comparisons: Ratnasamy says "rings are good"
    • empirical comparisons

  49. Rationalizing Structured P2P Overlays
  • Tier 2 (applications): CFS, PAST, i3, SplitStream, Bayeux, OceanStore
  • Tier 1: DHT (get, put, remove); CAST (join, leave, multicast, anycast); DOLR (publish, unpublish, sendToObj)
  • Tier 0 (Key-based Routing): route(key, msg), plus upcalls and id mgmt
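
A rough rendering of the tiers above as interfaces: a Tier-0 key-based routing layer exposing route(key, msg) plus an upcall, with a Tier-1 DHT (get, put, remove) layered on it. Only the operation names come from the slide; the Python classes, argument types, and message encoding are assumptions.

```python
# Illustrative rendering of the tiered common API: Tier-0 key-based routing
# (route + upcall) with a Tier-1 DHT built on it. Operation names follow the
# slide; everything else (types, encoding, storage) is an assumption.
from abc import ABC, abstractmethod

class KeyBasedRouting(ABC):                      # Tier 0
    @abstractmethod
    def route(self, key: bytes, msg: bytes) -> None:
        """Deliver msg to the node currently responsible for key."""

    @abstractmethod
    def deliver(self, key: bytes, msg: bytes) -> None:
        """Upcall invoked on the responsible node when a routed msg arrives."""

class DHT(KeyBasedRouting):                      # Tier 1: get, put, remove over KBR
    def __init__(self):
        self.store = {}                          # this node's local shard of the table

    def put(self, key: bytes, value: bytes) -> None:
        self.route(key, b"PUT:" + value)         # forwarded to the responsible node

    def get(self, key: bytes) -> None:
        self.route(key, b"GET")                  # responsible node replies via another route()

    def remove(self, key: bytes) -> None:
        self.route(key, b"REMOVE")
```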

  50. Empirical Comparison (Rhea, Roscoe, Kubi)
  • 79 PlanetLab nodes, 400 ids per node
  • Performed by the Tapestry side
