
CS 525 Advanced Distributed Systems Spring 09


Presentation Transcript


  1. CS 525 Advanced Distributed Systems, Spring 09. Indranil Gupta (Indy). Lecture 4: The Grid. Clouds. January 29, 2009

  2. Two Questions We’ll Try to Answer • What is the Grid? Basics, no hype. • What is its relation to p2p?

  3. Example: Rapid Atmospheric Modeling System, ColoState U • Hurricane Georges, 17 days in Sept 1998 • “RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data” • Used 5 km spacing instead of the usual 10 km • Ran on 256+ processors • Can one run such a program without access to a supercomputer?

  4. Distributed Computing Resources at three sites: Wisconsin, NCSA, MIT

  5. An Application Coded by a Physicist: the output files of Job 0 are input to Job 2; Jobs 1 and 2 can be concurrent; the output files of Job 2 are input to Job 3

  6. An Application Coded by a Physicist (continued): the files passed between jobs (output of Job 0 into Job 2, output of Job 2 into Job 3) are several GBs, and a job may take several hours/days. Stages of a job: Init, Stage in, Execute, Stage out, Publish. The work is computation-intensive, so it is massively parallel.
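The dependency structure on these two slides is just a small DAG. Below is a minimal sketch, not part of the slides, of driving such a DAG from Python's standard library; the job names and the run_job body are hypothetical placeholders for the real stages.

```python
# A minimal sketch of the job DAG described on slides 5-6, using only the
# Python standard library. Job names and run_job() are placeholders, not
# the actual physics application.
from concurrent.futures import ThreadPoolExecutor

def run_job(name, inputs):
    # Stand-in for the real stages: init, stage in, execute, stage out, publish.
    print(f"running {name} with inputs {inputs}")
    return f"output-of-{name}"

with ThreadPoolExecutor() as pool:
    out0 = run_job("job0", [])                          # Job 0 runs first
    f1 = pool.submit(run_job, "job1", [out0])           # Jobs 1 and 2 can be
    f2 = pool.submit(run_job, "job2", [out0])           #   concurrent
    out3 = run_job("job3", [f1.result(), f2.result()])  # Job 3 needs Job 2's output
```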

  7. With Jobs 0-3 and sites at Wisconsin, NCSA, and MIT: who does the allocation? Who does the scheduling?

  8. Two protocols are involved: the Condor protocol, within the Wisconsin site, and the Globus protocol, between Wisconsin, NCSA, and MIT

  9. Globus protocol (across sites): external allocation & scheduling, and stage in & stage out of files; the internal structure of the different sites is invisible to Globus

  10. Condor protocol (within the Wisconsin site): internal allocation & scheduling, monitoring, and distribution and publishing of files

  11. Tiered Architecture (OSI 7-layer-like), top to bottom: high-energy physics apps; resource discovery, replication, brokering; Globus, Condor; workstations, LANs. Opportunity for crossover ideas from p2p systems

  12. The Grid Today: “a parallel Internet.” Some links are 40 Gbps (the TeraGrid links)!

  13. Globus Alliance • Alliance involves U. Illinois Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, Swedish Center for Parallel Computers • Activities: research, testbeds, software tools, applications • Globus Toolkit (latest version: GT3) “The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of new Open Grid Services Architecture (OGSA).”

  14. More • Entire community, with multiple conferences, get-togethers (GGF), and projects • Grid Projects: http://www-fp.mcs.anl.gov/~foster/grid-projects/ • Grid Users: • Today: Core is the physics community (since the Grid originates from the GriPhyN project) • Tomorrow: biologists, large-scale computations (nug30 already)?

  15. Some Things Grid Researchers Consider Important • Single sign-on: the collective job set should require once-only user authentication • Mapping to local security mechanisms: some sites use Kerberos, others use Unix mechanisms • Delegation: credentials to access resources are inherited by subcomputations, e.g., from job 0 to job 1 • Community authorization: e.g., third-party authentication

  16. Grid History – 1990’s • CASA network: linked 4 labs in California and New Mexico • Paul Messina: massively parallel and vector supercomputers for computational chemistry, climate modeling, etc. • Blanca: linked sites in the Midwest • Charlie Catlett, NCSA: multimedia digital libraries and remote visualization • More testbeds in Germany & Europe than in the US • I-way experiment: linked 11 experimental networks • Tom DeFanti (U. Illinois at Chicago) and Rick Stevens (ANL): for a week in Nov 1995, a national high-speed network infrastructure; 60 application demonstrations, from distributed computing to virtual reality collaboration • I-Soft: secure sign-on, etc.

  17. Trends: Technology • Doubling periods – storage: 12 mos, bandwidth: 9 mos, and (what law is this?) cpu speed: 18 mos • Then and now – Bandwidth: 1985, mostly 56 Kbps links nationwide; 2004, 155 Mbps links widespread – Disk capacity: today’s PCs have 100 GBs, the same as a 1990 supercomputer

  18. Trends: Users • Then and now – Biologists: 1990, were running small single-molecule simulations; 2004, want to calculate structures of complex macromolecules and to screen thousands of drug candidates – Physicists: 2006, CERN’s Large Hadron Collider produced 10^15 B/year • Trends in technology and user requirements: independent or symbiotic?

  19. Prophecies: In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating “like a power company or water company”. Plug your thin client into the computing utility and play your favorite compute- and communication-intensive application. • [Will this be a reality with the Grid?]

  20. P2P: “We must address scale & failure.” Grid: “We need infrastructure.”

  21. Definitions • Grid: “Infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities” (1998); “A system that coordinates resources not subject to centralized control, using open, general-purpose protocols to deliver nontrivial QoS” (2002) • P2P: “Applications that take advantage of resources at the edges of the Internet” (2000); “Decentralized, self-organizing distributed systems, in which all or most communication is symmetric” (2002)

  22. Definitions, with 525 commentary • Grid – 525: (good legal applications without intellectual fodder) • P2P – 525: (clever designs without good, legal applications)

  23. Grid versus P2P - Pick your favorite

  24. Grid: often complex & involving various combinations of data manipulation, computation, and tele-instrumentation; wide range of computational models, e.g., embarrassingly parallel, tightly coupled, workflow; consequence: complexity often inherent in the application itself. P2P: some file sharing, number crunching, content distribution, measurements; legal applications?; consequence: low-complexity applications


  26. Scale and Failure • P2P: very large numbers of entities; moderate activity, e.g., 1-2 TB in Gnutella (’01); diverse approaches to failure: centralized (SETI), or decentralized and self-stabilizing • Grid: moderate number of entities (10s of institutions, 1000s of users); large amounts of activity, e.g., 4.5 TB/day (D0 experiment); approaches to failure reflect assumptions, e.g., centralized components • (www.slyck.com, 2/19/’03)


  28. Services and Infrastructure • Grid: standard protocols (Global Grid Forum, etc.); de facto standard software (open-source Globus Toolkit); shared infrastructure (authentication, discovery, resource access, etc.); consequences: reusable services, large developer & user communities, interoperability & code reuse • P2P: each application defines & deploys a completely independent “infrastructure” (JXTA, BOINC, XtremWeb?); efforts started to define common APIs, albeit with limited scope to date; consequences: a new (albeit simple) install per application; interoperability & code reuse not achieved


  30. Coolness Factor: Grid vs. P2P


  32. Summary: Grid and P2P 1) Both are concerned with the same general problem • Resource sharing within virtual communities 2) Both take the same general approach • Creation of overlays that need not correspond in structure to underlying organizational structures 3) Each has made genuine technical advances, but in complementary directions • “Grid addresses infrastructure but not yet scale and failure” • “P2P addresses scale and failure but not yet infrastructure” 4) Complementary strengths and weaknesses => room for collaboration (Ian Foster at UChicago)

  33. Crossover Ideas: Some P2P ideas useful in the Grid • Resource discovery (DHTs), e.g., how do you make “filenames” expressive enough to name a computer cluster resource? (see the sketch below) • Replication models, for fault-tolerance, security, reliability • Membership, i.e., which workstations are currently available? • Churn-resistance, i.e., users log in and out; the problem is difficult since a free host gets entire computations, not just small files • All of the above are open research directions, waiting to be explored!
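As an illustration of the first bullet, one hypothetical way to give a cluster resource a DHT-friendly "filename" is to hash a canonical attribute string. This is only a sketch of the idea the slide raises, not an existing Grid or P2P API, and the attribute names below are made up.

```python
# Illustrative sketch only: naming a cluster resource in a DHT by hashing a
# canonical attribute string. Attribute names and values are made up; this
# is not an existing Grid or P2P API.
import hashlib

def resource_key(attrs: dict) -> str:
    canonical = ";".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha1(canonical.encode()).hexdigest()

key = resource_key({"arch": "x86_64", "cores": 8, "mem_gb": 16})
# A site would publish this key in the DHT; a broker with the same attribute
# query recomputes the key and looks it up (range queries remain the hard part).
print(key)
```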

  34. Cloud Computing What’s it all about? A First Step

  35. Where is Grid? Where is cloud computing? Life of Ra (a Research Area): popularity of the area (y-axis) over time (x-axis) rises through hype (“Wow!”), a first peak at the end of the hype (“This is a hot area!”), and a first trough (“I told you so!”). Stages: Young (low-hanging fruits), Adolescent (interesting problems), Middle Age (solid base, hybrid algorithms), Old Age (incremental solutions)

  36. How do I identify what stage a research area is in? (1) If there have been no publications in the research area that are more than 1-2 years old, it is in the “Young” phase. (2) Pick a paper published in the research area in the last year and read it. If you think you could have come up with the core idea in that paper (given all the background, etc.), then the research area is in its “Young” phase. (3) Find the latest published paper whose idea you think you could have come up with. If this paper has been cited by one round of papers (but those citing papers have not themselves been cited), then the research area is in the “Adolescent” phase. (4) Do step 3 above, and if you find that the citing papers have themselves been cited, and so on, then the research area is at least in the “Middle Age” phase. (5) Pick a paper from the last 1-2 years. If the latest published papers contain only incremental developments, with ideas that may be innovative but are not yielding large enough performance benefits, then the area is mature. (6) If no one works in the research area, or everyone you talk to thinks negatively about the area (except perhaps its inventors), then the area is dead.
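Read as a decision procedure, the checklist could be encoded roughly as follows. This is purely illustrative; every input is the reader's own judgment call, not a measurable quantity.

```python
# Hypothetical encoding of the checklist above. Every input is the reader's
# own judgment call, not a measurable quantity.
def research_area_stage(oldest_paper_age_years, could_have_invented_latest,
                        citation_depth, only_incremental, anyone_working):
    if not anyone_working:
        return "Dead"                         # step 6
    if oldest_paper_age_years <= 2 or could_have_invented_latest:
        return "Young"                        # steps 1-2
    if citation_depth == 1:
        return "Adolescent"                   # step 3: citers not yet cited
    if only_incremental:
        return "Mature (Old Age)"             # step 5
    return "At least Middle Age"              # step 4: deeper citation chains

print(research_area_stage(10, False, 3, False, True))   # -> "At least Middle Age"
```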

  37. What is a cloud? • It’s a cluster! It’s a supercomputer! It’s a datastore! • It’s superman! • None of the above • Cloud = Lots of storage + compute cycles nearby

  38. Data-intensive Computing • Computation-Intensive Computing • Example areas: MPI-based, High-performance computing, Grids • Typically run on supercomputers (e.g., NCSA Blue Waters) • Data-Intensive • Typically store data at datacenters • Use compute nodes nearby • Compute nodes run computation services • In data-intensive computing, the focus shifts from computation to the data: problem areas include • Storage • Communication bottleneck • Moving tasks to data (rather than vice-versa) • Security • Availability of Data • Scalability
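"Moving tasks to data" is the key shift in the list above. A toy, purely illustrative placement function (the block-to-node map and node names are hypothetical) conveys the idea:

```python
# Toy illustration of "moving tasks to data rather than vice-versa".
# The block-to-nodes map and the node names are hypothetical.
block_locations = {"block-42": ["node-3", "node-7", "node-11"]}
node_load = {"node-3": 5, "node-7": 1, "node-11": 2, "node-20": 0}

def place_task(block_id):
    local_nodes = block_locations.get(block_id, [])
    if local_nodes:
        # Prefer the least-loaded node that already stores the data block.
        return min(local_nodes, key=lambda n: node_load[n])
    # Otherwise fall back to any node; the data must then cross the network.
    return min(node_load, key=node_load.get)

print(place_task("block-42"))   # -> node-7 (has the data, lightly loaded)
```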

  39. Distributed Clouds • A single-site cloud consists of • Compute nodes (split into racks) • Switches, connecting the racks • Storage (backend) nodes connected to the network • Front-end for submitting jobs • Services: physical resource set, software services • A geographically distributed cloud consists of • Multiple such sites • Each site perhaps with a different structure and services
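The single-site vs. geographically distributed structure described above maps naturally onto a small data model. The sketch below uses made-up field names and is not any real cloud's API.

```python
# A sketch of the single-site vs. geographically distributed cloud structure
# described above. Field names are illustrative, not any real cloud API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Site:
    racks: List[List[str]]        # compute node ids, split into racks
    switches: List[str]           # switches connecting the racks
    storage_nodes: List[str]      # backend storage nodes on the network
    front_end: str                # where jobs are submitted
    services: List[str]           # physical resource set + software services

@dataclass
class DistributedCloud:
    sites: List[Site] = field(default_factory=list)   # each site may differ
                                                       # in structure and services
```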

  40. Cirrus Cloud at University of Illinois (topology): 4 racks of 32 DL160 compute nodes each, each rack behind its own internal switch; only the internal switches used for data transfers are shown (1 GbE, 48 ports). The internal switches, the storage nodes, and the head node connect to a pair of ProCurve switches. Note: system management, monitoring, and the operator console use a different set of switches not pictured here.

  41. Example: Cirrus Cloud at U. Illinois • 128 servers. Each has • 8 cores (total 1024 cores) • 16 GB RAM • 2 TB disk • Backing store of about 250 TB • Total storage: 0.5 PB • Gigabit Networking
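The totals on this slide follow directly from the per-server figures; a quick sanity check, assuming the stated 128 servers with 8 cores and 2 TB of disk each, plus roughly 250 TB of backing store:

```python
# Quick check of the totals on this slide, using the stated per-server figures.
servers, cores_per_server, disk_tb_per_server = 128, 8, 2
backing_store_tb = 250

print(servers * cores_per_server)                        # 1024 cores
print(servers * disk_tb_per_server + backing_store_tb)   # 506 TB, i.e. ~0.5 PB
```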

  42. 6 Diverse Sites within Cirrus • UIUC – Systems Research for Cloud Computing + Cloud Computing Applications • Karlsruhe Institute of Tech (KIT, Germany): Grid-style jobs • IDA, Singapore • Intel • HP • Yahoo!: CMU’s M45 cluster All will be networked together: see http://www.cloudtestbed.org

  43. What “Services”? Different clouds export different services • Industrial Clouds • Amazon S3 (Simple Storage Service): store arbitrary datasets • Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images • Google AppEngine: develop applications within their AppEngine framework, upload data that will be imported into their format, and run • Academic Clouds • Google-IBM Cloud (U. Washington): run apps programmed atop Hadoop • Cirrus cloud: run (i) apps programmed atop Hadoop and Pig, and (ii) systems-level research on this first generation of cloud computing models

  44. Software “Services” • Computational • MapReduce (Hadoop) • Pig Latin • Naming and Management • Zookeeper • Tivoli, OpenView • Storage • HDFS • PNUTS

  45. Sample Service: MapReduce • Google uses MapReduce to run 100K jobs per day, processing up to 20 PB of data • Yahoo! has released open-source software Hadoop that implements MapReduce • Other companies that have used MapReduce to process their data: A9.com, AOL, Facebook, The New York Times • Highly-Parallel Data-Processing

  46. What is MapReduce? • Terms are borrowed from functional languages (e.g., Lisp) • Sum of squares: • (map square ‘(1 2 3 4)) • Output: (1 4 9 16) [processes each record sequentially and independently] • (reduce + ‘(1 4 9 16)) • (+ 16 (+ 9 (+ 4 1))) • Output: 30 [processes the set of all records in a batch]
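For comparison, the same sum-of-squares computation with Python's built-in map and functools.reduce, just to show where the borrowed terms come from:

```python
# The same sum-of-squares computation with Python's built-in map and
# functools.reduce, mirroring the Lisp example on the slide.
from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # [1, 4, 9, 16]
total = reduce(lambda a, b: a + b, squares)          # (+ 16 (+ 9 (+ 4 1))) = 30
print(squares, total)
```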

  47. Map • Processes an individual key/value pair to generate intermediate key/value pairs. Example: input <filename, file text> with text “Welcome Everyone / Hello Everyone” emits (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1)

  48. Reduce • Processes and merges all intermediate values associated with each given key assigned to it. Example: the intermediate pairs (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1) are merged into (Everyone, 2), (Hello, 1), (Welcome, 1)
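Putting the two slides together, word count can be written as a pair of small functions. This is a local sketch of the semantics, not actual Hadoop code.

```python
# Word count written as the map and reduce functions of the two slides above.
# A local sketch of the semantics, not Hadoop code.
from collections import defaultdict

def map_fn(filename, file_text):
    for word in file_text.split():
        yield (word, 1)                  # intermediate key/value pairs

def reduce_fn(word, counts):
    return (word, sum(counts))           # merge all values for one key

grouped = defaultdict(list)              # "shuffle": group pairs by key
for k, v in map_fn("doc1", "Welcome Everyone Hello Everyone"):
    grouped[k].append(v)
print([reduce_fn(k, vs) for k, vs in grouped.items()])
# [('Welcome', 1), ('Everyone', 2), ('Hello', 1)]
```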

  49. Some Applications • Distributed Grep: • Map – emits a line if it matches the supplied pattern • Reduce – copies the intermediate data to output • Count of URL access frequency: • Map – processes the web log and outputs <URL, 1> • Reduce – emits <URL, total count> • Reverse Web-Link Graph: • Map – processes the web log and outputs <target, source> • Reduce – emits <target, list(source)>
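As one concrete instance, the reverse web-link graph fits the same map/reduce pattern; a sketch with made-up page names:

```python
# Reverse web-link graph as map/reduce; the page names are made up.
from collections import defaultdict

def map_fn(source_page, outgoing_links):
    for target in outgoing_links:
        yield (target, source_page)      # emit <target, source>

def reduce_fn(target, sources):
    return (target, sorted(sources))     # emit <target, list(source)>

web_log = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}
grouped = defaultdict(list)
for page, links in web_log.items():
    for k, v in map_fn(page, links):
        grouped[k].append(v)
print([reduce_fn(k, vs) for k, vs in grouped.items()])
# [('b.html', ['a.html']), ('c.html', ['a.html', 'b.html'])]
```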

  50. Programming MapReduce • Externally: For user • Write a Map program (short), write a Reduce program (short) • Submit job; wait for result • Need to know nothing about parallel/distributed programming! • Internally: For the cloud (and for us distributed systems researchers) • Parallelize Map • Transfer data from Map to Reduce • Parallelize Reduce • Implement Storage for Map input, Map output, Reduce input, and Reduce output
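The "internal" half of this slide (parallelize Map, transfer data, parallelize Reduce) can be mimicked by a tiny local driver. The sketch below assumes user-supplied map/reduce functions like the ones above; it is not the real Hadoop runtime, which also handles storage, data locality, and fault tolerance.

```python
# A tiny local driver mimicking the "internal" side of the slide: run map
# tasks in parallel, shuffle intermediate pairs by key, run reduce tasks in
# parallel. A sketch only; the real runtime also handles storage, data
# locality, and fault tolerance.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_mapreduce(map_fn, reduce_fn, inputs):
    # map_fn(record) -> iterable of (key, value); reduce_fn(key, values) -> result
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(map_fn, inputs))               # parallelize Map
        grouped = defaultdict(list)
        for pairs in map_outputs:                                  # Map -> Reduce transfer
            for k, v in pairs:
                grouped[k].append(v)
        return list(pool.map(lambda kv: reduce_fn(*kv), grouped.items()))  # parallelize Reduce

# Example: word count over two "files".
wc_map = lambda text: [(w, 1) for w in text.split()]
wc_reduce = lambda word, counts: (word, sum(counts))
print(run_mapreduce(wc_map, wc_reduce, ["Welcome Everyone", "Hello Everyone"]))
# [('Welcome', 1), ('Everyone', 2), ('Hello', 1)]
```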
