
Grid Computing 7700 Fall 2005 Lecture 2: About Grid Computing



  1. Grid Computing 7700, Fall 2005, Lecture 2: About Grid Computing. Gabrielle Allen, allen@bit.csc.lsu.edu, http://www.cct.lsu.edu/~gallen/Teaching

  2. Quick Test • What reason does Foster (2002) give that the Web is not a Grid? • Advances in which area have changed the way we should think about collaboration: a) sensors, b) supercomputers, c) mass storage, d) networks, e) HDTV • What is GGF an acronym for? • What speed do gravitational waves travel at? a) speed of sound, b) speed of light, c) infinite speed, d) 103,457 km/s, e) they do not move

  3. Some History
1843 US Congress investigates telegraph technology
1866 Transatlantic telegraph cable laid
1901 Transatlantic radio transmission
1965 Multics developers envisage utility computing
1969 Unix is developed
1970 ARPANET: DoD experimental WAN, precursor to the internet
1972 C written by Ritchie
1975 Microsoft founded
1980s Parallel computing: algorithms, programs and architectures
1980s “Grand Challenge” applications
1985 NSFNET: links supercomputing centers at 56 kbps
1988 Condor project starts (LAN based)
1989 “Metacomputing” term coined (CASA project)
1990 HTML developed by Tim Berners-Lee, first browsers
1991 Linus Torvalds works on Linux
1993 Mosaic browser released
1993 Legion project starts
1993 HPF specification released
1994 MPI-1 specification released
1994 Nimrod project starts (LAN based)
1994 First Beowulf cluster
1995 Dot-com era starts …
1995 Netscape goes public
1995 FAFNER: Factoring via Network-Enabled Recursion
1995 I-WAY (Information Wide Area Year) at SC95
1995 Globus project (ANL, UC, ISI) starts
1995 Java released by Sun
1997 Legion released
1997 UNICORE project starts
1997 Entropia founded
1998 Globus 1.0 released
1998 Legion commercialized via Applied Metacomputing (becomes Avaki in 2001)
1999 First Grid Forum
1999 SETI@home
1999 Napster: centralized file sharing
2000 Microsoft releases .NET
2000 Gnutella released: P2P file sharing
2001 “Anatomy of the Grid”
2001 NSF announces TeraGrid
2001 First Global Grid Forum
2001 Cactus, Globus, MPICH-G2 win Gordon Bell prize
2002 Earth Simulator: 40 TFlop NEC machine
2002 Globus 2.0 released
2002 “Physiology of the Grid”
2003 Globus 3.0 released
2003 10 Gbps transatlantic optical network demonstrated
2005 Globus 4.0 released
2005 TeraGrid awarded $150M

  4. Fernando Corbato • Designer of the Multics OS • Mainframe timesharing OS • Led to UNIX • In 1965 envisaged a computer facility “like a power company or water company”

  5. J. C. R. Licklider • Experimental psychologist • Envisaged a “grid” for scientific research • Contributed to development of ARPANET • 1968: Developed a vision of networked computers that would provide fast, automated support for human decision making

  6. Len Kleinrock • Created the basic principles of packet switching, the technology underpinning the Internet, while a graduate student at MIT • His computer was the first node on the internet • Envisaged spread of computer utilities (1969)

  7. “Grand Challenges” • Fundamental problems in science and engineering with broad economic and scientific impact. They are generally considered intractable without the use of state-of-the-art massively parallel computers • Used by funding agencies from the 80s onwards to motivate advances in science and high performance computing • Brought together distributed teams who started to collaborate around their machines, codes, data, etc

  8. I-WAY: SC95 • High speed experimental distributed computing project. • Set up ATM network connecting supercomputers, mass storage, advanced viz devices at 17 US sites. • 30 software engineers, 60 applications, 10 networks (most OC-3c/155Mbps) • Application focused (remote viz, metacomputing, collaboration) • Single interface to schedule and start runs • I-POP machines (17) coordinated I-WAY “virtual machines”, gateways to the I-WAY • I-Soft software for management/programming

  9. Aims of I-WAY • Develop network enabled tools and build collaborative environments on existing networks with differing protocols and properties • Locating and accessing distributed resources • Security and reliability • Use of distributed resources for computation • Uniform access to distributed data • Coupling distributed resources

  10. I-WAY Infrastructure • I-POP: gateways to I-WAY • Dedicated point-of-presence machines at each site • Uniformly configured with standard software environment • Accessible from the internet, inside firewall • ATM interface for monitoring/management of ATM switch (figure from Ian Taylor) • I-Soft: management and application programming environment • Ran on I-POP machines • Provided uniform authentication, resource reservation, process creation, communication functions • CRB: Computational Resource Broker (central scheduler) • Security: Telnet client amended with Kerberos authentication and encryption • File system: AFS for shared repository • Communication: Nexus adapted (MPICH, CAVEcomm)

  11. I-WAY New Concepts • Point of presence machines at each site • Computational resource broker integrates different local schedulers • Uniform authentication environment and trust relationships between sites • Network-aware parallel programming tools to provide uniform view and optimize communications • Led to Globus from ISI/ANL

  12. Globus Toolkit® History (timeline figure from the Globus Team, 1997–2002; the plotted download counts do not include downloads from NMI, UK eScience, EU Datagrid, IBM, Platform, etc.). Milestones shown include: DARPA, NSF, and DOE begin funding Grid work; The Grid: Blueprint for a New Computing Infrastructure published; GT 1.0.0 through GT 2.2 releases; MPICH-G and MPICH-G2 releases; NASA initiates the Information Power Grid and DOE increases support; Globus Project wins the Global Information Infrastructure Award; early application successes reported; first EuroGlobus conference held in Lecce; NSF and the European Commission initiate many new Grid projects; NSF GRIDS Center initiated and DOE begins the SciDAC program; the “Anatomy of the Grid” and “Physiology of the Grid” papers released; significant commercial interest in Grids.

  13. Some Application Areas • Life sciences: computational biology, bioinformatics, genomics; access, collecting and mining data, imaging • Engineering: aircraft design, modeling and monitoring • Data: high energy physics, astronomy • Physical sciences: numerical relativity, material science, geoscience • Collaborations: sharing, real time interactivity, visualization, communication • Commercial: gaming, idle workstations, climate prediction, disaster, cyber security, portals • Education and distance learning

  14. Some Application Types • Minimal communication (embarrassingly parallel) • Staged/linked/workflow • Access to Resources • Fast throughput • Large scale • Adaptive • Real-time on demand • Speculative • We will read about these and new application scenarios later

  15. What are Grids? • Provide: “coordinated resource sharing and problem solving in dynamic, multi-institutional, virtual organizations” • Grids link together people, computers, data, sensors, experimental equipment, visualization systems and networks (Virtual Organizations) • For example, they can provide • Sharing of computer resources • Pooling of information • Access to specialized equipment • Increased efficiency and on-demand computing • Enable distributed collaborations • Need to think about hardware, software, applications and policies.

  16. Grid Checklist A Grid … • Coordinates resources that are not subject to centralized control • Uses standard, open, general purpose protocols and interfaces • Delivers non-trivial qualities of service Ian Foster, “What is the Grid? A Three Point Checklist”, 2002

  17. Grid Resources • Networks: high speed optical networks (e.g. NLR); academic networks (Internet2); commercial network providers; wireless, bluetooth, 3G, etc. • Visualization: servers; renderers; Access Grid; e.g. CCT Imaginarium • Computers: any networked CPU; supercomputers & clusters; workstations; home PCs; PDAs; telephones; game machines; very different properties (clock speed, memory, cache, FPUs, memory bandwidth, OS, software) • Data: belonging to a single user or shared across a VO; global distributed databases (e.g. NVO, Genome); storage devices; security and access considerations • Devices: sensors; telescopes; gravitational wave detectors; microscopes; synchrotrons; medical scanners; etc.

  18. Characteristics • Different heterogeneous resources from different organizations • Mutually distrustful organizations • Differing security requirements and policies • Dynamic quality of service (machines, networks etc) • Heterogeneous networks • Capabilities: Dynamic, adaptive, autonomic, discovery

  19. Who Will Use The Grid • Computational scientists and engineers • Experimental scientists • Collaborations • Educators • Enterprises • Governments • Health authorities • Use cases should be driving Grid developments, so important to understand needs and translate to requirements.

  20. Computational Scientists and Engineers • Numerical simulation, access to more and larger computing resources • Easier, more efficient, access to supercomputers • Realtime visualization • Computational steering • Network enabled solvers • New scenarios

  21. Experimental Scientists • Hook up supercomputers with instruments (telescopes, microscopes, …) • Advanced visualization and GUI interfaces • Remote control of instruments • Access to remote data • Management and use of large distributed data repositories

  22. Governments • Disaster response • National defense • Long term research and planning • Collective power of the nation's fastest computers, data archives and intellect to solve problems • Strategic computing reserve (environmental disaster, earthquake, homeland security) • National collaboratory: complex scientific and engineering problems such as global environmental change, space station design

  23. Virtual Organizations • “A number of mutually distrustful participants with varying degrees of prior relationship (perhaps none at all) who want to share resources in order to perform some task.” (“The Anatomy of the Grid”) • Sharing involves direct access to remote software, computers, data and other resources. • Sharing relationships can vary over time, resources involved, nature of allowed access, participants who get access • Span small corporate departments to large groups of people from different organizations around the world • For example: • This class • The LSU numerical relativity group and its collaborators • The astronomical community who have access to virtual observatories

  24. Virtual Organizations (figure: three organizations and two VOs, from “The Anatomy of the Grid”)

  25. Virtual Organizations • Vary in purpose, scope, size, duration, structure, community and sociology • Common requirements: • Highly flexible sharing relationships (both client-server and peer-to-peer) • Sophisticated and precise levels of control over sharing • Delegation • Application of local and global policies • Address QoS, scheduling, co-allocation, accounting, …

  26. How Will They Use It? • Distributed supercomputing • Aggregate computational resources (e.g. all workstations in a company, all supercomputers in the world) for problems which cannot be solved on a single machine • Large problems needing extreme memory, CPU, or other resource • E.g. astrophysics/numerical relativity: accurate simulations need fine scale detail • Challenges: latency, coscheduling, scalability, algorithms, performance

  27. How Will They Use It? • High Throughput Computing • Large numbers of loosely coupled or independent tasks (e.g. leverage unused cycles) • On-Demand Computing • Short term requirements for jobs which cannot be effectively or conveniently run locally. • Often driven by cost-performance concerns • Challenges: dynamic requirements, large numbers of users and resources, security, payment

  28. How Will They Use It? • Data Intensive Computing • Focus on generating new information from data in geographically distributed repositories, digital libraries, databases • E.g. high energy physics experiments generate terabytes of data/day with widely distributed collaborators; digital sky surveys • Challenges: scheduling and configuration of complex, high volume data flows

  29. How Will They Use It? • Collaborative Computing • Enabling human-human interactions, e.g. with shared resources such as data archives and simulations • Often in terms of a virtual shared space, e.g. a CAVE environment • Challenges: realtime requirements

  30. E-Science • Global collaborations for scientific research • “large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet” UK E-Science Program http://www.rcuk.ac.uk/escience/

  31. Cyberinfrastructure • Software to support E-Science • “An infrastructure based on grids and on application-specific software, tools, and data repositories that support research in a particular discipline.” Getting Up To Speed: The Future of Supercomputing (2005) • GridChem project at CCT is building a cyberinfrastructure for computational chemists • UCOMS project at CCT is building a cyberinfrastructure for geoscientists • SCOOP project at CCT is building a cyberinfrastructure for coastal modellers • Looking for generic tools and techniques, driving research

  32. New Scenarios enabling new science (diagram: a Brill wave simulation finds the best resources and moves between NCSA, SDSC, RZG and LRZ; it adds more resources, finds a new machine when its queue time is over, and grabs free CPUs; it spawns tasks S1 “Calculate/Output Invariants”, S2 “Archive data”, P1 “Found a horizon, try out excision” and P2 “Calculate/Output Grav. Waves”, looks for a horizon, and archives data to the LIGO public database; the physicist has a new idea and clones the job with a steered parameter)

  33. E-Science • Global collaborations for scientific research • “large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet” UK E-Science Program http://www.rcuk.ac.uk/escience/

  34. Cyberinfrastructure • Software to support E-Science • “An infrastructure based on grids and on application-specific software, tools, and data repositories that support research in a particular discipline.” Getting Up To Speed: The Future of Supercomputing (2005) • GridChem project at CCT is building a cyberinfrastructure for computational chemists • UCOMS project at CCT is building a cyberinfrastructure for geoscientists • SCOOP project at CCT is building a cyberinfrastructure for coastal modellers • Looking for generic tools and techniques, driving research

  35. New Scenarios enabling new science (diagram: a Brill wave simulation finds the best resources and moves between NCSA, SDSC, RZG and LRZ; it adds more resources, finds a new machine when its queue time is over, and grabs free CPUs; it spawns tasks S1 “Calculate/Output Invariants”, S2 “Archive data”, P1 “Found a horizon, try out excision” and P2 “Calculate/Output Grav. Waves”, looks for a horizon, and archives data to the LIGO public database; the physicist has a new idea and clones the job with a steered parameter)

  36. Examples

  37. High Performance Computing A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors.
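
As a minimal illustration of “dividing a program into little pieces”, here is a hedged MPI sketch (MPI is the message-passing library that appears later in these slides). The task, summing the integers 0..N-1, and the size N are invented for the example; this is not code from the lecture.

    /* sum_parallel.c: each processor sums its own slice of 0..N-1, then the
     * partial results are combined with a reduction.  Build: mpicc sum_parallel.c */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const long N = 100000000;               /* illustrative problem size */
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which piece am I?      */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many pieces total? */

        /* split the index range into one contiguous block per processor */
        long lo = rank * N / size;
        long hi = (rank + 1) * N / size;
        double local = 0.0;
        for (long i = lo; i < hi; i++)
            local += (double)i;

        /* combine the partial sums on processor 0 */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }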

  38. Numerical Relativity • Black holes, neutron stars, supernovae, gravitational waves • Governed by Einstein’s equations: very complex, need to be solved numerically • 10 coupled mixed elliptic-hyperbolic PDEs, thousands of terms • High fidelity solutions need more research in numerics/physics … but also larger computers, better infrastructure • Physics currently limited by information technology!
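
As a rough pointer to what “Einstein’s equations” refers to (a standard textbook form, not necessarily the exact formulation used by the group): the 10 PDEs are the components of the field equations, and a 3+1 split turns them into evolution equations such as the one for the spatial metric below.

    % Einstein's field equations: 10 coupled PDEs for the symmetric 4-metric g_{\mu\nu}
    G_{\mu\nu} \;\equiv\; R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu} \;=\; 8\pi\, T_{\mu\nu},
    \qquad \mu,\nu = 0,\dots,3
    % One ADM-type evolution equation (spatial metric \gamma_{ij}, extrinsic
    % curvature K_{ij}, lapse \alpha, shift \beta^i); K_{ij} obeys a similar,
    % much longer, evolution equation:
    \partial_t \gamma_{ij} \;=\; -2\alpha K_{ij} + \nabla_i \beta_j + \nabla_j \beta_i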

  39. Numerical Relativity • Good motivating example for Grid computing: • Large varied distributed collaborations • Need lots of cycles, storage (currently using teraflops, terabytes) • Need to share results, codes, parameter files, … • Need advanced visualization, steering

  40. Parallelisation (diagram: finite difference grid with “stencil width” 1, held entirely on a single processor, Proc 0)
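
For concreteness, a “stencil width” of 1 means each grid point is updated from its immediate neighbours only. A one-dimensional illustration (the explicit heat-equation update, chosen only as an example, not the lecture’s actual system):

    u_i^{\,n+1} \;=\; u_i^{\,n} \;+\; \frac{\kappa\,\Delta t}{\Delta x^{2}}
        \left( u_{i+1}^{\,n} - 2\,u_i^{\,n} + u_{i-1}^{\,n} \right)

Because updating u_i needs u_{i-1} and u_{i+1}, a processor holding a block of points must obtain one layer of “ghost” points from each neighbouring processor, which drives the communication in the following slides.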

  41. Parallelisation (diagram: the grid split between Proc 0 and Proc 1) • Split the data to be worked on across the processors you have available • Each processor can then work on a different piece of data at the same time

  42. Parallelisation (diagram: ghost-zone exchange between Proc 0 and Proc 1) • But there is a downside: data needs to be exchanged between processors on most iterations, e.g. “synchronize”, “global reduction”, output • Done with MPI (or PVM, OpenMP, …)
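
A minimal sketch of that “synchronize” step for a one-dimensional decomposition with stencil width 1. The local array size NLOCAL and the dummy data are assumptions made for the example, not details from the lecture’s code.

    /* halo.c: exchange one ghost point with each neighbour, as needed every
     * iteration for a width-1 stencil.  Build: mpicc halo.c */
    #include <mpi.h>

    #define NLOCAL 1000                        /* interior points per processor */

    int main(int argc, char **argv)
    {
        double u[NLOCAL + 2];                  /* u[0], u[NLOCAL+1] are ghosts */
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
        for (int i = 0; i < NLOCAL + 2; i++) u[i] = rank;   /* dummy data */

        /* one "synchronize": send my edge values, receive my neighbours' */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL],     1, MPI_DOUBLE, right, 1,
                     &u[0],          1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ...the stencil update of u[1..NLOCAL] would go here, then repeat... */

        MPI_Finalize();
        return 0;
    }

A “global reduction” (e.g. finding the maximum error across all processors) would use MPI_Allreduce in the same loop.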

  43. Parallel IO (diagram: Proc 0 and Proc 1 each writing to disk) • In this example we just want to output fields from 2 processors, but it could be 2000 • Each processor could write its own data to disk • The data then usually has to be moved to one place and “recombined” to produce a single coherent file

  44. Parallel IO (diagram: Proc 1 sends its data to Proc 0, which writes to disk) • Alternatively, processor 0 can gather data from the other processors and write it all to disk • Usually a combination of these works best … let every nth processor gather data and write to disk
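
A hedged sketch of the gather-to-one-processor approach; the block size, dummy field values and output filename are assumptions made for the example.

    /* gather_io.c: processor 0 collects every processor's block and writes a
     * single coherent file.  Real codes often let every nth rank do this. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NLOCAL 1000                        /* points owned by each processor */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local[NLOCAL];
        for (int i = 0; i < NLOCAL; i++) local[i] = rank;   /* dummy field data */

        double *global = NULL;
        if (rank == 0)
            global = malloc((size_t)size * NLOCAL * sizeof(double));

        /* collect all blocks, in rank order, onto processor 0 */
        MPI_Gather(local, NLOCAL, MPI_DOUBLE,
                   global, NLOCAL, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            FILE *f = fopen("field.dat", "wb");   /* one coherent output file */
            if (f) {
                fwrite(global, sizeof(double), (size_t)size * NLOCAL, f);
                fclose(f);
            }
            free(global);
        }

        MPI_Finalize();
        return 0;
    }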

  45. Large Scale Computing • PARALLEL: A typical run they do now needs 45 GB of memory: • 171 grid functions • 400x400x200 grid • OPTIMIZE: A typical run makes 3000 iterations with 6000 Flops per grid point: 600 TeraFlops !! • PARALLEL IO/VIZ/DATA: Output of just one grid function at just one time step • 256 MB • (320 GB for 10 GFs every 50 time steps) • CHECKPOINTING: One simulation takes longer than queue times: need 10-50 hours • STEERING/MONITORING: Computing time is expensive • One simulation: 2500 to 12500 SUs • Need to make each simulation count
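
Where those headline numbers come from (back-of-envelope, assuming one 8-byte double per grid point per grid function; note the 600 TeraFlops is a total operation count for the run, not a rate):

    \text{memory} \approx 171 \times (400 \times 400 \times 200) \times 8\ \text{bytes} \approx 4.4 \times 10^{10}\ \text{bytes} \approx 45\ \text{GB}
    \text{operations} \approx 3000\ \text{iterations} \times (400 \times 400 \times 200)\ \text{points} \times 6000\ \text{Flops/point} \approx 5.8 \times 10^{14} \approx 600\ \text{TeraFlops}
    \text{one grid function, one time step} \approx (400 \times 400 \times 200) \times 8\ \text{bytes} = 2.56 \times 10^{8}\ \text{bytes} = 256\ \text{MB}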

  46. Numerical Relativity • Good motivating example for Grid computing: • Large varied distributed collaborations • Need lots of cycles, storage (currently using teraflops, terabytes) • Need to share results, codes, parameter files, … • Need advanced visualization, data management, steering • Connection to experimental equipment (LIGO Gravitational Wave Detector) and data.

  47. Numerical Relativity • How do computational physicists work now? • Accounts on different machines: LSU, NCSA, NERSC, PSC, SDSC, LRZ, RZG, … • Learn how to use each machine • Compilers, filesystem, scheduler, MPI, policies, … • Ssh to machine, copy source code, compile, determine e.g. how much output the file system can hold, how big a run should be, the best queue to submit to; submit batch script • Wait till the run starts, keep logging in to check if it is still running, what is happening … • Copy all data back to the local machine for visualization and analysis • Email colleagues and explain what they saw • Lose data, forget what they ran • Publish paper

  48. New Scenarios (diagram, repeated: a Brill wave simulation finds the best resources and moves between NCSA, SDSC, RZG and LRZ; it adds more resources, finds a new machine when its queue time is over, and grabs free CPUs; it spawns tasks S1 “Calculate/Output Invariants”, S2 “Archive data”, P1 “Found a horizon, try out excision” and P2 “Calculate/Output Grav. Waves”, looks for a horizon, and archives data to the LIGO public database; the physicist has a new idea and clones the job with a steered parameter)

  49. TeraGrid

  50. TeraGrid: teragrid.org “Cyber-infrastructure” constructed through the NSF TeraScale initiative • 2000: TeraScale Computing System (TCS-1) at PSC, resulting in a 6 TFLOPS computational resource. • 2001: $53M funding. Distributed Terascale Facility (DTF), a 15 TFLOPS computational Grid composed of major resources at ANL, Caltech, NCSA, and SDSC. Exploits homogeneity at the microprocessor level, using Intel Itanium architecture (Itanium2 and its successor) clusters to maximally leverage software and integration efforts. Homogeneity offers the user community an initial set of large-scale resources with a high degree of compatibility, reducing the effort required to move into the computational Grid environment. • 2002: $35M funding and PSC joins. Extensible TeraScale Facility (ETF) combines the TCS-1 and DTF resources into a single, 21+ TFLOPS Grid environment and supports extensibility to additional sites and heterogeneity. • 2003: $10M and four new sites: ORNL, Purdue, Indiana, TACC. 40 TFLOPS and 2 PB. • 2005: $150M to enhance and operate TeraGrid: http://www.teragrid.org/news/news05/0817.html
