Grids and Computational Science
ERDC Grid Tutorial, August 17 2001
Geoffrey Fox
IPCRES Laboratory for Community Grids
Computer Science, Informatics, Physics
Indiana University, Bloomington IN
gcf@indiana.edu
Abstract of PET and Computational Science Presentation
• We describe HPCC and Grid trends and how they could be folded into a PET computational environment
• A Peer-to-Peer Grid of Services supporting science and hence the DoD warfighter
• We describe what works (MPI), what sort of works (objects), what is known (parallel algorithms), what is active (data mining, visualization), what failed (good parallel environments), what is inevitable (petaflops), what is simple but important (XML), what is getting more complicated (applications) and the future (Web and Grids)
Trends of Importance
• Resources of increasing performance
  • Computers, storage, sensors, networks
• Applications of increasing sophistication
  • Size, multi-scales, multi-disciplines
• New algorithms and mathematical techniques
• Computer science
  • Compilers, parallelism, objects, components
• Grid and Internet concepts and technologies
  • Enabling new applications
Projected Top 500 Until Year 2009
• First, tenth, 100th, 500th and SUM of all 500, projected in time
• Earth Simulator from Japan: http://geofem.tokyo.rist.or.jp/
Top 500, June 2001 (chart)
Top 500 by vendor: systems, June 2001 (chart)
Top 500 by vendor: total power, June 2001 (chart)
PACI 13.6 TF Linux TeraGrid (network diagram: NCSA with 500 nodes, 8 TF, 4 TB memory, 240 TB disk; SDSC with 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk; Argonne with 64 nodes, 1 TF, 0.25 TB memory, 25 TB disk; Caltech with 32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk; the sites are joined by Chicago and LA DTF core Cisco 65xx Catalyst switch/routers with 256 Gb/s crossbars, Juniper M160/M40 routers, Myrinet cluster interconnects, and OC-12/OC-48 wide-area links to vBNS, Abilene, MREN, ESnet, Calren, NTON, HSCC and Starlight)
Caltech Hypercube (photos)
• JPL Mark II, 1985
• Chuck Seitz, 1983
• The hypercube drawn as a cube
From the New York Times, 1984
• One of today's fastest computers is the Cray 1, which can do 20 million to 80 million operations a second. But at $5 million, they are expensive and few scientists have the resources to tie one up for days or weeks to solve a problem.
• "Poor old Cray and Cyber (another supercomputer) don't have much of a chance of getting any significant increase in speed," Fox said. "Our ultimate machines are expected to be at least 1,000 times faster than the current fastest computers." (80 gigaflops predicted; Livermore has just installed 12,000 gigaflops.)
• But not everyone in the field is as impressed with Caltech's Cosmic Cube as its inventors are. The machine is nothing more nor less than 64 standard, off-the-shelf microprocessors wired together, not much different than the innards of 64 IBM personal computers working as a unit.
• The Caltech Hypercube was "just a cluster of PCs"!
From the New York Times, 1984
• "We are using the same technology used in PCs (personal computers) and Pacmans," Seitz said. The technology is an 8086 microprocessor capable of doing 1/20th of a million operations a second with 1/8th of a megabyte of primary storage. Sixty-four of them together will do 3 million operations a second with 8 megabytes of storage.
• Computer scientists have known how to make such a computer for years but have thought it too pedestrian to bother with.
• "It could have been done many years ago," said Jack B. Dennis, a computer scientist at the Massachusetts Institute of Technology who is working on a more radical and ambitious approach to parallel processing than Seitz and Fox.
• "There's nothing particularly difficult about putting together 64 of these processors," he said. "But many people don't see that sort of machine as on the path to a profitable result."
• So clusters were already a "trivial" architecture in 1984 ......
• So the architecture is unchanged; unfortunately, after 20 years of research, the programming model is also the same (message passing)
Technology Trends and Principles
• All performance and capability measures of infrastructure continue to improve
• Gilder's law says that network bandwidth increases 3 times faster than CPU performance (Moore's Law)
• The Telecosm eclipses the Microcosm .... George Gilder, Telecosm: How Infinite Bandwidth Will Revolutionize Our World (September 2000, Free Press; ISBN 0684809303; #146 in Amazon sales on Jan 15 2001, #3883 on July 29 2001)
Small Devices Increasing in Importance
• There is growing interest in wireless portable displays in the confluence of the cell phone and personal digital assistant markets
• By 2005, 60 million Internet-ready cell phones will be sold each year
• 65% of all broadband Internet accesses will be via non-desktop appliances
The HPCC Track
• The 1990 HPCC ten-year initiative was largely aimed at enabling large-scale simulations for a broad range of computational science and engineering problems
• It was in many ways a success, and we have methods and machines that can (begin to) tackle most 3D simulations
  • ASCI simulations are particularly impressive
  • DoE is still putting substantial resources into basic software and algorithms, from adaptive meshes to PDE solver libraries
• Machines are still increasing in performance exponentially and should achieve petaflops in the next 7-10 years
• The earthquake community needs to harness these capabilities
  • Japan's Earth Simulator activity (GEOFEM) is a major effort
Some HPCC Difficulties
• An intellectual failure: we never produced a better programming model than message passing
  • HPCC code is hard work
  • The "high point" of ASCI software is "Grid FTP"
• An institutional problem: we do not have a way to produce complex, sustainable software for a niche (1%) market like HPCC
  • POOMA support just disappeared one day (it was the foundation of the first proposal GEM wrote)
  • One must adopt commodity standards and produce "small" sustainable modules
• Note that distributed memory is becoming dominant again with the bizarre clustered-SMP architecture – it is not clear that it is "wise" to exploit the advantages of shared-memory architectures
My HPCC Advice to HPCMO
• KISS: Keep it Simple and Sustainable
• Use MPI, and OpenMP if needed for performance on shared-memory nodes (a minimal message-passing sketch follows this slide)
• Adaptive meshes, load balancing, PDE solvers (including fast multipole) and particle dynamics are well understood; to get high-performance parallel simulations, use broad community expertise
• Other areas such as data mining, visualization and data assimilation are quite advanced but are still significant research
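To make the MPI recommendation concrete, here is a minimal present-day sketch of the message-passing style being advocated: each process exchanges a boundary value with its neighbours in a ring and then does a local update. It assumes the mpi4py binding and is illustrative only, not code from the tutorial.

```python
# Minimal sketch (assumes mpi4py is installed); run with e.g. "mpiexec -n 4 python ring.py"
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

left = (rank - 1) % size     # neighbour ranks in a periodic 1-D decomposition
right = (rank + 1) % size

local_value = float(rank)    # stand-in for the data owned by this rank

# Communication phase: exchange "halo" values with both neighbours
from_left = comm.sendrecv(local_value, dest=right, source=left)
from_right = comm.sendrecv(local_value, dest=left, source=right)

# Local computation phase: a toy three-point average
new_value = (from_left + local_value + from_right) / 3.0
print(f"rank {rank}: {local_value} -> {new_value}")
```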
Use of Object Technologies
• The commercial success claimed for object and component technology has not been repeated in HPCC
• Object technologies do not naturally support either high performance or parallelism
  • C++ can be high performance, but CORBA and Java are not
• There is no agreed HPCC component architecture for producing more modern libraries (DoE has a very large CCA – Common Component Architecture – effort)
• Fortran will continue to decline in importance and interest – the community should prefer not to use it
  • Its use will not attract the best students
Application Structure
• New applications are typically multi-scale and multi-disciplinary
  • i.e. a given simulation is made of multiple components with different time/length scales and/or multiple authors from possibly multiple fields
• I am not aware of a systematic "computational renormalization group" – a methodology that links different scales together
• However, composition of modules is an area where technology of growing sophistication is becoming available
  • It is needed commercially to integrate corporate functions
  • CCA is controversial at "small grain size"; Gateway is an example of clearly successful large-grain-size integration
• Integration of data and simulation is one example of composition which is "understood"
Object Size and Distributed/Parallel Simulations
• All interesting systems consist of linked entities
  • Particles, grid points, people, or groups thereof
• Linkage translates into message passing
  • Cars on a freeway
  • Phone calls
  • Forces between particles
• The amount of communication tends to be proportional to the surface area of an entity, whereas simulation time is proportional to its volume
• So communication/computation scales as surface/volume and decreases in importance as entity size increases (see the worked ratio below)
• In parallel computing, communication is synchronized; in distributed computing we have "self-contained objects" (whole programs) which can be scheduled asynchronously
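A worked version of the surface/volume argument, assuming (for illustration) that each processor owns an n × n × n block of a 3-D grid:

```latex
\begin{align*}
  T_{\text{calc}} &\propto n^{3} && \text{update every locally owned point}\\
  T_{\text{comm}} &\propto 6\,n^{2} && \text{exchange the six faces with neighbours}\\
  \frac{T_{\text{comm}}}{T_{\text{calc}}} &\propto \frac{1}{n}
    && \text{so overhead shrinks as the grain (entity) size grows}
\end{align*}
```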
Complex System Simulations
• Networks of particles and (partial differential equation) grid points interact "instantaneously", and simulations reduce to iterating calculate/communicate phases: "calculate, at a given time or iteration number, the next positions/values" (massively parallel) and then update (a serial sketch of this loop follows this slide)
• Scaling parallelism is guaranteed
• Complex (phenomenological) systems are made of agents evolving with irregular time steps – event-driven simulations do not parallelize
• This lack of global time synchronization in "complex systems" stops the natural parallelism of classic HPCC approaches
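A serial Python sketch of the calculate/communicate pattern just described (the problem and names are illustrative, not from the tutorial): every point of a 1-D Jacobi relaxation is updated from the previous step's values, so all updates within a step are independent and trivially parallelizable.

```python
# "Communicate" then "calculate": a 1-D Jacobi relaxation with global time synchronization
def jacobi_step(values):
    old = values[:]                        # communicate: everyone sees the same old state
    for i in range(1, len(values) - 1):    # calculate: independent updates at a fixed time
        values[i] = 0.5 * (old[i - 1] + old[i + 1])
    return values

state = [0.0] * 9 + [1.0]                  # fixed boundary value at the right end
for step in range(50):
    state = jacobi_step(state)
print([round(x, 3) for x in state])
```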
Los Alamos Delphi Initiative
• http://www.lanl.gov/delphi/index.shtml
• Aims at large complex-systems simulations of global and national scope in their size and significance: national traffic systems, epidemics, forest fires, cellular and other communication networks (e.g. the Internet), electrical/gas/water grids, business processes, battles
• Demonstrates the success of new methods (SDS – Sequential Dynamical Systems) that parallelize well and outperform previous approaches (a toy SDS sweep is sketched below)
• General applicability (e.g. to earthquakes) is not clear
  • Could be relevant to cellular-automata-like models of earthquakes
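A toy sketch of the Sequential Dynamical Systems idea as it is commonly described: states live on the vertices of a graph and are updated one vertex at a time in a fixed order, each update seeing the already-updated states of its neighbours. This is only an illustration of the concept; it is not code, a rule, or a graph from the Delphi project.

```python
# Toy Sequential Dynamical System sweep (illustrative only)
def sds_sweep(states, neighbours, order, local_rule):
    for v in order:                                    # fixed sequential update schedule
        nbr_states = [states[u] for u in neighbours[v]]
        states[v] = local_rule(states[v], nbr_states)  # sees already-updated neighbours
    return states

# Example: a 4-cycle with an "infection" rule (a vertex becomes 1 if any neighbour is 1)
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rule = lambda s, nbrs: 1 if sum(nbrs) >= 1 else s
print(sds_sweep({0: 0, 1: 1, 2: 0, 3: 0}, neighbours, order=[0, 1, 2, 3], local_rule=rule))
```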
Some Problem Classes
• Hardest: smallish objects with irregular time synchronization (Delphi)
• Classic HPCC: synchronized objects with regular time structure (communication overhead decreases as problem size increases)
• Internet technology and commercial application integration: large objects with modest communications and without difficult time synchronization
  • Compose as independent (pipelined) services
  • Includes some approaches to linking multi-disciplinary simulations
What is a Grid or Web Service?
• There are generic Grid system services: security, collaboration, persistent storage, universal access
• An application service is a capability used either by another service or by a user
  • It has input and output ports – data comes from sensors or from other services
• Portals are the user (web browser) interfaces to Grid services
• Gateway makes running jobs on remote computers a Grid service
  • It is invoked by other services, e.g. a CFD service which includes meshing, human or other advice on the code, simulation, and visualization services
Sensors/Image Processing Service
• Consider NASA Space Operations (CSOC) as a Grid service
  • Spacecraft management (with a web front end) has planning, real-time control and decision-making services
  • Each tracking station is a service, which is a special case of the sensor service
• All sensors have the same top-level structure as a Grid service but are "specialized" (sub-classed) to each field
• Image processing is a pipeline of filters, which can be grouped into different services
  • These link to other filters, sensors, and storage devices
• Data storage is an important system service
• Major services are built hierarchically from "basic" services (see the sketch after this slide)
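A minimal Python sketch (all class and function names are hypothetical) of the service picture above: a generic Service with input and output ports, a sub-classed SensorService, and an image-processing pipeline built by composing filter services.

```python
# Illustrative service model: generic services composed into a pipeline
class Service:
    """Generic Grid/application service: consume from an in-port, publish to an out-port."""
    def __init__(self, name):
        self.name = name
    def invoke(self, data):
        raise NotImplementedError

class SensorService(Service):
    """Specialized (sub-classed) service wrapping a data source."""
    def __init__(self, name, read_fn):
        super().__init__(name)
        self.read_fn = read_fn
    def invoke(self, _=None):
        return self.read_fn()

class FilterService(Service):
    """One stage of an image-processing pipeline."""
    def __init__(self, name, fn):
        super().__init__(name)
        self.fn = fn
    def invoke(self, data):
        return self.fn(data)

def pipeline(services, data=None):
    """Compose services: the out-port of each feeds the in-port of the next."""
    for s in services:
        data = s.invoke(data)
    return data

# Usage: sensor -> threshold filter -> summary "storage" stage
sensor = SensorService("tracking-station", lambda: [3, 7, 1, 9])
threshold = FilterService("threshold", lambda img: [x for x in img if x > 2])
store = FilterService("store", lambda img: {"n": len(img), "values": img})
print(pipeline([sensor, threshold, store]))
```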
Sensor Grid Service (diagram: a distributed sensor service with in-ports for sensor data and an out-port giving people and computers universal sensor access)
Is a Grid Service a New Idea?
• Not really, for (in the case of a sensor) it is like the concept of a control computer that handles data from some device
• BUT now all control computers are "distributed objects" and web servers, and all non-binary data is defined in XML
• There is a universal way of discovering and defining services, with universal input and output streams that can be carried over multiple protocols (IIOP (CORBA), RMI (Java), SOAP (Web))
• Further, we have in the portal a universal user interface
• Further, we have linked the concepts of libraries (subroutine calls) and processes (linked by piping files)
Integration of Grid Services (diagram: a Grid Gateway supporting a seamless interface links multidisciplinary control, an image-processing server, a parallel database proxy with its database, sensor control, a data-mining server, a NetSolve linear algebra server with matrix solver, and an agent-based choice of compute engine, within an object Grid programming environment, to classic HPCC resources such as Origin 2000 and IBM SP MPP proxies)
Overall Grid/Web Architecture
• General vision? The NCSA vision (layered diagram: community portals and science portals & workbenches sit atop twenty-first-century applications – the Commerce, Education, Access and Computational Grids of the Next Generation Web – which are built on business, education, access and computational services, over resource-independent Grid services, the resource-dependent Grid fabric, and networking, devices and systems; "Performance" and "Convenience" appear as side labels)
The Application Service Model
• As the bandwidth of communication between services increases, one can support smaller services
• A service "is a component" and is a replacement for a library in cases where performance allows
• Services are a sustainable model of software development – each service has a documented capability with standards-compliant interfaces
• XML defines interfaces at several levels
  • WSDL at the Grid level and XSIL or equivalent for scientific data formats
• A service can be written in Perl, Python, Java servlets, Enterprise JavaBeans, CORBA (C++ or Fortran) .......
• The communication protocol can be RMI (Java), IIOP (CORBA) or SOAP (HTTP, XML) ...... (a toy service endpoint is sketched below)
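As an illustration of publishing a capability as a service over an XML-over-HTTP protocol, here is a sketch using Python's standard-library XML-RPC as a stand-in for the SOAP/WSDL machinery mentioned above; the endpoint name, port and "solve" capability are illustrative assumptions, not part of the tutorial.

```python
# Expose a toy capability as an XML-over-HTTP service (XML-RPC as a SOAP stand-in)
from xmlrpc.server import SimpleXMLRPCServer

def solve(matrix_size):
    """Toy 'linear algebra' capability published as a service."""
    return {"size": matrix_size, "status": "solved"}

server = SimpleXMLRPCServer(("localhost", 8080), allow_none=True)
server.register_function(solve, "solve")
print("service listening on http://localhost:8080 ...")
server.serve_forever()

# A client elsewhere would invoke it with:
#   from xmlrpc.client import ServerProxy
#   print(ServerProxy("http://localhost:8080").solve(100))
```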
Services Support Communities
• Grid communities (HPCMO, PET, Vicksburg, environmental science, high-school classes) are groups of communicating individuals sharing resources implemented as Grid services
• Access Grid from Argonne/NCSA is the best audio/video conferencing technology
• Peer-to-peer networking describes a set of technologies supporting community building, with an emphasis on less structured groups than the classic "users of a supercomputer"
• Peer-to-peer Grids combine the technologies and support "small worlds" – optimized networks with short links between community members
Classic Grid Architecture (diagram: clients – users and devices – reach portals, which talk to a middle tier of brokers and service providers handling security and composition (e.g. NetSolve, NEOS), which in turn front resources such as databases; clients, servers and resources are typically separate)
Peer-to-Peer Network (diagram: peers combine user, service, resource and routing roles; peers are jacks of all trades linked to "all" peers in the community; clients, servers and resources are typically integrated)
Peer-to-Peer Grid (diagram: users, services and resources connected through routing and GMS routing services, with dynamic message or event routing from peers or servers)
HPCMO HPCC and Grid Strategy I
• Decide which services are well enough understood and useful enough to be encapsulated as application services
  • Parallel FEM solvers
  • Visualization
  • Parallel particle dynamics
  • Access to sensor data or GIS data
  • Image-processing filters
• Make each service as small as possible – smaller is simpler and more sustainable, but has higher communication needs
• Establish teams to design and build services
• Use a framework offering the needed Grid system services
• Build an HPCMO electronic community with collaboration tools, resources and HPCMO-wide networking
HPCMO HPCC and Grid Strategy II
• Some capabilities – such as a fast multipole or adaptive mesh package – should be built as classic libraries or templates
• Other services – such as data mining or support of multi-scale simulations – need research using a toolkit approach, if one can design a general structure
• We need "hosts" for major services – access and storage of sensor data
• We need funds to build and sustain "infrastructure" and research services
• Use electronic community tools to enhance HPCMO collaboration