Using the Grid for Astronomy • Roy Williams, Caltech
Enzo Case Study • Simulated dark matter density in early universe • N-body gravitational dynamics (particle-mesh method) • Hydrodynamics with PPM and ZEUS finite-difference • Up to 9 species of H and He • Radiative cooling • Uniform UV background (Haardt & Madau) • Star formation and feedback • Metallicity fields
Adaptive Mesh Refinement (AMR) • multilevel grid hierarchy • automatic, adaptive, recursive • no limits on depth or complexity of grids • C++/F77 • Bryan & Norman (1998) Source: J. Shalf
Distributed Computing Zoo • Grid Computing • Also called High-Performance Computing • Big clusters, Big data, Big pipes, Big centers • Globus backbone, which now includes Services and Gateways • Decentralized control • Cluster Computing • local interconnect between identical CPUs • Peer-to-Peer (Napster, Kazaa) • Systems for sharing data without a central server • Internet Computing • Screensaver cycle scavenging • e.g. SETI@home, Einstein@home, ClimatePrediction.net, etc. • Access Grid • A videoconferencing system • Globus • A popular software package to federate resources into a grid • TeraGrid • A $150M award from NSF to the supercomputer centers (NCSA, SDSC, PSC, etc.)
What is the Grid? • The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations • In contrast, the Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe.
What is the Grid? • “Grid” was coined by Ian Foster and Carl Kesselman in “The Grid: blueprint for a new computing infrastructure”. • Analogy with the electric power grid: plug in to computing power without worrying where it comes from, like a toaster. • The idea has been around under other names for a while (distributed computing, metacomputing, …). • Technology is in place to realise the dream on a global scale.
What is Middleware? • The Grid middleware: • Finds convenient places for the scientist’s “job” (computing task) to be run • Optimises use of the widely dispersed resources • Organises efficient access to scientific data • Deals with authentication to the different sites • Interfaces to local site authorisation / resource allocation • Runs the jobs • Monitors progress • Recovers from problems • … and … • Tells you when the work is complete and transfers the result back!
Grid as Federation • independent centers: flexibility • unified interface: power and strength • Large/small-state compromise
Three Big Ideas of Grid • Federation and Uniformity • independent management; uniform face; open standards • Trust and Security • access policy; uniform authentication/authorization • Distance doesn’t matter • 20 Mbyte/sec, global file system
Grid projects in the world • DOE Science Grid • NSF National Virtual Observatory • NSF GriPhyN/iVDGL • DOE Particle Physics Data Grid • NSF TeraGrid • DOE Earth Systems Grid • NEESGrid • DOH BIRN • UK e-Science Grid • EUROGRID • DataGrid (CERN, ...) • EuroGrid (Unicore) • DataTag (CERN,…) • GridLab (Cactus Toolkit) • CrossGrid (Infrastructure Components)
TeraGrid Components • Compute hardware • Intel/Linux Clusters, Alpha SMP clusters, POWER4 cluster, … • Large-scale storage systems • hundreds of terabytes for secondary storage • Very high-speed network backbone • bandwidth for rich interaction and tight coupling • Grid middleware • Globus, data management, … • Next-generation applications
The TeraGrid Vision: Distributing the resources is better than putting them at one site • Build new, extensible, grid-based infrastructure • New hardware, new networks, new software, new practices, new policies • Leverage homogeneity • Run single job across entire TeraGrid • Move executables between sites • Catch-phrase: Open, Deep and Wide • Open to US science community • Heroic computing possible by programming Unix • Easy to use through science gateways
TeraGrid Allocations Policies • Any US researcher can request an allocation • http://www.teragrid.org
Wide Variety of Usage Scenarios • Tightly coupled simulation jobs storing vast amounts of data, performing visualization remotely as well as making data available through online collections (ENZO) • Thousands of independent jobs using data from a distributed data collection (NVO) • Science Gateways – "not a Unix prompt"! • from a web browser with security • SOAP client for scripting • from an application, e.g. IRAF, IDL
Account Security • Username/Password • weak security, too many holes • deprecated in many places • SSH keys • put public key on remote machine • serves as single sign-on • X.509 Certificates • Proves identity • Flexible
Ways to Submit a Job 1. Directly to PBS Batch Scheduler • Simple; scripts are portable among PBS TeraGrid clusters 2. Globus common batch script syntax • Scripts are portable among other grids using Globus 3. Condor-G = Condor + Globus 4. Use a science gateway, e.g. Nesssi • specific tasks, easy to use
PBS Batch Submission • A single executable runs on a single remote machine • login to a head node, submit to the queue • Direct, interactive execution • mpirun -np 16 ./a.out • Through a batch job manager • qsub my_script • where my_script describes executable location, runtime duration, redirection of stdout/err, mpirun specification… • ssh tg-login.sdsc.teragrid.org • qsub flatten.sh -v "FILE=f544" • qstat or showq • ls *.dat • pbs.out, pbs.err files
Remote submission • Through globus • globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script • where my_rsl_script describes the same details as in the qsub my_script! • Through Condor-G • condor_submit my_condor_script • where my_condor_script describes the same details as the globus my_rsl_script!
Condor-G A Grid-enabled version of Condor that provides robust job management for Globus clients. • Robust replacement for globusrun • Provides extensive fault-tolerance • Can provide scheduling across multiple Globus sites • Brings Condor’s job management features to Globus jobs
Condor DAGMan • Manages workflow interdependencies • Each task is a Condor description file • A DAG file controls the order in which the Condor files are run
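A possible illustration (mine, not from the original slides): a tiny Python driver that writes a two-step DAG and hands it to condor_submit_dag. The files flatten.sub and coadd.sub are hypothetical Condor description files.

import os

# Hypothetical example: flatten.sub and coadd.sub are ordinary Condor
# description files; the DAG says "run coadd only after flatten succeeds".
dag = """JOB flatten flatten.sub
JOB coadd coadd.sub
PARENT flatten CHILD coadd
"""
open("pipeline.dag", "w").write(dag)
os.system("condor_submit_dag pipeline.dag")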
Cluster Supercomputer (schematic) • login node for the user: job submission and queueing (Condor, PBS, ...) • 100s of compute nodes • /scratch: purged, parallel I/O on a parallel file system with a metadata node • /home: backed-up, global file system
MPI parallel programming • Each node runs the same program • first finds its own number (“rank”) • and the number of coordinating nodes (“size”) • Laplace solver example • Algorithm: each value becomes the average of its neighbors' values • Serial: for each point, compute the average; remember the boundary conditions • Parallel: run the same algorithm with ghost points; use messages to exchange ghost points between nodes
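To make the ghost-point exchange concrete, here is a minimal sketch (my own, not from the slides) using the mpi4py package with a 1-D domain decomposition; the block size, iteration count, and file name are arbitrary assumptions.

# Sketch only: assumes mpi4py and numpy are installed; run with
#   mpirun -np 4 python laplace_mpi.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()              # this node's number
size = comm.Get_size()              # number of coordinating nodes

N = 64                              # rows per node (assumed)
u = np.zeros((N + 2, N))            # local block plus two ghost rows
if rank == 0:
    u[0, :] = 1.0                   # a simple boundary condition on the top edge

up   = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for it in range(100):
    # exchange ghost rows with the neighboring ranks
    comm.Sendrecv(u[1, :], dest=up,   recvbuf=u[N + 1, :], source=down)
    comm.Sendrecv(u[N, :], dest=down, recvbuf=u[0, :],     source=up)
    # each interior value becomes the average of its four neighbors
    u[1:N + 1, 1:-1] = 0.25 * (u[0:N, 1:-1] + u[2:N + 2, 1:-1]
                               + u[1:N + 1, :-2] + u[1:N + 1, 2:])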
Globus • Security • Single sign-on, certificate handling, CAS, MyProxy • Execution Management • Remote jobs: GRAM and Condor-G • Data Management • GridFTP, reliable file transfer, 3rd-party file transfer • Information Services • aggregating information from federated grid resources • Common Runtime Components • web services through GT4 • The following is a personal opinion; it is NOT the position of the NVO: • Globus is a complex and difficult installation • Globus needs frequent maintenance and updates • Globus is monolithic (all or nothing)
Disk Farms (datawulf) • Homogeneous disk farm (= parallel file system) • Large files are striped over the disks for parallel I/O • A metadata/management node handles file creation, access, ls, etc.
Parallel File System • Large files are striped • very fast parallel access • Medium files are distributed • stripes do not all start in the same place • Small files choke the PFS manager • either containerize them • or use blobs in a database • not a file system anymore: a pool of 10^8 blobs with logical names
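As a toy illustration of the "blobs in a database" idea (my sketch, not part of the original material), small files can be stored as rows keyed by a logical name, using nothing more than the sqlite3 module:

import sqlite3

# Sketch only: the database name and table layout are arbitrary choices.
db = sqlite3.connect("blobpool.db")
db.execute("CREATE TABLE IF NOT EXISTS blobs (lname TEXT PRIMARY KEY, data BLOB)")

def put(lname, payload):
    # store a small file's bytes under its logical name
    db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)",
               (lname, sqlite3.Binary(payload)))
    db.commit()

def get(lname):
    row = db.execute("SELECT data FROM blobs WHERE lname = ?",
                     (lname,)).fetchone()
    return None if row is None else str(row[0])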
Storage Resource Broker (SRB) • Single logical namespace while accessing distributed archival storage resources • Effectively infinite storage • Data replication • Parallel Transfers • Interfaces: command-line, API, SOAP, web/portal.
Storage Resource Broker (SRB): Virtual Resources, Replication • (Diagram) An SRB client (command line or API) sees one logical collection backed by physical resources at several sites, e.g. hpss-sdsc, sfs-tape-sdsc, hpss-caltech, or a local workstation, spanning NCSA, SDSC, …
Storage Resource Broker (SRB): Virtual Resources, Replication • Similar to the VOSpace concept • Access with a certificate from a browser, a SOAP client, the command line, .... • Physical back ends can include casjobs at JHU, tape at SDSC, or a local disk (myDisk) • A file may be replicated • A file comes with metadata ... which may be customized
Containerizing • Many small files are packed into a container file • Shared metadata • Easier for bulk movement
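A minimal sketch (my own, using the standard tarfile module) of what containerizing looks like in practice; the directory and container names are placeholders:

import os
import tarfile

def containerize(directory, container="smallfiles.tar"):
    # pack every small file in `directory` into one container file
    tar = tarfile.open(container, "w")
    for name in sorted(os.listdir(directory)):
        tar.add(os.path.join(directory, name), arcname=name)
    tar.close()
    return container

def uncontainerize(container, directory):
    # unpack the container at the far end
    tar = tarfile.open(container, "r")
    tar.extractall(directory)
    tar.close()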
Data intensive computing with NVO services
Two Key Ideas for Fault-Tolerance • Transactions • No partial completion -- either all or nothing • e.g. copy to a temporary filename, then mv to the correct filename • Idempotent • “Acting as if done only once, even if used multiple times” • Can run the script repeatedly until finished
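Both ideas fit in a few lines of Python; this is a sketch of the pattern (mine, not the original code), with the size test standing in for a completeness check:

import os
import shutil

def safe_copy(src, dst):
    # idempotent: if the target is already complete, do nothing,
    # so the script can be run repeatedly until everything is finished
    if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
        return
    # transactional: partial results only ever live under the temporary name
    tmp = dst + ".part"
    shutil.copyfile(src, tmp)
    # rename is atomic on the same file system: all or nothing
    os.rename(tmp, dst)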
DPOSS flattening • Source → Target: 2650 × 1.1 Gbyte files • Cropping borders • Quadratic fit and subtract • Virtual data
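The "quadratic fit and subtract" step might look like the following numpy sketch (my illustration, assuming the image is already loaded as a 2-D array): fit a quadratic surface to the pixel values by least squares, then subtract it to flatten large-scale gradients.

import numpy as np

def flatten(image):
    # fit a + b*x + c*y + d*x^2 + e*x*y + f*y^2 to the whole image
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    A = np.column_stack([np.ones(image.size), x.ravel(), y.ravel(),
                         (x * x).ravel(), (x * y).ravel(), (y * y).ravel()])
    coeffs = np.linalg.lstsq(A, image.ravel())[0]
    background = A.dot(coeffs).reshape(ny, nx)
    # subtract the fitted background to flatten the image
    return image - background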
Driving the Queues • Here is the driver that makes and submits the batch jobs

import os

for f in os.listdir(inputDirectory):
    ifile = inputDirectory + "/" + f
    ofile = outputDirectory + "/" + f
    # if the target file exists, with the right size and age, then we keep it
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            # filetime() is a helper, defined elsewhere, returning modification time
            time_tgt = filetime(ofile)
            time_src = filetime(ifile)
            if time_tgt < time_src:
                print " -- target older than source, remaking"
            else:
                print " -- already have target file"
                continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print " -- submitting batch job: ", cmd
    os.system(cmd)
PBS script • A PBS script; submit with: qsub script.sh -v "FILE=f345"

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/${FILE}.fits \
  -outfile /pvfs/mydata/target/${FILE}.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000
GET services from Python • This code uses a service to find the best hyperatlas page for a given sky location

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page ", tokens[0], " of atlas ", atlas

self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])
VOTable parser in Python • From a SIAP URL, we get the XML, and extract the columns that hold the image references, image format, and image RA/Dec

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping each column's UCD to its column index
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1

urlColumn = col_ucd_dict["VOX:Image_AccessReference"]
formatColumn = col_ucd_dict["VOX:Image_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]
VOTable parser in Python • The table is a list of rows, and each row is a list of table cells

import xml.dom.minidom

table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data += child.data
                    row.append(data)
                table.append(row)
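A short usage sketch (my addition): combining the column indices from the previous slide with the parsed table to fetch the FITS images that the SIAP service pointed to; the output filenames are arbitrary.

import urllib

for i, row in enumerate(table):
    if row[formatColumn] == "image/fits":
        url = row[urlColumn]
        print "Retrieving", url, "at RA/Dec", row[raColumn], row[deColumn]
        # download each referenced image to a local file
        urllib.urlretrieve(url, "cutout_%d.fits" % i)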
Grid Impediments • Write proposal • Wait 3 months for account • Get logged in • Get certificate • Port code to Itanium • Learn PBS • Learn MPI • Learn Globus • ... and now do some science
A better way: Graduated Security for Science Gateways • Web form - anonymous: some science.... • Register - logging and reporting: more science.... • Authenticate X.509 - browser or cmd line: big-iron computing.... • Write proposal - own account: power user
Three Types of Science Gateways • Web-based Portals • User interacts with a community-deployed web interface • Runs community-deployed codes • Service requests forwarded to grid resources • Scripted service call (Nesssi) • User writes code to submit and monitor jobs • Grid-enabled applications (Nesssi) • Application programs on users' machines (e.g. IRAF) • Also run programs on grid resources
Nesssi: Secure Web services for astronomy • (Architecture diagram) A client reaches the nesssi web portal through a web form, or calls nesssi directly via SOAP over HTTP • nesssi fetches a proxy from a certificate repository (governed by certificate policies), selects a user account and compute nodes, and queues the job • The nodes write results to sandbox storage, served back over open HTTP
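The calls on the next two slides assume a SOAP proxy object called nesssiServer. One plausible way to create it (a sketch only: the SOAPpy package and the endpoint URL are my assumptions, not the real service address):

import SOAPpy

# Placeholder endpoint; the real Nesssi service URL and any certificate or
# session handling that the gateway requires are omitted here.
nesssiServer = SOAPpy.SOAPProxy("https://nesssi.example.edu/nesssi/services")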
Mosaic service • nesssiServer.dpossMosaic.mosaic("-ra 49.1 -dec 60.1 -rawidth 0.5 -decwidth 0.5 -filt f -bgcorr 0")
Coadd service • nesssiServer.hyperatlas.run("-bandpass z1 -ra 170.08 -dec 13.275 -rawidth 1.0 -decwidth 1.0")