Services and Mashups Roy Williams, California Institute of Technology
Agenda • VIM • Portal for VO mashup • Scaling services • Asynchronous (batch) • Security • Advanced services • AJAX • SOAP
Making a portal for a command line application
The command-line application
    $ mycode -apple 56 -banana 5346 -orange SDSS
Make HTML form
    <center> <h4>Mycode Portal</h4> </center>
    Please fill in values<br/>
    <form method=GET action="http://localhost/cgi-bin/mycodeportal">
    Apple: <input name="apple"><br/>
    Banana: <input name="banana"><br/>
    Orange: <select name="orange">
    <option value="SDSS">Sloan Digital Sky Survey DR5</option>
    <option value="2MASS">2MASS All-Sky Catalog</option>
    </select>
    <input type=submit value="Run Mycode">
    </form>
Making a portal for a command line application (2)
Make CGI wrapper
    import cgi
    import os

    form = cgi.FieldStorage()
    # Use .value to get the submitted strings out of the form fields
    cmd = "mycode -apple %s -banana %s -orange %s" % (
        form["apple"].value, form["banana"].value, form["orange"].value)
    print "Content-type: text/plain\n"
    print "Command %s" % cmd
    # Caution: the form values reach the shell unquoted; validate them in real use
    pipe = os.popen(cmd)
    print "Stdout %s" % pipe.read()
    print "Exit status %s" % pipe.close()
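The wrapper above pastes form values straight into a shell command. A minimal sketch of a safer variant follows, written for Python 3 (an assumption; the slides use Python 2). It assumes that apple and banana are numeric, as in the example command, and that the only valid orange values are the two options in the HTML form. The names build_argv and ALLOWED_SURVEYS are hypothetical.

```python
import shlex
from urllib.parse import parse_qs

# Hypothetical whitelist, taken from the <option> values in the HTML form.
ALLOWED_SURVEYS = {"SDSS", "2MASS"}

def build_argv(query_string):
    """Turn a GET query string into an argument list for mycode."""
    params = parse_qs(query_string)
    apple = float(params["apple"][0])    # raises ValueError on non-numeric input
    banana = float(params["banana"][0])
    orange = params["orange"][0]
    if orange not in ALLOWED_SURVEYS:
        raise ValueError("unknown survey: %r" % orange)
    # A list (not a string) can be passed to subprocess.run without a shell.
    return ["mycode", "-apple", "%g" % apple,
            "-banana", "%g" % banana, "-orange", orange]

argv = build_argv("apple=56&banana=5346&orange=SDSS")
print(" ".join(shlex.quote(a) for a in argv))
```

Because the result is an argument list, a value such as "SDSS; rm -rf /" is rejected by the whitelist and could never be interpreted by a shell.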
More VOTable
    <VOTABLE version="v1.0">
      <RESOURCE type="results">
        <DESCRIPTION>Results from query to NASA/IPAC Extragalactic Database (NED) …. </DESCRIPTION>
        <TABLE ID="NED_MainTable" name="Searching NED within 0.3 arcmin of 178.542980, 10.796330">
          <DESCRIPTION>Main information about object (Cone Search results)</DESCRIPTION>
          <PARAM ucd="time.equinox" datatype="char" value="J2000.0" name="Equinox"/>
          <PARAM ucd="pos.system.coord" datatype="char" value="Equatorial" name="CoordSystem"/>
          <FIELD ucd="meta.id" datatype="int" ID="main_col1" name="No.">
            <DESCRIPTION>A sequential object number applicable to this list only.</DESCRIPTION>
          </FIELD>
          <FIELD ucd="meta.id;meta.main" datatype="char" arraysize="30" ID="main_col2" name="Object Name">
            <DESCRIPTION>NED preferred name for the object</DESCRIPTION>
          </FIELD>
          <FIELD ucd="pos.eq.ra;meta.main" datatype="double" ID="main_col3" unit="degrees" name="pos_ra_equ">
            <DESCRIPTION>Right Ascension in degrees (Equatorial J2000.0)</DESCRIPTION>
          </FIELD>
          <FIELD ucd="pos.eq.dec;meta.main" datatype="double" ID="main_col4" unit="degrees" name="pos_dec_equ">
            <DESCRIPTION>Declination in degrees (Equatorial J2000.0)</DESCRIPTION>
          </FIELD>
          …….
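The column metadata in such a VOTable can be read with an ordinary XML parser. The sketch below uses Python 3's standard library on an abridged copy of the slide's header. The closing tags are added here (an assumption) so the fragment is well-formed, and list_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the slide's VOTable header; closing tags added for illustration.
VOTABLE = """<VOTABLE version="v1.0">
 <RESOURCE type="results">
  <TABLE ID="NED_MainTable">
   <FIELD ucd="meta.id" datatype="int" ID="main_col1" name="No."/>
   <FIELD ucd="meta.id;meta.main" datatype="char" arraysize="30" ID="main_col2" name="Object Name"/>
   <FIELD ucd="pos.eq.ra;meta.main" datatype="double" ID="main_col3" unit="degrees" name="pos_ra_equ"/>
   <FIELD ucd="pos.eq.dec;meta.main" datatype="double" ID="main_col4" unit="degrees" name="pos_dec_equ"/>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

def list_fields(votable_text):
    """Return (name, datatype, unit, ucd) for every FIELD element."""
    root = ET.fromstring(votable_text)
    return [(f.get("name"), f.get("datatype"), f.get("unit"), f.get("ucd"))
            for f in root.iter("FIELD")]

for name, datatype, unit, ucd in list_fields(VOTABLE):
    print(name, datatype, unit, ucd)
```

This fragment declares no XML namespace; a VOTable that does declare one would need namespace-qualified tag names.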
VIM
Resources
• Catalog or other position-based dataset exposed by cone or skynode service
• example: SDSS
• example: Abell galaxy cluster catalog
Customer provides Sources table
    187.209, -1.938, NGC 4454
    208.826, 59.506, NGC 5376
    214.218, 10.807, NGC 5532
    187.844, 57.964, NGC 4500
    130.384, 4.971, NGC 2644
    179.042, 60.522, NGC 3978
    … …
Multicone resources provide data tables: sdss, gsc2, twomass
[Figure: match tables listing NGC 4454, NGC 5376, NGC 5532, NGC 4500, NGC 2644, NGC 3978, with repeated rows for sources that have several matches]
Multicone • User gets sources elsewhere • Source = RA, Dec, ID • Multicone = N sources + radius • Returns VOTable [Diagram label: radius]
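A multicone amounts to one cone query per source. The sketch below builds those query URLs in Python 3, assuming the service follows the IVOA Simple Cone Search convention of RA, DEC and SR parameters in decimal degrees. The base URL and the function name cone_urls are hypothetical, and how Vim itself issues the queries is not shown on the slides.

```python
from urllib.parse import urlencode

def cone_urls(base_url, sources, radius_deg):
    """Build one cone-search URL per (ra, dec, id) source."""
    urls = []
    for ra, dec, source_id in sources:
        query = urlencode({"RA": ra, "DEC": dec, "SR": radius_deg})
        urls.append((source_id, base_url + "?" + query))
    return urls

# First two rows of the example sources table.
sources = [(187.209, -1.938, "NGC 4454"), (208.826, 59.506, "NGC 5376")]
for source_id, url in cone_urls("http://example.org/cone", sources, 0.01):
    print(source_id, url)
```

Each URL would return a VOTable of matches for that source; fetching and merging those tables is left out here.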
Architecture • All the relevant information about your sources (mashups from the VO) is kept for you in a workbench in the cloud: view, mine, download [Diagram labels: Customer; upload sources; HTML + JS; Vim; personal persistent storage; Catalogs; Spectra; Coregistered image cutouts; Nesssi; batch jobs]
Sources and Matches • Start with a source table • RA, Dec, ID for each source [Diagram: sources table with columns RA, Dec, ID]
Sources and Matches • Run VO services to get data • “Match” tables from each catalog • Multiple matches per source [Diagram: sources table (RA, Dec, ID) with match tables Cat1 and Cat2]
Join to sources • Closest or All • Joins match table to source table [Diagram: sources table (RA, Dec, ID) joined with Cat1 and Cat2]
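The difference between a "Closest" and an "All" join can be sketched in a few lines of Python 3. This is not Vim's implementation (the slides say Stilts does the work). It assumes each match row carries the source ID and a separation column, here called "sep", and the magnitudes are invented for illustration.

```python
def join_matches(sources, matches, mode="closest"):
    """Join a match table to a source table on the shared "ID" column."""
    by_source = {}
    for m in matches:
        by_source.setdefault(m["ID"], []).append(m)
    joined = []
    for src in sources:
        found = by_source.get(src["ID"], [])
        if mode == "closest" and found:
            # Keep only the match with the smallest separation.
            found = [min(found, key=lambda m: m["sep"])]
        for m in found:
            row = dict(src)
            row.update({k: v for k, v in m.items() if k != "ID"})
            joined.append(row)
    return joined

sources = [{"RA": 187.209, "Dec": -1.938, "ID": "NGC 4454"},
           {"RA": 208.826, "Dec": 59.506, "ID": "NGC 5376"}]
# Illustrative match rows, not real catalog values.
matches = [{"ID": "NGC 4454", "sep": 0.5, "Jmag": 11.0},
           {"ID": "NGC 4454", "sep": 0.1, "Jmag": 10.5},
           {"ID": "NGC 5376", "sep": 0.2, "Jmag": 12.0}]

print(len(join_matches(sources, matches, "closest")))
print(len(join_matches(sources, matches, "all")))
```

In this sketch a source with no matches is dropped from the result; whether Vim keeps such rows is not stated on the slides.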
Table column metadata Click to open/close Toggle column display
Table display Sources (user input) Three match tables Table with no columns displayed -- just the match count
Why Vim is best • WebServer or Laptop install • Mac and Linux have personal webserver • Scalable • Column operations only • Large operations can be Asynchronous (NESSSI) • Cannot select rows except by formula • Powered by Stilts (2,000,000,000 rows and up) • Open and Secure • Bench ID = random string • Share your workbench with your colleagues
Why Vim is best • Content • Any cone search (== all the main catalogs) • Cutouts from SIAP services • Co-registered to hyperatlas with Montage • Spectra via SSAP (from NRAO) • Thumbnails and images and FITS • Display • Column selection, Row sort/select • Images small-hover-large • Tools and metadata by hide-click-expose
Cutout images Hover mouse on cutout to see larger image
Spectra from SSAP Hover mouse on thumbnail to see larger image • Spectral Collections brokered by NRAO: • Arecibo Maser Catalog • 2dFGRS • SDSS DR5
Tools • Multicone • Fetch cone/siap/ssap for each source • Sort and Select • By any column value • Compute new column • Expressions (eg 2mass Jmag - SDSS Rmag) • Join • Closest or All combinations • Upload • From NESSSI service results • Caching • Of dynamic/remote data • Download • VOTable, CSV, KML, etc
Here They Are! • Jpeg is linked to FITS • Cutouts co-registered from different surveys
Asynchronous • Drop source list into Nesssi • Choose cutouts/cones • Leave to run over lunch
Asynchronous • Drop results as URLs uploaded to Vim
Usage • Install • You will need Python 2.x • You will need a webserver, personal or on a server • Read and edit the unpack.py script • Execute it with "python unpack.py" • Point to the URL • Try the links to the collections called: • seven galaxies, • 20 pulsars, or • 338 Arp galaxies. • Once loading is stopped, the tiny images respond to mouse hover with bigger images • Click on a Tool to open its form, click again to close it • Click on a Table to see its metadata and choose display, click again to close it • Use Multicone to get data from the VO • Upload sources • VOTable or CSV or VOTable-link
Crossmatch Join (= crossmatch)
Computing and Plotting Compute new column eg. Infrared-Optical color Download VOTable and plot with Topcat
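Computing a new column and then selecting rows by formula can be sketched as below in Python 3. Vim delegates this to Stilts, so the helper addcol here is only a hypothetical stand-in, and the magnitudes are invented for illustration.

```python
def addcol(rows, name, formula):
    """Return new rows with an extra column computed from the others."""
    return [dict(row, **{name: formula(row)}) for row in rows]

# Illustrative values only, not real catalog data.
rows = [{"ID": "NGC 4454", "Jmag": 10.5, "Rmag": 12.0},
        {"ID": "NGC 5376", "Jmag": 11.25, "Rmag": 11.75}]

# Infrared-optical color: 2MASS J magnitude minus SDSS R magnitude.
colored = addcol(rows, "J_minus_R", lambda r: r["Jmag"] - r["Rmag"])

# Row selection by formula: keep rows where the criterion is true.
kept = [r for r in colored if r["J_minus_R"] < -1.0]
print([r["ID"] for r in kept])
```

The input rows are left unchanged, so a sequence of such operations could be replayed from a script.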
Current Resource List
• Resource = URLformat + descriptions
• URLformat = “Generalized Cone Search”
• Unification of {cone OR siap OR ssap OR others}
• URL = URLformat % (lon,lat,radius)
Example:
    http://nedwww.ipac.caltech.edu/cgi-bin/nph-objsearch?search_type=Near%%20Position%%20Search&of=xml_main&lon=%8.5fd&lat=%8.5fd&radius=%f
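The rule URL = URLformat % (lon,lat,radius) is plain Python string formatting. The sketch below applies it to the NED example in Python 3, with the line-wrap spaces in the slide's URL removed (an assumption about the intended format) and with the position taken from the VOTable slide. The name make_url is hypothetical.

```python
URLFORMAT = ("http://nedwww.ipac.caltech.edu/cgi-bin/nph-objsearch?"
             "search_type=Near%%20Position%%20Search&of=xml_main&"
             "lon=%8.5fd&lat=%8.5fd&radius=%f")

def make_url(urlformat, lon, lat, radius):
    """Fill a generalized cone search format with a position and radius."""
    return urlformat % (lon, lat, radius)

url = make_url(URLFORMAT, 178.542980, 10.796330, 0.3)
print(url)
```

The doubled %% becomes a literal % after formatting, which is how the URL-encoded space (%20) survives. Note that %8.5f pads short values with leading spaces, so a longitude such as 5.0 would need stripping or a plain %.5f before use in a URL.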
Vim scripting language
• These are the commands sent from client to server
• (Future) users will get a Python/Perl script that can reproduce the session
• multicone: act on the source list with a resource
• view: change view and refresh display
• Implemented with Astrogrid Stilts
    • addcol: compute new column from others
    • select: keep rows where criterion is true
    • join: join a table of matches to the source table
    • sort: sort on any column
• Implemented with NVO VOTableLib
    • download: make an output product
    • cachelinks: copy images, dynamic links to cache
    • urltable: ingest external VOTable
    • upload: ingest text
Building Compute Services [Diagram labels: client, service container, services] • Developer and Admin • Services should be built by developers • In a framework managed by an administrator • Service developers must be careful • Services can be dangerous (eg “execute any command”) • Service users authenticated with “graduated security” • Easy to start, but great power is possible • Or just keep it anonymous • Asynchrony for compute intensive jobs • Jobs submitted to batch queue • Unique benchID may be used to monitor job & return results • From “clicking” to “scripting” • Services may be accessed by clicking on a web page or with scripted client codes • Authentication for web clicking comes from a certificate in browser • Scripted access requires a certificate
Persistent Storage (“workbench”) Ceramics class meets each week for 8 weeks
Workbench • Persistent storage • Just a directory in the web space • Initiated by service • Tools operate on files in workbench • http://……?bench=39840422 & action=PCA & (other params)
Workbench • URL to workbench is obscure • http://localhost/cgi-bin/vim?benchID=16213077368925688004920409437160 • Can send to your colleague • Set up as • Read is free but URL is obscure • Using tools / write permission via password • Reaping • Maybe 30-day lifetime for workbench storage? • Need cron process to delete old benches
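The obscure bench ID and the reaping step can be sketched as follows in Python 3. This assumes one directory per workbench under a common root and uses the directory's modification time as its age. The function names and the 30-day default are illustrative, not Vim's actual code.

```python
import os
import secrets
import shutil
import time

def make_bench_id():
    """Return 32 random decimal digits, like the benchID in the example URL."""
    return "".join(secrets.choice("0123456789") for _ in range(32))

def reap_benches(root, max_age_days=30, now=None):
    """Delete workbench directories under root that are older than max_age_days."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        age = now - os.path.getmtime(path)
        if os.path.isdir(path) and age > max_age_days * 86400:
            shutil.rmtree(path)
            removed.append(name)
    return removed

print(make_bench_id())
```

A cron entry could call reap_benches once a day. The ID is drawn from the secrets module rather than random so that a bench URL is hard to guess, which matters when reading is free to anyone who knows the URL.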
Keywords • “bench” • If present, specifies workbench • “action” • What should the server do? • Create workbench (provide password) • Upload data • Start algorithm • Monitor run (does the result exist?) • Download result • Others: • Depends on action, specifies detail
VIM server
    import re
    import sys

    if actionkey == "init":
        benchID = bench.makeBench()
    elif form.has_key("bench"):
        benchID = form["bench"].value
    else:
        print "No bench specified -- exiting"
        sys.exit(1)

    # bench must be 32 decimal digits (NOT ../../precious)
    if re.match(r'^[0-9]{32}$', benchID) is None:
        print "Sorry, but %s does not look like a valid benchID name" % benchID
        sys.exit(1)
    bench.setBenchID(benchID)

    # Dispatch on the requested action
    if actionkey == "urltable": actions.urltable(bench)
    if actionkey == "deletetable": actions.deletetable(bench)
    if actionkey == "fetch": actions.fetch(bench)
    if actionkey == "addcol": actions.addcol(bench)
    if actionkey == "select": actions.select(bench)
    if actionkey == "join": actions.join(bench)
    if actionkey == "sort": actions.sort(bench)
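The same validate-then-dispatch pattern can be written with a lookup table instead of a chain of if statements. This is a sketch in Python 3 with stand-in handlers (hypothetical); the real handlers live in the slide's actions module.

```python
import re

BENCH_ID = re.compile(r"[0-9]{32}")

def valid_bench_id(bench_id):
    """True only for exactly 32 decimal digits (so no ../../precious)."""
    return BENCH_ID.fullmatch(bench_id) is not None

def dispatch(action, bench_id, handlers):
    """Validate the bench ID, then call the handler registered for the action."""
    if not valid_bench_id(bench_id):
        raise ValueError("not a valid benchID: %r" % bench_id)
    if action not in handlers:
        raise ValueError("unknown action: %r" % action)
    return handlers[action](bench_id)

# Stand-in handlers for illustration only.
HANDLERS = {
    "sort": lambda bench_id: "sorted " + bench_id,
    "join": lambda bench_id: "joined " + bench_id,
}

print(dispatch("sort", "16213077368925688004920409437160", HANDLERS))
```

Using fullmatch also rejects an ID followed by a newline, which a pattern ending in $ would accept. An unknown action raises an error instead of being silently ignored.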
Making things easier • Let them log in! • Keeps record of workbenches • Who owns which • Users can ask for “my workbenches” • Can make log for funders • Who is doing what • BUT • Users *hate* to register at websites
Security and Certificates • Stop attacks • Access to secret data • Access to big resources • BUT • Lots of extra infrastructure • Users hate it
NESSSI: NVO Extensible Secure Scalable Service Infrastructure • Services are science-oriented • Services are made by trusted developers from the science community • Web forms OR command line (Python API) • Built-in security (X.509 certificates) • Very large jobs can be run • Easy to get a certificate • No complex install needed by client • Different levels of certificate get different service • Is installed on Teragrid • Services can be part of a workflow
Nesssi certificate policies [Diagram labels: client; certificate; Secure SOAP; nesssi; certificate policies; queue; cluster nodes; workbench storage; open http]
Clarens server An open-source webserver based on OpenSSL.
A “Graduated Security” Model [Diagram labels: Power user; Scripted access; Portal-Based] • Big-iron computing: full Grid account, browser access • More science: get NVO weak certificate; access logged, but identity not verified • Some science: web form; anonymous access, small jobs
Traditional Grid Security client I will do exactly what you want. Show us your Certificate!
Graduated Security client May I have your Request and your Certificate?
Authentication with Certificates • A digital certificate proves who you are • X.509 • Usually encrypted by passphrase • Certificate as login • Map from certificate to account
Certificates The Virtual Observatory as a Virtual Organization This is a US driver’s licence. In the US it proves identity strongly. It is like a strong certificate. This is a loyalty card where I buy food. (You can put a false address on the application.) It is like a weak certificate.
How to be a Certificate Authority In order for an RA to validate the identity of a person, the subject should contact the RA face-to-face and present photo-id and/or valid official documents showing that the subject is an acceptable end entity as defined in the CP/CPS document of the CA. In case of host or service certificate requests, the RA should validate the identity of the person in charge of the specific entities using a secure method. The RA should ensure that the requestor is appropriately authorized by the owner of the FQDN or the responsible administrator of the machine to use the FQDN identifiers asserted in the certificate.
Bench ID
• Identify which job we are talking about
• 32 character hex string, eg cb28d0753a7fec9a485981f741d425ec
• Used to monitor a running job
    sessionID = nesssiServer.cutout.init()
    msg = nesssiServer.cutout.monitor(sessionID)
• Used to form URL where results appear, eg
    http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/cb/cb28d0753a7fec9a485981f741d425ec/cutouts/index.html
• If you lose the sessionID, you lose your job
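The example URL suggests that results live under a directory named after the first two characters of the session ID, followed by the full ID. The Python 3 sketch below rebuilds the URL on that assumption. The pattern is inferred from the example only, not from Nesssi documentation, and result_url is a hypothetical helper.

```python
import re

def result_url(base, session_id, product="cutouts/index.html"):
    """Build the results URL for a 32-character hex session ID."""
    if not re.fullmatch(r"[0-9a-f]{32}", session_id):
        raise ValueError("not a valid session ID: %r" % session_id)
    # Assumed layout: <base>/shell/<first two characters>/<full ID>/<product>
    return "%s/shell/%s/%s/%s" % (base.rstrip("/"), session_id[:2],
                                  session_id, product)

print(result_url("http://dtf-test1.sdsc.teragrid.org:8080/clarens",
                 "cb28d0753a7fec9a485981f741d425ec"))
```

Since the session ID is the only handle on the job, a client script would do well to write it to disk as soon as init() returns.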
Monitoring a Nesssi job
    <NesssiMonitor>
      <Service>Cutout</Service>
      <Uname>ux400560</Uname>
      <SessionID>774daf5ef52facc68cb03db4b1fdc815</SessionID>
      <Sandbox>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/77/774daf5ef52facc68cb03db4b1fdc815</Sandbox>
      <Result>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/77/774daf5ef52facc68cb03db4b1fdc815/cutouts/index.html</Result>
      <QueueStatus>149.envoy.cacr.calte roy batch C8845cb 11516 1 -- -- 60:00 R -- </QueueStatus>
    </NesssiMonitor>
• Service: service name
• Uname: running as this user
• SessionID: session ID
• Sandbox: sandbox URL
• Result: results URL
• QueueStatus: queue status (R = running)
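A script that polls a job could parse the monitor message as below (Python 3). The sketch assumes the job state is the second-to-last token of the QueueStatus line, as it is in the example, where R means running. The name parse_monitor is hypothetical, and the message is a shortened copy of the one on the slide.

```python
import xml.etree.ElementTree as ET

MONITOR = """<NesssiMonitor>
  <Service>Cutout</Service>
  <Uname>ux400560</Uname>
  <SessionID>774daf5ef52facc68cb03db4b1fdc815</SessionID>
  <QueueStatus>149.envoy.cacr.calte roy batch C8845cb 11516 1 -- -- 60:00 R -- </QueueStatus>
</NesssiMonitor>"""

def parse_monitor(text):
    """Return the monitor fields as a dict, plus a guessed job state."""
    root = ET.fromstring(text)
    info = {child.tag: (child.text or "").strip() for child in root}
    tokens = info.get("QueueStatus", "").split()
    # Assumption: the state letter sits second from the end of the queue line.
    info["state"] = tokens[-2] if len(tokens) >= 2 else None
    return info

info = parse_monitor(MONITOR)
print(info["Service"], info["SessionID"], info["state"])
```

A polling loop would call the monitor method every few seconds and stop once the state is no longer R or the result URL becomes reachable.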
Example: SleepyAdd
[Figure labels: web portal; command line]
    import nesssi

    nesssiServer = nesssi.client('https://dtf-test1.sdsc.teragrid.org:8443/clarens/', debug=0)
    sessionID = nesssiServer.sleepyadd.init()
    print "Your session ID is", sessionID
    # Run: sleep 30 seconds then add 52 and 344
    nesssiServer.sleepyadd.run(sessionID, "-time 30 -n 52 -m 344")