AstroGrid Datacenters
Learn about the challenges faced in managing large datasets in the AstroGrid Consortium, along with their approach, developed solutions like Storepoints, and the status of the project.
AstroGrid Datacenters • AstroGrid Consortium Review, Dec 2004 • Martin Hill (AstroGrid@ROE)
Outline • Challenge • Approach • Developed: • Storepoints • Describing data • Query Language • Status • Versioning • Software: Publisher’s AstroGrid Library
Challenge • Large datasets (to Petabytes) • So? • Distributed; Science comes from combining • Bandwidth rising slower than • No/few established suitable standards • FITS images/‘tables’. Ambiguous headers. Ambiguous subformat, eg spectra. • VOTable introduced. Ambiguous subformat eg spectra vs catalogue. Verbose. • No/few established common terms • Involves Scientists…
Approach: ‘Publisher’s AstroGrid Library’ • General solution to: • Discover problems faced, accumulate solutions in software • Experimentally publish sets and types (not host). • Many smaller datasets owned by people without web skills (eg solar) so: • Need ‘easy’/‘unskilled’ installation • Able to proxy; 3rd parties can publish data without requiring more work from owner (eg VizieR, Trace) • ‘Free’ website, range of standard interfaces • Danger: too general (any query against any dataset producing any results).
Existing Solutions • Common task: publish RDBMSs to web • Accumulated tools & skill-sets • No combined solution offering: • Standard interface (eg query language) • Scientific values (errors, units) • Spatial querying (common) • VO Metadata for query and results
Developing Standards • Resource metadata • Query language (ADQL/s, ADQL/x) • Web interfaces • Working beyond standards • Feeding research to IVOA • Parallel development • In the VO: eg Starlink, NVO, VizieR • External: SRB, Taverna, GridPP monitor • Convergence
Protocols & Interfaces • Human – web pages • SOAP • Toolkit Incompatibilities • Streaming awkward (via Toolkits) • Longer term benefits? • ‘Raw Http post’ (eg servlets, CGI) • Simpler • More existing skills amongst Astronomers • Mixed (eg SIAP, SkyNode) • Don’t Choose – Implement • Mix & Match, Plug & Play:
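To illustrate why the plain-HTTP route is described as simpler, here is a minimal sketch of building a cone search request as an ordinary query string. The `RA`, `DEC` and `SR` parameters (decimal degrees) follow the NVO cone search convention; the function name and the base URL in the usage note are hypothetical and not taken from the slides.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search URL (hypothetical helper, for illustration only).

    RA, DEC and SR are given in decimal degrees, as in the NVO cone
    search convention.
    """
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

For example, `cone_search_url("http://example.org/cone", 10.5, -20.0, 0.1)` yields a URL that any HTTP client or browser can fetch, with no SOAP toolkit involved.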
Releasing • Deploy early – even if temporarily • Independent & Integrated Access • Versioning: • Servers & clients, ie new clients can still use old servers, and new servers work with old clients. • Add and ‘deprecate’, don’t change • Delete intelligently • (Quickly remove unused interfaces, eg CEA if CEA upgrades, JSPs) • Need hosts… • Hosts need hardware • Publishers need to know their data
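The ‘add and deprecate, don’t change’ rule can be sketched as follows. This is an illustration under assumptions, not the PAL code: the class, method names and returned fields are all hypothetical. The old entry point keeps its signature and behaviour, warns callers, and delegates to the newer one, so old clients keep working against new servers.

```python
import warnings


class QueryService:
    """Hypothetical service interface showing 'add and deprecate'."""

    def submit_query(self, adql, output_format="votable"):
        # Newer entry point, added alongside the old one.
        return {"query": adql, "format": output_format, "status": "QUEUED"}

    def ask_query(self, adql):
        # Older entry point: signature unchanged so existing clients
        # still work; it warns and forwards to the newer method.
        warnings.warn(
            "ask_query is deprecated; use submit_query instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.submit_query(adql)
```

Deleting `ask_query` then becomes a separate, deliberate step taken once usage has died away.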
Describing Data • Registry ‘Resource’ documents • IVO Tabular Sky Service • Units, UCDs • Solar vs Sky vs… • Images vs Catalogues • Concept extended for ‘RdmsMetadata’ • UCD1+ -> Dictionaries & Ontologies • Relationships (simple: errors) • Queryable • Mirrors vs Copies
Query Language • SQL -> ADQL/xml • Defined common functions – CIRCLE & XMATCH (sky not solar) • Working on: • XQL • Units • Investigating: UCDs instead of columns • Cross-dataset querying
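As a rough illustration of what a CIRCLE-style region test has to compute on the sky, the sketch below checks whether a position lies within a given angular radius of a centre, using the haversine formula for the great-circle separation. The function names are assumptions for this example and say nothing about how ADQL or the PAL implements CIRCLE.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def in_circle(ra, dec, centre_ra, centre_dec, radius_deg):
    """True if (ra, dec) lies within radius_deg of the centre (degrees)."""
    return angular_separation_deg(ra, dec, centre_ra, centre_dec) <= radius_deg
```

A test on the sphere rather than a box in RA/Dec handles the RA wrap at 0/360 degrees and positions near the poles, which a naive `BETWEEN` clause on the columns would get wrong.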
Results • Query+Metadata+RawResults = VoResults • FITS vs VOTable vs HDF vs CSV vs HTML vs… • All of them • Results -> queryable data -> inputs
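The ‘Query+Metadata+RawResults’ bundle and the ‘all of them’ approach to formats can be sketched as one results object with a table of interchangeable writers. Everything here is hypothetical (the `VoResults` fields, the writer names, the two formats shown); it only illustrates the shape of the idea.

```python
import csv
import html
import io
from dataclasses import dataclass


@dataclass
class VoResults:
    """Hypothetical bundle: the query, column metadata and raw rows."""
    query: str
    columns: list  # dicts with 'name' and 'unit' keys (illustrative)
    rows: list


def to_csv(results):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([c["name"] for c in results.columns])
    writer.writerows(results.rows)
    return buf.getvalue()


def to_html(results):
    head = "".join(
        "<th>{} ({})</th>".format(html.escape(c["name"]), html.escape(c["unit"]))
        for c in results.columns
    )
    body = "".join(
        "<tr>" + "".join("<td>{}</td>".format(html.escape(str(v))) for v in row) + "</tr>"
        for row in results.rows
    )
    return "<table><tr>{}</tr>{}</table>".format(head, body)


# One writer per output format; adding a format means adding an entry.
WRITERS = {"csv": to_csv, "html": to_html}


def render(results, fmt):
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError("unsupported format: " + fmt)
    return writer(results)
```

Because the metadata travels with the rows, each writer can include as much of it as its format allows (units in the HTML header here, names only in the CSV).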
Data Analysis (Clive Page) • Faster feasible • < 10^6s OK. 10^8 not… • Joins • Polar coordinate matches (+ HTM, HealPix). • Cross-match algorithms • Distributed queries • Breaking down query • Moving the right data • Combining the results
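A cross-match between two catalogues can be sketched as below. This is a simple stand-in, not the algorithm from the talk: it sorts one catalogue by declination so that only a narrow declination band is searched for each source, then compares chord lengths between unit vectors instead of evaluating trigonometric separations pair by pair. HTM or HealPix indexing would replace the declination band in a real system. The catalogue layout `(id, ra_deg, dec_deg)` is an assumption for this example.

```python
import bisect
import math


def unit_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cross_match(cat_a, cat_b, radius_deg):
    """Return (id_a, id_b) pairs separated by at most radius_deg.

    Each catalogue is a list of (id, ra_deg, dec_deg) tuples.
    """
    # Two unit vectors separated by angle t are a chord 2*sin(t/2) apart.
    max_chord_sq = (2.0 * math.sin(math.radians(radius_deg) / 2.0)) ** 2
    b_sorted = sorted(cat_b, key=lambda row: row[2])
    b_decs = [row[2] for row in b_sorted]
    b_vecs = [unit_vector(row[1], row[2]) for row in b_sorted]
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        vec_a = unit_vector(ra_a, dec_a)
        # A match cannot differ in declination by more than the radius.
        lo = bisect.bisect_left(b_decs, dec_a - radius_deg)
        hi = bisect.bisect_right(b_decs, dec_a + radius_deg)
        for i in range(lo, hi):
            chord_sq = sum((p - q) ** 2 for p, q in zip(vec_a, b_vecs[i]))
            if chord_sq <= max_chord_sq:
                matches.append((id_a, b_sorted[i][0]))
    return matches
```

The declination band keeps the inner loop short for small radii, which is the kind of pruning that makes joins on millions of rows feasible where a full pairwise comparison would not be.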
Status • Readily available • Debugging; developer • Debugging; astronomer • Inform User
Storepoints • No data persistence at PALs • Web server machines not data storage ones • Large result sets • No workspace, memory models, etc • Streaming outputs • SRB, GridFTP not ready.
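The streaming point can be sketched as follows: rows are written to the destination as they arrive from the data source, so the server never holds a full result set in memory or on its own disk. The function and parameter names are hypothetical, and `sink` stands for any writable text target (a file, an HTTP response, or a connection to a storepoint).

```python
import csv


def stream_csv(column_names, row_iter, sink, flush_every=1000):
    """Write rows to sink one at a time; return the number of rows written.

    row_iter may be a database cursor or generator; it is consumed
    lazily, so memory use does not grow with the result size.
    """
    writer = csv.writer(sink, lineterminator="\n")
    writer.writerow(column_names)
    count = 0
    for row in row_iter:
        writer.writerow(row)
        count += 1
        if count % flush_every == 0:
            sink.flush()  # push data onwards periodically
    sink.flush()
    return count
```

Passing a generator or cursor as `row_iter` keeps the whole path lazy from database to destination.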
Identifying Storepoints • Concepts [diagram labels: MySpace, Community, HomeSpace, VoSpace (Registered), SRB, GridFTP, FTP, HTTP] • FTP, File, MySpace + extend. • 3rd iteration; 2nd in use
Data Service Architecture [diagram labels: JSP, SIAP, CEA, Axis, AstroGrid, SkyNode, Cone; Plugin Manager; Datacenter Implementation; Slinger; /XML/CSV; zip/plain; email/file/ftp/myspace]
Publishers’ AstroGrid Library • ‘Easy to publish to the VO’ • Web Application, includes: • SOAP (AstroGrid, CEA, prepped for SkyNode) • CGI (SIAP, NVO-cone search, SSA) • HTML pages (cone search, query builder, status monitor) • Features • Asynchronous (‘stateful’) & Synchronous Queries • Queues • Comprehensive Status (incl historical) • Variety of results • Fully ‘Streamed’ – no curation issues • Server ‘Plugins’, including: • RDBMS (JDBC) • FITS file collection • eXist (XML) • Helper Tools • Metadata Generators • Ready-made website access
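The combination of asynchronous (‘stateful’) queries, synchronous queries, queues and historical status can be sketched with a small manager class. This is an illustration only: the class, method names and status strings are hypothetical, and `execute` stands in for whichever server plugin actually runs the query.

```python
import itertools
import queue
import threading


class QueryManager:
    """Hypothetical sketch: queued asynchronous queries with status history."""

    def __init__(self, execute):
        self._execute = execute          # callable that runs one query
        self._queue = queue.Queue()
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._history = {}               # query id -> list of statuses
        self._results = {}
        threading.Thread(target=self._worker, daemon=True).start()

    def _set_status(self, qid, status):
        with self._lock:
            self._history.setdefault(qid, []).append(status)

    def submit(self, query_text):
        """Asynchronous: returns an id at once; poll status() later."""
        qid = next(self._ids)
        self._set_status(qid, "QUEUED")
        self._queue.put((qid, query_text))
        return qid

    def ask(self, query_text):
        """Synchronous: runs the query and returns its result directly."""
        return self._execute(query_text)

    def _worker(self):
        while True:
            qid, text = self._queue.get()
            self._set_status(qid, "RUNNING")
            try:
                result = self._execute(text)
                with self._lock:
                    self._results[qid] = result
                self._set_status(qid, "COMPLETED")
            except Exception as exc:
                self._set_status(qid, "ERROR: {}".format(exc))
            finally:
                self._queue.task_done()

    def status(self, qid):
        with self._lock:
            return list(self._history[qid])

    def result(self, qid):
        with self._lock:
            return self._results.get(qid)

    def wait_all(self):
        self._queue.join()
```

Keeping every status change, rather than only the latest, is what lets a status page show what happened to a query after the fact.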
Situation Now • Installed: • SuperCOSMOS Science Archive (RDBMS) • astrogrid.roe.ac.uk:8080/pal-ssa/ • astrogrid.roe.ac.uk:8080/pal-twomass/ • astrogrid.roe.ac.uk:8080/pal-usnob/ • 6dF – Spectra • grendel12.roe.ac.uk:8080/pal-6df/ • Wide Field Survey • TRACE (FITS files, Solar, under test) • Proxy (bespoke special plugins) • All NVO-cone-compatible DBs (test) • VizieR • Evaluated/ing at: • ESO • RAL (solar) • JBO (Merlin) • Reviewing Query Language, metadata documents, etc
Future • Quality… • Metadata ‘wizards’ • Sell to hosts; deploy to Leicester, JBO, ESO, RAL, The World.... • Explicit and Investigative Queries • Distributed queries & combining results (NVO Exec plans) • Full SIA, SSA interface • More user & admin web pages • Local authorisation