Discover the evolution from observational to computational science through the World Wide Telescope, enabling data exploration and predictions. Explore the data sources, literature, archives, and unified definitions of this groundbreaking tool.
Experience Building the World Wide Telescope, aka the Virtual Observatory. Jim Gray, Alex Szalay
The Evolution of Science • Observational Science • Scientist gathers data by direct observation • Scientist analyzes data • Analytical Science • Scientist builds analytical model • Makes predictions • Computational Science • Simulate analytical model • Validate model and make predictions • Data Exploration Science • Data captured by instruments or generated by simulator • Processed by software • Placed in a database / files • Scientist analyzes database / files
Information Avalanche • In science, industry, government, … • Better observational instruments and better simulations are producing a data avalanche • Examples • BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information) • CERN: LHC will generate 1 GB/s, ~10 PB/y • VLBA (NRAO) generates 1 GB/s today • Pixar: 100 TB/movie • New emphasis on informatics: capturing, organizing, summarizing, analyzing, visualizing • Images: courtesy C. Meneveau & A. Szalay @ JHU; BaBar, Stanford; P&E Gene Sequencer, from http://www.genome.uci.edu/; Space Telescope
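The LHC figures can be checked with a line of arithmetic. The sketch below assumes decimal units (1 PB = 10^6 GB) and, as one plausible reading of the "~10 PB/y" number, a running time of about 10^7 seconds per year; that running time is an assumption, not something the slide states.

```python
# Rough check of "1 GB/s, ~10 PB/y" under stated assumptions.
GB_PER_PB = 1e6                                 # decimal units
SECONDS_PER_CALENDAR_YEAR = 365.25 * 24 * 3600  # continuous operation
ASSUMED_RUN_SECONDS_PER_YEAR = 1e7              # assumption: instrument is not taking data all year

rate_gb_per_s = 1.0

pb_if_continuous = rate_gb_per_s * SECONDS_PER_CALENDAR_YEAR / GB_PER_PB
pb_if_assumed_run = rate_gb_per_s * ASSUMED_RUN_SECONDS_PER_YEAR / GB_PER_PB

print(f"continuous: {pb_if_continuous:.1f} PB/y")
print(f"assumed run time: {pb_if_assumed_run:.1f} PB/y")
```

Running flat out all year would give roughly three times the quoted volume, so the quoted figure only works with a reduced duty cycle of this order.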
World Wide Telescope / Virtual Observatory http://www.ivoa.net/ • Premise: most data is (or could be) online • The Internet is the world’s best telescope: • It has data on every part of the sky • In every measured spectral band: optical, x-ray, radio, ... • As deep as the best instruments (2 years ago). • It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...). • It’s a smart telescope: links objects and data to literature on them.
The WWT Components • Data Sources • Literature • Archives • Unified Definitions • Units, • Semantics/Concepts/Metrics, Representations, • Provenance • Object model • Classes and methods • Portals
Data Sources • Literature online and cross indexed • Simbad, ADS, NED: http://simbad.u-strasbg.fr/Simbad, http://adswww.harvard.edu/, http://nedwww.ipac.caltech.edu/ • Many curated archives online • FIRST, DPOSS, 2MASS, USNO, IRAS, SDSS, VizieR, … • Typically files with English meta-data and some programs • Groups, Researchers, Amateurs Publish • Datasets online in various formats • Documentation varies • Publications are ephemeral • Unknown provenance
Unified Definitions • Unified Content Descriptors (UCDs) http://vizier.u-strasbg.fr/doc/UCD.htx • Collated all table heads from all the literature • 100,000 terms reduced to ~1,500 • Rough consensus that this is the right thing. • Refinement in progress as people use UCDs • Defines • Units: • gram, radian, second, ... • Semantic Concepts / Metrics • Std error, Chi2 fit, magnitude, flux @ passband, velocity, ...
Provenance • Most data will be derived. • To do science, need to trace derived data back to source. • So programs and inputs must be registered. • Must be able to re-run them. • Example: Space Telescope Calibrated Data • Run on demand • Can specify software version (to get old answers) • Scientific Data Provenance and Curation are largely unsolved problems (some ideas but no science).
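The "register programs and inputs, then re-run them" idea can be sketched as below. This is a minimal illustration under stated assumptions, not a description of how the Space Telescope archive works: the `PROGRAMS` table, the `calibrate` steps, and the record fields are all hypothetical.

```python
import hashlib
import json

def fingerprint(obj):
    """Stable SHA-256 of a JSON-serializable value."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()

# Hypothetical registry of program versions; old versions stay runnable.
PROGRAMS = {
    ("calibrate", "1.0"): lambda raw: [x - 100 for x in raw],
    ("calibrate", "2.0"): lambda raw: [x - 95 for x in raw],
}

def run_registered(name, version, inputs):
    """Run a registered program and return (output, provenance record)."""
    output = PROGRAMS[(name, version)](inputs)
    record = {
        "program": name,
        "version": version,
        "inputs": inputs,
        "output_sha256": fingerprint(output),
    }
    return output, record

def reproduce(record):
    """Re-derive the data from its record and check it matches the original."""
    output, _ = run_registered(record["program"], record["version"], record["inputs"])
    if fingerprint(output) != record["output_sha256"]:
        raise ValueError("re-run does not reproduce the recorded output")
    return output

old_output, old_record = run_registered("calibrate", "1.0", [110, 120])
new_output, _ = run_registered("calibrate", "2.0", [110, 120])
replayed = reproduce(old_record)  # "old answers" on demand
```

Pinning the version in the record is what lets a later reader get the old answer even after the calibration code has changed.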
Object Model • General acceptance of XML • Recent acceptance of XML Schema (XSD over DTD) • Wait-and-See about SOAP/WSDL/… • “Web Services are just CORBA with angle brackets.” • “FTP is good enough for me.” • Personal opinion: • Web Services are much more than “CORBA + <>” • Huge focus on interop • Huge focus on integrated tools • But the community says “Show me!” • Many technologists sold, but not the astronomers
Classes and Methods • First Class: VOTable http://www.us-vo.org/VOTable/VOTable-1-0.htm • Represents an answer set in XML • Defined by an XML Schema (XSD) • Metadata (in terms of UCDs) • Data representation (numbers and text) • First method • Cone Search: Get objects in this cone
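A minimal sketch of the two ideas on this slide: reading an answer set laid out like a VOTable (FIELD metadata plus TABLEDATA rows) and running a cone search over it. The XML content, the UCD strings (written in the early UCD style for illustration), and the function names are assumptions, and a real cone search service would run this server-side against an indexed catalog rather than scanning rows.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical answer set in a VOTable-style layout.
VOTABLE_XML = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="objid" datatype="char" arraysize="*" ucd="ID_MAIN"/>
      <FIELD name="ra" datatype="double" unit="deg" ucd="POS_EQ_RA_MAIN"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="POS_EQ_DEC_MAIN"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>A</TD><TD>180.00</TD><TD>0.00</TD></TR>
          <TR><TD>B</TD><TD>180.05</TD><TD>0.02</TD></TR>
          <TR><TD>C</TD><TD>185.00</TD><TD>2.00</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

def read_rows(xml_text):
    """Return rows as dicts keyed by UCD, so callers need not know column names."""
    table = ET.fromstring(xml_text).find("./RESOURCE/TABLE")
    ucds = [field.get("ucd") for field in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append(dict(zip(ucds, cells)))
    return rows

def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))

def cone_search(rows, ra, dec, radius_deg):
    """Keep the rows whose position lies within radius_deg of (ra, dec)."""
    return [
        row for row in rows
        if separation_deg(ra, dec,
                          float(row["POS_EQ_RA_MAIN"]),
                          float(row["POS_EQ_DEC_MAIN"])) <= radius_deg
    ]

hits = cone_search(read_rows(VOTABLE_XML), ra=180.0, dec=0.0, radius_deg=0.1)
hit_ids = [row["ID_MAIN"] for row in hits]
```

Keying rows by UCD rather than by column name is the point of the metadata: the same search code works on any archive's table as long as its fields carry the shared descriptors.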
Other Classes • Space-Time class • http://hea-www.harvard.edu/~arots/nvometa/STCdoc.pdf • Image Class (returns pixels) • SdssCutout • Simple Image Access Protocol http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/ACF8DE.pdf • HyperAtlas http://bill.cacr.caltech.edu/usvo-pubs/files/hyperatlas.pdf • Spectral • Simple Spectral Access Protocol • 500K spectra available at http://voservices.net/wave • Query Services • ADQL and SkyNode http://skyservice.pha.jhu.edu/develop/vo/adql/ • Registry: • see below
The Registry • UDDI seemed inappropriate • Complex • Irrelevant questions • Relevant questions missing • Evolved Dublin Core • Represent Datasets, Services, Portals • Needs to be machine readable • Federation (DNS model) • Push & Pull: register then harvest • http://www.ivoa.net/twiki/bin/view/IVOA/IvoaResReg
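The "register then harvest" pattern can be sketched as follows. This is an illustration only: the `Registry` class, its methods, and the record fields (loosely modeled on Dublin Core element names) are hypothetical and do not reproduce the IVOA registry interfaces.

```python
class Registry:
    """Toy registry: publishers push records in, peer registries pull changes out."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # identifier -> record
        self.clock = 0     # local change counter used as an update stamp

    def register(self, record):
        """Push: a publisher adds or updates a resource description."""
        self.clock += 1
        self.records[record["identifier"]] = dict(record, updated=self.clock)

    def changes_since(self, stamp):
        """Records added or changed after the given stamp."""
        return [r for r in self.records.values() if r["updated"] > stamp]

    def harvest(self, other, since=0):
        """Pull: copy another registry's changes; return the stamp to resume from."""
        newest = since
        for rec in other.changes_since(since):
            copy = {k: v for k, v in rec.items() if k != "updated"}
            self.register(copy)
            newest = max(newest, rec["updated"])
        return newest

local = Registry("local")
full = Registry("full")

# Hypothetical resource descriptions.
local.register({"identifier": "ivo://example.org/cone", "title": "Demo cone search", "type": "Service"})
local.register({"identifier": "ivo://example.org/catalog", "title": "Demo catalog", "type": "Dataset"})

stamp = full.harvest(local)          # first harvest copies everything
local.register({"identifier": "ivo://example.org/portal", "title": "Demo portal", "type": "Portal"})
stamp = full.harvest(local, stamp)   # later harvest copies only the new record
```

Because each harvest resumes from a stamp, a searchable registry can stay current by polling many publishing registries cheaply, which is the federation idea behind the DNS comparison.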
SkyQuery: A Prototype WWT • Started with SDSS data and schema • Imported about 9 other datasets into that spine schema. • Unified them with a portal • Implicit spatial join among the datasets. • All built on Web Services • Pure XML • Pure SOAP • Used .NET toolkit
Demo • SkyServer: • navigator showing cutout web service • List: showing many calls and variant use. • SkyQuery: • Show integration of various archives. • Explain spatial join xMatch operator.
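A spatial join pairs up objects from different archives that sit at the same place on the sky. The sketch below is a deliberately simplified stand-in, a nearest-neighbor match within a fixed tolerance, and is not SkyQuery's actual xMatch operator; the catalogs, identifiers, and the 2 arcsecond tolerance are made up for illustration.

```python
import math

def separation_arcsec(a, b):
    """Great-circle separation of two catalog rows, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a["ra"], a["dec"], b["ra"], b["dec"]))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def cross_match(left, right, tolerance_arcsec):
    """For each left row, pair it with the nearest right row within the tolerance."""
    pairs = []
    for l_row in left:
        best = None
        for r_row in right:
            dist = separation_arcsec(l_row, r_row)
            if dist <= tolerance_arcsec and (best is None or dist < best[1]):
                best = (r_row, dist)
        if best is not None:
            pairs.append((l_row["id"], best[0]["id"]))
    return pairs

# Hypothetical rows from two archives (positions in degrees).
optical = [
    {"id": "s1", "ra": 10.0000, "dec": 20.0000},
    {"id": "s2", "ra": 10.0100, "dec": 20.0000},
]
infrared = [
    {"id": "t1", "ra": 10.0001, "dec": 20.0001},
    {"id": "t2", "ra": 11.0000, "dec": 21.0000},
]

matches = cross_match(optical, infrared, tolerance_arcsec=2.0)
```

The nested loop is quadratic and only suitable for a demo; a production archive would rely on a spatial index, and a federated query would ship partial results between archives instead of whole catalogs.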
MyDB • Portal allows federation of data but… • Intermediate results may be large. • Intermediate results feed into next analysis step. • Sending them back-and-forth to client is costly and sometimes infeasible. • Solution: create a working DB for client at Portal: MyDB
MyDB • Anyone can create a personal DB at SkyServer portal. • It is about 100 MB • It is private • Simple queries done immediately • Complex queries done by batch scheduler • All queries can create/read/write MyDB tables • Very popular with “serious” users. • MyDB will be sharable by a group.
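The MyDB idea, keeping intermediate results in a working database next to the data, can be sketched with an in-memory SQLite database standing in for the portal. The table names, columns, and magnitude cut are hypothetical, and the real SkyServer portal is not SQLite.

```python
import sqlite3

# One connection stands in for the portal, which holds both the archive and the user's MyDB.
portal = sqlite3.connect(":memory:")
portal.execute("CREATE TABLE photo_obj (objid INTEGER, ra REAL, dec REAL, r_mag REAL)")
portal.executemany(
    "INSERT INTO photo_obj VALUES (?, ?, ?, ?)",
    [
        (1, 180.0, 0.0, 17.0),
        (2, 180.1, 0.1, 19.5),
        (3, 180.2, 0.2, 16.0),
        (4, 180.3, 0.3, 21.0),
    ],
)

# Step 1: the intermediate result is written to a user table at the portal,
# not shipped back to the client.
portal.execute(
    "CREATE TABLE mydb_bright AS "
    "SELECT objid, ra, dec, r_mag FROM photo_obj WHERE r_mag < 18.0"
)

# Step 2: the next analysis step reads the stored table; only a small summary
# travels to the client.
count, mean_mag = portal.execute(
    "SELECT COUNT(*), AVG(r_mag) FROM mydb_bright"
).fetchone()

print(count, mean_mag)
```

In this toy case the saving is trivial, but with intermediate tables of millions of rows the difference between returning a summary and returning the table is what makes multi-step analysis feasible.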
Open SkyQuery • SkyQuery being adopted by AstroGrid as reference implementation for OGSA-DAI (Open Grid Services Architecture, Data Access and Integration). • SkyNode basic archive object http://www.ivoa.net/twiki/bin/view/IVOA/SkyNode • SkyQuery Language (VoQL) is evolving. http://www.ivoa.net/twiki/bin/view/IVOA/IvoaVOQL
What we learned • Astro is a community of 10,000 • Homogeneous & Cooperative • If you can’t do it for Astro, do not bother with 3M bio-info. • Agreement • Takes time • Takes endless meetings • Big problems are non-technical • Legacy is a big problem. • Plumbing and tools are there, but … • What is the object model? • What do you want to save? • How to document provenance? • WWT is a poster child for the Data Grid.