
Scientific Databases Lecture: Virtual Observatories for Space Science

Scientific Databases Lecture: Virtual Observatories for Space Science. Dr. Kirk Borne, GMU SCS. November 18, 2003. GMU CSI 710. Outline: Quick Review of Astronomy Data; The National Virtual Observatory (NVO); Other Virtual Observatories for Space Science; Why Virtual Observatories?



Presentation Transcript


  1. Scientific Databases Lecture: Virtual Observatories for Space Science. Dr. Kirk Borne, GMU SCS. November 18, 2003. GMU CSI 710

  2. Outline • Quick Review of Astronomy Data • The National Virtual Observatory (NVO) • Other Virtual Observatories for Space Science • Why Virtual Observatories? • NVO – It’s all about the Science: • IT-enabled, Science-enabling • The Enabling Computational Science Technologies for the NVO – where you can help! • Distributed Data Mining in the NVO Virtual Observatories for Space Science

  3. The Nature of Astronomical Data • Imaging • 2D map of the sky at multiple wavelengths • Derived catalogs • subsequent processing of images • extracting object parameters (400+ per object) • Spectroscopic follow-up • spectra: more detailed object properties • clues to physical state and formation history • lead to distances: 3D maps • Numerical simulations • All inter-related!

  4. NOAO Deep Wide-Field Survey: http://www.noao.edu/noao/noaodeep/

  5. NOAO Deep Wide-Field Survey: http://www.noao.edu/noao/noaodeep/

  6. NOAO Deep Wide-Field Survey: http://www.noao.edu/noao/noaodeep/

  7. NASA Astronomy Mission Data: the tip of the data mountain • NSSDC’s astrophysics data holdings: One of many science data collections for astronomy across the US and the world! • NSSDC = National Space Science Data Center @ NASA/GSFC • http://nssdc.gsfc.nasa.gov/astro/astrolist.html

  8. “Quote of the day” • “It's just as unpleasant to get more than you expected as it is to get less.” • George Bernard Shaw

  9. Why so many Telescopes? … Because … • Many great astronomical discoveries have come from inter-comparisons of various wavelengths: • Quasars • Gamma-ray bursts • Ultraluminous IR galaxies • X-ray black-hole binaries • Radio galaxies • . . .

  10. Therefore, our science data archive systems should enable multi-wavelength interdisciplinary distributed database access, discovery, mining, and analysis.

  11. How does one integrate and use these distributed data archives? …

  12. Emerging Computational Environment • Standardizing distributed data • Web Services, supported on all platforms • Custom configure remote data dynamically • XML: Extensible Markup Language • SOAP: Simple Object Access Protocol • WSDL: Web Services Description Language • UDDI: Universal Description, Discovery and Integration • Standardizing distributed computing • Grid Services • Custom configure remote computing dynamically • Build your own remote computer, use it, then discard it • Virtual Data: new data sets on demand

  13. …The National Virtual Observatory (NVO) • National Academy of Sciences “Decadal Survey” recommended NVO as highest priority small (<$100M) project: “Several small initiatives recommended by the committee span both ground and space. The first among them -- the National Virtual Observatory (NVO) -- is the committee’s top priority among the small initiatives. The NVO will provide a ‘virtual sky’ based on the enormous data sets being created now and the even larger ones proposed for the future. It will enable a new mode of research for professional astronomers and will provide to the public an unparalleled opportunity for education and discovery.” (p.14)

  14. Why is it Virtual? • A Virtual Data System: • has multiple components • is (geographically) distributed • is interoperable • provides seamless user access to distributed data system components • provides “one-stop shopping” for data end-user

  15. Why is it Necessary? • To maximize cross-enterprise multi-institutional resources • To minimize duplication of effort • To streamline operations through shared development • To serve multiple user communities • To facilitate simultaneous data mining, knowledge discovery, and information retrieval from multiple distributed data collections • Because data volumes are huge & growing rapidly ... For example, in Astronomy: • a few terabytes “yesterday” (10,000 CDROMs) • tens of terabytes “today” (100,000 CDROMs) • petabytes “tomorrow” (within 10 years) (100,000,000 CDROMs)

  16. National Virtual Observatory (http://www.us-vo.org/) • NVO is a concept. It was recommended by the Astronomy Decadal Survey Committee to the National Academy of Sciences. Currently funded by NSF ($10M Information Technology Research grant); and NASA next year(?). • NVO is not just “National”. It is actually “Global”: http://www.ivoa.net/ • Will link geographically distributed astronomical data archives and information resources = provides “one-stop shopping” for data end-user • Will be heterogeneous, interoperable, and federated (autonomy maintained at local sites) … therefore, we are using XML and Web Services. • Requires middleware standards for: metadata, resource descriptions (including the Dublin Core), queries, query results, the data (including the Data Model – see next slide), and semantics (… we are using Unified Content Descriptors = UCDs). • Requires innovative computational science technologies for: data discovery, data mining, data fusion, distributed querying, and code-shipping (“Ship the code, not the data”)

  17. Virtual Observatory Data Model • A data model is the structure in which a computer program stores persistent information.

  18. Virtual Observatories for Space Science

  19. VxO: becoming an operational system (high TRL) • What is a VxO? • Virtual “anything” Observatory – where “anything” currently includes Astrophysical, Solar, Magnetospheric, Heliospheric, Ionospheric, … • Summary statement for any VxO … Researchers should be able to find and access seamlessly all existing data relevant to the research they are considering, that data should be independently and correctly useable, and that data should be available in useful ways and in useful contexts. • Without exception, full VxO efforts aim in this direction by providing multi-mission data access and easy browse functionality.

  20. Capabilities of Space Physics Science Databases. The VxO Challenge: to Integrate Data, Tools, Services • http://spdf.gsfc.nasa.gov/ • [Diagram labels: ModelsWeb; CDAWLib; HelioWeb; (Trajectories); Science Data Facility; Science User Support; Acquisition & Ingest; Tools & Services]

  21. How do Space Science Databases Change in a Future that has an Increasingly Rich/Robust VxO Framework? • One definition for this VxO framework could be … "The distributed implementation of an integrated space sciences data environment" • The broad goals of the data centers don't fundamentally change with this definition. • They still must enable new science by adding unique value to the Space Science research community through strong multi-discipline and cross-discipline data resources, with unique services tied to unique databases. • These services (data, functions, software) should (and will) be increasingly supplied as a key element of that new, broader VxO environment. • Logically, the data center’s services eventually become consumers as well as providers. • Visible early user impact of VxO is critical. • VxOs should develop a good long-term hybrid solution = PIs + missions/projects + Science Data Centers + (other) specialized services

  22. Science Data Formats – part of the glue • Several key data formats are standard in space science: FITS (Astrophysics & Solar Physics), CDF and netCDF (Space Physics & Earth Science), HDF (Space Physics, Earth Science, & Computational Science). • Why? • These provide a baseline data format for all data sets in that discipline and in joint international projects. • They provide the base for many data center services, data analysis tools, data integration tools, visualization packages. • They are a key enabling technology for many different space missions and space science projects. • Plans: • Translation tools: from FITS <–> CDF <–> HDF <–> netCDF • Substantial work on format translators via XML and XSLT.
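To make the idea of an XML-based format translator a little more concrete, here is a minimal sketch that turns a few FITS header cards into XML using only the Python standard library. It relies on the FITS convention that a header card is 80 characters wide, with the keyword in columns 1-8 and `= ` in columns 9-10 for value cards. The output element names (`header`, `card`) are invented for illustration and are not the FITSML or XDF schemas; the parser is deliberately simplified and would mishandle string values that contain `/`.

```python
import xml.etree.ElementTree as ET


def parse_card(card):
    """Split one 80-character FITS header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Cards without a value indicator (COMMENT, HISTORY, END, ...).
        return keyword, None, card[8:].strip()
    # Simplification: treat the first "/" as the start of the comment.
    value, _, comment = card[10:].partition("/")
    return keyword, value.strip().strip("'").strip(), comment.strip()


def header_to_xml(cards):
    """Render value cards as XML; element names are hypothetical."""
    root = ET.Element("header")
    for card in cards:
        keyword, value, comment = parse_card(card)
        if keyword == "END":
            break
        if value is None:
            continue
        ET.SubElement(root, "card", keyword=keyword, comment=comment).text = value
    return ET.tostring(root, encoding="unicode")


# Three illustrative header cards followed by END, each padded to 80 characters.
cards = [
    f"{'SIMPLE':<8}= {'T':>20} / conforms to FITS standard".ljust(80),
    f"{'BITPIX':<8}= {'16':>20} / bits per data value".ljust(80),
    f"{'NAXIS':<8}= {'2':>20} / number of axes".ljust(80),
    "END".ljust(80),
]
xml_text = header_to_xml(cards)
```

Once the metadata is in XML, an XSLT stylesheet (or any XML tool) could map it onto another format's attribute conventions, which is the kind of translation path the slide's plans describe.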

  23. Interfaces to a VxO Environment • "Web Services" interface to existing data services • Web Services interfaces and software libraries complement existing FTP and interactive user web interfaces. • Web Services provide an application-to-application interface, without human intervention. • Web Services provide distributed data registry (WSDL), data/resource discovery (UDDI), and data services (SOAP). • Scientific database services have unique scope and functionality that must be accessible by the VxO environment for it to gain user acceptance. • e.g., SOAP/XML interface for Space Physics data now enables 3-D interactive graphics of distributed multi-mission data. • Plans for data format translators and converters
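As a rough illustration of what an application-to-application request looks like, the sketch below builds a SOAP 1.1 request envelope with the Python standard library. The envelope namespace is the standard SOAP 1.1 one; everything else (the service namespace `urn:example:vxo`, the operation `getData`, and its parameters) is hypothetical and does not describe any real space physics service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:vxo"  # hypothetical service namespace (assumption)


def build_request(operation, params):
    """Return a SOAP 1.1 envelope (as a string) wrapping one remote call."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names, for illustration only.
request_xml = build_request(
    "getData",
    {"dataset": "EXAMPLE_DATASET", "start": "2003-11-18T00:00:00Z"},
)
```

In practice a client would usually generate such calls from the service's WSDL description rather than assembling envelopes by hand; the point here is only that the exchange is structured XML that a program, not a person, reads and writes.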

  24. Why Virtual Observatories? • Because: • The data are highly distributed. • Multi-mission data lead to new discoveries. • The data volumes are HUGE and growing. • And maybe because of Augustine’s Law … “Software is like entropy; it always increases.” - Norman Augustine

  25. Szalay’s Law: The utility of N comparable datasets increases as N² • Metcalfe’s Law: The value of a network scales as n², where n is the number of nodes connected. • Hagel & Armstrong’s Axiom: The aggregation of resources is more important than the amount of resources owned. • Metcalfe’s law applies to telephones, the Internet … • Szalay argues as follows: • Each new dataset gives new information. • 2-way combinations give even more new information.
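One way to read the N² scaling in Szalay's argument is to count the 2-way combinations directly: N datasets can be paired in N(N-1)/2 ways, which grows as N² for large N. The short sketch below tabulates that count; treating each pair as one unit of "new information" is an illustrative assumption, not a claim from the slide.

```python
def pairwise_combinations(n):
    """Number of distinct 2-way combinations among n datasets: n(n-1)/2."""
    return n * (n - 1) // 2


# Single datasets grow linearly, pairs grow quadratically.
for n in (2, 5, 10, 13):
    print(f"{n:>2} datasets -> {pairwise_combinations(n):>2} pairs")
```

Doubling the number of comparable datasets therefore roughly quadruples the number of cross-comparisons available.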

  26. Size of a Typical Archived Astronomical Data Repository • Size of the archived data for an all-sky survey -- 40,000 square degrees is two trillion pixels -- • One band: 4 Terabytes • Multi-wavelength: 10-100 Terabytes • Time dimension: 10 Petabytes • LSST project (10 yrs): ~100 Petabytes (http://www.lsst.org/) • [Figure: All-sky distribution of 526,280,881 stars from the MACHO survey.]
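The "two trillion pixels" and "4 Terabytes" figures can be reproduced with a short calculation. The slide does not state the pixel scale or the storage per pixel; the values used below (0.5 arcsecond pixels, 2 bytes per pixel) are assumptions chosen because they recover the slide's numbers.

```python
ARCSEC_PER_DEGREE = 3600


def survey_pixels(area_sq_deg, pixel_arcsec):
    """Number of pixels needed to tile a sky area at a given pixel scale."""
    pixels_per_sq_deg = (ARCSEC_PER_DEGREE / pixel_arcsec) ** 2
    return area_sq_deg * pixels_per_sq_deg


def survey_bytes(area_sq_deg, pixel_arcsec, bytes_per_pixel):
    """Raw (uncompressed) size of one band of the survey, in bytes."""
    return survey_pixels(area_sq_deg, pixel_arcsec) * bytes_per_pixel


# Assumed values: 0.5 arcsec pixels, 2 bytes (16 bits) per pixel.
pixels = survey_pixels(40_000, 0.5)             # about 2.07e12 pixels
size_tb = survey_bytes(40_000, 0.5, 2) / 1e12   # about 4.1 TB for one band
```

Multiplying by the number of wavelength bands, and then by the number of repeat visits, gives the jump to tens of terabytes and then petabytes that the slide lists.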

  27. Ongoing Surveys of the Sky • Surveys include: MACHO, 2MASS, DENIS, SDSS, GALEX, FIRST, DPOSS, GSC-II, COBE, MAP, NVSS, ROSAT, OGLE, ... • Large number of new surveys • multi-TB in size, 100 million objects or more • individual archives planned, or under way • Multi-wavelength view of the sky • coverage at more than 13 wavelengths within 5 years • Impressive early discoveries • finding exotic objects by unusual colors • L,T dwarfs, high-z quasars • finding objects by time variability • gravitational microlensing

  28. Sloan Digital Sky Survey Data Products (http://www.sdss.org/) • Full Data Collection: ~20 TB • Object catalog: 400 GB (parameters of >10⁸ objects) • Redshift Catalog: 1 GB (parameters of 10⁶ objects) • Atlas Images: 1.5 TB (5-color cutouts of >10⁸ objects) • Spectra: 60 GB (in a one-dimensional form) • Derived Catalogs: 20 GB (clusters, QSO absorption lines) • 4x4 Pixel All-Sky Map: 60 GB (heavily compressed)

  29. Large Synoptic Survey Telescope • Highly ranked in Decadal Review • Optimized for surveys • scan mode • deep mode • 7 square degree field • 6.9m effective aperture • 24th mag in 20 sec • > 20 Tbytes/night • Real-time analysis • “Celestial Cinematography” • Simultaneous multiple science goals

  30. Large Mirror Fabrication (for large telescopes, such as LSST) • That’s big! (Univ. of Arizona Mirror Laboratory)

  31. NVO – It’s all about the Science

  32. Science Discovery - the Old Way

  33. Science Discovery - The New Way - Different! • The discovery process will rely heavily on distributed data access and multi-archive data mining. • Systematic data exploration • will have a central role • statistical analysis of the “typical” objects • automated search for the “rare” events

  34. Conceptual Architecture for a Distributed Data Mining System • [Diagram labels: User; Analysis tools; Discovery tools; Gateway; Data Archives]

  35. The Discovery Process • Past: observations of small, carefully selected samples of objects in a narrow wavelength band • Future: high quality, homogeneous multi-wavelength data on millions of objects, allowing us to • discover significant patterns from the analysis of statistically rich and unbiased image/catalog databases • understand complex astrophysical systems via confrontation between data and large numerical simulations • The discovery process will rely heavily on advanced visualization, data mining, and statistical analysis tools.

  36. The NVO in 5 words or less: “The archive is the sky!”

  37. NVO: It is all about the Science • There is a huge scientific interest in the new data collections: large sky surveys, multiple telescopes, multiple-wavelength coverage of the sky, time domain coverage ... And it is all available on-line from your desktop … • “The archive is the sky!” • Something is needed to help scientists access, mine, and explore these huge data collections. • 1 Terabyte at 10 Mbyte/s takes about 1 day to transmit • Hundreds of intensive queries and thousands of casual queries per day • Data will reside at multiple locations, in many different formats • Existing analysis tools do not scale to Terabyte data sets • Acute need in a few years; solution will not just happen.
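The transfer-time figure on this slide is easy to check. The sketch below assumes decimal units (1 Terabyte = 10¹² bytes, 10 Mbyte/s = 10⁷ bytes/s) and a link that sustains its full rate with no protocol overhead, which is optimistic.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(size_bytes, rate_bytes_per_s):
    """Days needed to move size_bytes over a link sustaining rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


terabyte_days = transfer_time_days(1e12, 10e6)   # about 1.16 days
petabyte_days = transfer_time_days(1e15, 10e6)   # about 1157 days
```

At the same rate a petabyte would take roughly three years to move, which is why bulk download does not scale to the data volumes discussed in this lecture.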

  38. NVO Enables New Science http://www.us-vo.org/ • Rare and exotic objects • Very high redshift quasars • Dark matter in the galactic halo • Time-variable objects, transient events: distant supernovae and microlensing • Brown dwarfs • Variable stars • Asteroids... • ...incoming!! • Serendipity!

  39. NVO Science Cases & Drivers (from Aspen 2001 NVO Workshop) • Solar System: NEOs, Long-Period Comets, TNOs, Killer Asteroids!!! • The Digital Galaxy: Find star streams and populations -- relics of past/present assembly phase. Identify components of disk, thick disk, bulge, halo, arms, ?? • The Low-Surface Brightness Universe: spatial filtering, multi-wavelength searches, intersection of the image and catalog domains • Panchromatic Census of AGN (Active Galactic Nuclei): Complete sample of the AGN zoo, their emission mechanisms, and their environments • Precision Cosmology & Large-Scale Structure: Hierarchical Assembly History of Galaxies and Structure, Cosmological Parameters, Dark Matter and Galaxy Biasing as f(z) • Precision science of any kind that depends on very large sample sizes • "Survey Science Deluxe" • Search for rare and exotic objects (e.g., high-z QSOs, high-z SNe, L/T dwarfs) • Serendipity: Explore new domains of parameter space (e.g., time domain, or "color-color space" of all kinds)

  40. Enabling Computational Science Technologies for the NVO

  41. Major Functions of the NVO and the related Enabling Computational Science Technologies • To facilitate data mining and knowledge discovery within the very large astronomical databases -- Requires: • indexing for fast queries, filtering of large queries, data subsetting, visualization, parallelization (queries, access), ... • To facilitate linkages and cross-archive investigations -- Requires: • distributed computing, scalable architectures, load balancing, thin middleware layer, interoperability, code libraries, code-shipping, data-finding services, data standards & interchange formats, query/results protocols, data fusion, quality assessment, archive/metadata profiles, user profiles, intelligent agents, ... • To serve a broad community of users (professionals, amateur astronomers, schools, general public) -- • must support thousands of queries per day

  42. Some General Challenges for NVO (and all Virtual Data Systems) • Data Discovery: Finding data within distributed data systems • Transparent User Access to Data: across heterogeneous environments • (Distributed) Data Mining and Analysis: of terabytes! • Interoperability: of systems, data, metadata, tools • New Technology Infusion: across multiple distributed systems • Sociology: "We don't need it" or "We already have it"

  43. How do you get all of these distributed science databases working together? Virtual Observatory team motto: “It’s the middleware, stupid.”

  44. National Virtual Observatory (http://www.us-vo.org/) • NVO is a concept. It was recommended by the Astronomy Decadal Survey Committee to the National Academy of Sciences. Currently funded by NSF ($10M Information Technology Research grant); and NASA next year(?). • NVO is not just “National”. It is actually “Global”: http://www.ivoa.net/ • Will link geographically distributed astronomical data archives and information resources = provides “one-stop shopping” for data end-user • Will be heterogeneous, interoperable, and federated (autonomy maintained at local sites) … therefore, we are using XML and Web Services. • Requires middleware standards for: metadata, resource descriptions (including the Dublin Core), queries, query results, the data (including the Data Model – see next slide), and semantics (… we are using Unified Content Descriptors = UCDs). • Requires innovative computational science technologies for: data discovery, data mining, data fusion, distributed querying, and code-shipping (“Ship the code, not the data”)

  45. Tools for the NVO & other Virtual Data Systems • XML (eXtensible Markup Language) = "the language of interoperability" - ADC/XML Project was most comprehensive and advanced application of XML to NASA astrophysics data archives - including the XDF (eXtensible Data Format) and FITSML data standards [http://xml.gsfc.nasa.gov/] • Comprehensive Data Mining Resource Guide for Large Scientific Databases - [follow the link at http://nvo.gsfc.nasa.gov/] • "The trouble with facts is that there are so many of them." - Samuel McChord Crothers, in "The Gentle Reader" • ISAIA (Interoperable Systems for Archival Information Access): resource description profiles to enable access to distributed data providers • MOCHA (Middleware based On a Code-sHipping Architecture): middleware tools for search, retrieval, & data fusion from heterogeneous databases using heterogeneous interfaces - transparently federates distributed data access - • "Ship the code, not the data" • The GRID! …
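The "ship the code, not the data" idea can be shown with a toy model. In the sketch below, each `Archive` is an in-memory stand-in for a remote data provider, and the two methods contrast downloading everything with sending a filter to run at the archive. The class, the `is_high_redshift` filter, and the catalog rows are all made up for illustration; this is not the MOCHA API.

```python
class Archive:
    """A toy, in-memory stand-in for a remote data archive (hypothetical)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def download_all(self):
        """'Ship the data': every row crosses the network to the user."""
        return list(self.rows)

    def run(self, predicate):
        """'Ship the code': the filter runs at the archive; only matches return."""
        return [row for row in self.rows if predicate(row)]


def is_high_redshift(row):
    """User-supplied selection code that gets sent to each archive."""
    return row["z"] > 5.0


# Made-up catalog rows, purely for illustration.
archives = [
    Archive("optical", [{"id": 1, "z": 0.3}, {"id": 2, "z": 5.8}, {"id": 3, "z": 1.1}]),
    Archive("infrared", [{"id": 4, "z": 6.2}, {"id": 5, "z": 0.7}]),
]

# Rows transferred when every archive ships its full contents.
rows_moved_data_shipping = sum(len(a.download_all()) for a in archives)

# Rows transferred when the filter is shipped to each archive instead.
matches = [row for a in archives for row in a.run(is_high_redshift)]
rows_moved_code_shipping = len(matches)
```

With real catalogs of 10⁸ objects and selections that keep only rare objects, the difference between the two transfer counts is many orders of magnitude, which is the motivation for code-shipping middleware.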

  46. What is The Grid? • The GRID is “a distributed computing infrastructure that facilitates resource-sharing and coordinated problem-solving in dynamic, multi-institutional virtual organizations.” • http://www.globus.org/datagrid/ • http://www.gridforum.org/ • http://www.nas.nasa.gov/About/IPG/ (NASA’s Information Power Grid)

  47. The Grid: by Foster & Kesselman (Argonne National Laboratory) Internet computing and GRID technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries …. Transform scientific disciplines ranging from high energy physics to the life sciences

  48. Data Grids vs. Computational Grids

  49. Slide shown earlier: Conceptual Architecture for a Distributed Data Mining System • [Diagram labels: User; Analysis tools; Discovery tools; Gateway; Data Archives]

  50. A Concept for a Data Grid Node for Distributed Data Mining (slide provided by Alex Szalay, JHU) • [Diagram labels: Compute layer (200 CPUs, many compute nodes); Interconnect layer (1 Gbits/sec/node); Database layer (Objectivity on RAID, 2 GBytes/sec); Other nodes; 10 Gbits/s] • Hardware requirements • Large distributed database engines • with few Gbyte/s aggregate I/O speed • High speed (>10 Gbit/s) backbones • cross-connecting the major archives • Scalable computing environment • with hundreds of CPUs for analysis • HPC comes to the rescue!
