Data Services Required for Future Magnetospheric Research

Data Services Required for Future Magnetospheric Research. Robert L. McPherron, R. Walker, and Todd King, IGPP @ UCLA. Invited talk presented at Spring AGU 2006.

Presentation Transcript


  1. Data Services Required for Future Magnetospheric Research • Robert L. McPherron, R. Walker, and Todd King • IGPP @ UCLA • Invited talk presented at Spring AGU 2006

  2. Steps in Acquiring Digital Data (Data Provider, User Computer) • Locate a data provider • Obtain permission to use data • Learn graphical user interface • Obtain the data inventory • Obtain metadata • Identify the dataset • Select the variables • Select the time interval • Transfer the data file to user • Build a directory structure • Move files into structure • Rename the files • Remove extraneous records • Insert missing records • Insert missing files • Convert flag to local flag • Convert time fields • Write a binary file • About 20 steps in this process for a single type of data!

  3. What Services are Needed? • Most researchers want data - not complex analysis or plots • The primary service needed is simple access to digital data delivered in a convenient package • Only after this problem is solved should we consider specialized processing of the data between the archive and the user

  4. What are Some Problems Today? • Identify existing problems • Design a system that removes these problems and provides opportunity for future expansion • Some of the problems I encounter are listed below and are illustrated in the following viewgraphs: • Multiple sources • Multiple interfaces • Multiple forms • Complex formats • Difficult transformations • Restricted access • Restricted intervals • Single file transfers • Missing records in a file • Missing files in hierarchy • Inconsistent file formats • Static browse plots

  5. Don’t Want Multiple Sources • Geomagnetic Array/Chain/Station (In Alphabetical Order by Network/Station Name) • AGO - Antarctica/Automatic Geophysical Observatories (Fluxgate Magnetometer PI: L. J. Lanzerotti) • Canadian Magnetic Observatories - Canadian National Geomagnetism Program Operated by the Geological Survey of Canada (GSC) • CANOPUS/MARIA - Canadian Auroral Network for the Open Program Unified Study/Magnetometer and Riometer Array (PI: G. Rostoker) • Status of CANOPUS by G. Rostoker with a PostScript map • Map and Data • CANOPUS Real Time Data • CPMN - The Circum-pan Pacific Magnetometer Network (1996 - present, PI: K. Yumoto) • GIMA - Alaska/The Geophysical Institute Magnetometer Array (PI: J. Olson) • Greenland Magnetometer Array - Ground-based Observations at the Danish Meteorological Institute (DMI) (PI: Jurgen Watermann) • IGPP/LANL - IGPP/LANL Ground-Based Magnetometer Array (V. Angelopoulos/G. Reeves/G. Le) • IMAGE - International Monitor for Auroral Geomagnetic Effects • INTERMAGNET - International Real-time Magnetic observatory Network • Kakioka Magnetic Observatory - Managed by Japan Meteorological Agency • Kiruna Magnetogram - Station Managed by Swedish Institute of Space Physics (Ingemar Häggström) • MACCS - Magnetometer Array for Cusp and Cleft Studies (M. Engebretson/W. J. Hughes) • MAGIC - Magnetometer Array on the Greenland Ice Cap (PI: C. R. Clauer) • The List of Magnetometer Arrays Continues!!!

  6. Don’t Want Multiple Interfaces

  7. Don’t Want Multiple Forms • Download one day of data from ACE magnetometer (screenshots numbered 1 to 7)

  8. Don’t Want Complex Formats (The WDC Exchange format) • This is not a flat file – minutes are horizontal, hours are vertical • The hour average is the last column of each row • There is no “white space” to distinguish fields in the header • The file contains a month of Asym D records followed by three other sets
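
The layout described on this slide (one row per hour, 60 minute values across, hourly average last) can be turned into a flat, one-record-per-minute series. The sketch below assumes the rows have already been parsed into numbers; the function name and the list-of-lists input are hypothetical, and it does not reproduce the actual WDC field widths or header layout.

```python
def flatten_hourly_rows(hourly_rows):
    """Flatten rows of 60 minute values plus a trailing hourly average.

    hourly_rows is a hypothetical, already-parsed structure: one list per
    hour, each holding 60 minute values followed by the hourly average.
    Returns a flat list of (minute_of_day, value) pairs.
    """
    flat = []
    for hour, row in enumerate(hourly_rows):
        minute_values = row[:-1]  # drop the hourly average in the last column
        for minute, value in enumerate(minute_values):
            flat.append((hour * 60 + minute, value))
    return flat
```

With 24 hourly rows this yields 1440 records, so the value for a given minute sits at a predictable index.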

  9. Don’t Want Do-it-yourself Translation • This is the CDF V3.1 online documentation directory. • This directory contains the following files... • cdf31ug.pdf CDF User's Guide (in an uncompressed Portable Document Format (PDF) file) • cdf31crm.pdf CDF C Reference Manual (in an uncompressed Portable Document Format file) • cdf31frm.pdf CDF Fortran Reference Manual (in an uncompressed Portable Document Format file) • cdf31ifd.pdf CDF Internal Format Description (in an uncompressed Portable Document Format file) • cdf31jrm.pdf Java-CDF APIs Reference Manual (in an uncompressed Portable Document Format file) • User’s Guide: 319 pages!!!

  10. Don’t Want Restricted Access

  11. Don’t Want Restricted Transfers • Many systems restrict the amount of data transferred per transaction

  12. Don’t Want Single File Transfers • Data are written in binary flat files with one day per file • One file is transferred at a time by right-clicking and responding • One station for five years requires 3650 file transfers!!!

  13. Don’t Want Missing Records • Some datasets discard records with bad data creating time sequences, not time series. The location of a given time in the file is not predictable. • Some datasets have no file when all data are missing. This makes the file structure unpredictable.
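
The difference between a time sequence and a time series can be illustrated with a small gap-filling routine. This is a minimal sketch, assuming a fixed cadence and records keyed by timestamp; the names `fill_gaps` and `index_of` and the choice of NaN as the flag are illustrative assumptions, not part of any existing data system.

```python
import math
from datetime import datetime, timedelta


def fill_gaps(records, start, stop, step, flag=math.nan):
    """Return one (time, value) pair per step in [start, stop).

    records maps datetime -> value; any missing time receives the flag,
    so the output always has the same length and ordering.
    """
    known = dict(records)
    series = []
    current = start
    while current < stop:
        series.append((current, known.get(current, flag)))
        current += step
    return series


def index_of(time, start, step):
    """Position of a given time in a gap-filled series."""
    return (time - start) // step
```

Once every time step is present, the location of a given time follows from arithmetic alone, which is what makes the file structure predictable.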

  14. Don’t Want Changing File Structure • Most formats accept a variable number of columns (fields) • Some datasets will change the number of columns at different points in the file collection • This requires a more sophisticated program to read.
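
One way a reader program might cope with a changing column count is to pad every row to the widest row seen. The sketch below assumes whitespace-separated numeric fields and assumes that extra columns are appended at the end of a row; both are assumptions for illustration, and `normalize_rows` is a hypothetical name.

```python
import math


def normalize_rows(lines):
    """Parse whitespace-separated numeric rows and pad them to equal width.

    Rows with fewer columns than the widest row are padded with NaN.
    Assumes new columns are appended on the right (an assumption).
    """
    rows = [[float(field) for field in line.split()] for line in lines if line.strip()]
    width = max((len(row) for row in rows), default=0)
    return [row + [math.nan] * (width - len(row)) for row in rows]
```

If a dataset inserts new columns in the middle of a record instead, padding is not enough and the reader needs per-interval column definitions from the metadata.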

  15. Don’t Want Fixed Plot Scales • Simple plots are important in data selection • These plots should adapt to the data to reveal features present
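
A plot scale that adapts to the data can be as simple as deriving the axis limits from the values being shown. The following is a rough sketch; the function name and the 5% padding default are assumptions, and it expects missing values to have been converted to NaN beforehand.

```python
import math


def adaptive_limits(values, padding=0.05):
    """Return (low, high) axis limits computed from the finite values.

    NaN and infinite entries are ignored. A constant series gets a
    nominal span so the plotted line is not drawn on the axis edge.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return (0.0, 1.0)  # nothing to plot, fall back to a unit range
    low, high = min(finite), max(finite)
    span = high - low
    if span == 0:
        span = abs(low) or 1.0
    return (low - padding * span, high + padding * span)
```

Note that a large numeric fill value left in the data is finite and would dominate the scale, so flags should be replaced before the limits are computed.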

  16. Don’t Want Unrealistic Assumptions • The user knows what he wants better than any database developer • Data systems should be built in response to requirements provided by users • Don’t design the system to support only “event studies” • Some users require multi-year datasets to do statistical studies

  17. What We Want • Quick, easy access to data • One simple, direct interface to all data • Data in the format we need • Unrestricted use of data • Unlimited quantities • Ability to locate and give proper acknowledgements

  18. How to accomplish this • Virtual Observatories • Start with discipline specific virtual observatories (VxO) • Later bring all VxO under one umbrella: the Great Observatory • If the VxOs adopt common technologies the Great Observatory will grow organically • This will take careful planning, coordination and participation from the entire community …and most important: funding

  19. VxO Requirements • Standards for metadata description • Outreach to potential providers • Logically organized data repository • Web access to data • Ability to update inventory • Format conversion utilities • Selection of variables • Selection of arbitrary time intervals • Gap filling with flags • File concatenation • Interchange format (ASCII?) • Data use contracts created in real time • Extensive log of queries and inventory of data delivered • Frequent reports to data providers • Use of logos and banners to credit data provider • Provision of links to provider’s system when desired • Assistance for small providers in the form of software and support for generating metadata

  20. Why Participate in a VxO? • The provider must have incentives to participate! • Carrots: • The VxO will present a new data requestor with a contract that must be signed with the provider if he demands it • The VxO presents banners and logos identifying the data provider • The VxO delivers a dataset citation with each filled request • The VxO provides links to the provider so users may go there if they choose • The VxO tracks the monthly utilization of the provider’s data • The VxO provides tools and services not implemented by the provider • The pursuit of science: Open access to data will enhance science • Potential for data analysis programs involving the data (funding) • More focus on science rather than information systems • Sticks: • Pressure from funding authority for a provider to participate • Included in new contracts • Require data to be public in order to use in DAP • Funding from the VMO to make the data available

  21. Establishing the VxO • A common data model (SPASE) • Establish the infrastructure (protocols, services and core functions) • At first, link existing repositories and data sources • Later, establish a true archive which contains reviewed and vetted data (one reliable source)

  22. An Open Data Environment is a Paradigm Shift • Public funds, public data • Will other agencies or nations accept this? • However, we must recognize value added. • Raw to calibrated • Refinement • Discovery • Level the playing field so individual users can get to data as easily as larger organizations • Better alignment with current funding that leads to smaller programs • Consolidate information system investment to expand science investment.

  23. Conclusions • The existing system of data sharing is so complex that it leads to inefficient scientific activity • A new system of VxOs would have simple interfaces that give the user what he needs in a simple transaction • Such a system requires cooperation of data providers • The system also requires community development of standards and protocols facilitating VxO interactions • The current development of Virtual Observatories is a step in the right direction provided they do not attempt to do too much • The “Great Observatory” can only be built after Virtual Observatories demonstrate success

  24. The End! Thank you all!

  25. What needs to be done? • A central theme of this symposium is that these problems can be solved with a “Great Observatory” • Resident archives: • One or more datasets maintained by an individual or organization made available to the public • Virtual Observatories: • An interface to simplify access to multiple archives • Great Observatory for Heliophysics: • The collection of all systems available to support research in this subject area

  26. Visions of the Great Observatory • Key elements: (1) services grid [service oriented architecture (SOA), including Web Services] (2) knowledge grid [ontology inference layer, unified schema, data mining] (3) computation grid (4) sensor grid [e.g., multi-node, dynamically adaptive, observing systems or "sensor webs"] • SM24A-03: Eastman and Borne, Key Architecture Elements of a Great Observatory for Space Physics • The data environment of the Great Observatory requires many multi-disciplinary data processing services to be fully functional • SM24A-06: Narock, Szabo, Rash, Connecting Virtual Observatories with Grid Enabled Services

  27. Participation of Independent Archives • The VXO is an intermediary dependent on the data provider and the good opinion of users • The provider must have incentives to participate • Resources to make the necessary changes • Credit for producing the data • Links between the VXO and the provider so that users are aware of the data source and the requirements for its use • A monthly report from VXO on the utilization of provider’s data • Pressure from funding authority for a provider to participate

  28. More Visions • VITMO system is … a set of services: centralized browse and query/retrieval of distributed resources, access to data reader software and other tools, integration of current data with data from previous missions and long-term data sets. • The VITMO will also organize tools: • Plotting • Subsetting • Analysis tools • SM23A-06: Morrison, A Virtual Observatory for the Ionosphere-Mesosphere-Thermosphere Community • Our vision is to take One Step at a Time: • Build functional Virtual Observatories • Make existing data archives secure • Add smaller archives • Add simple services

  29. What Does the “Great Observatory for Space Physics” Need? • The Great Observatory (GO) would facilitate the study of the Sun, solar wind, and interactions with planetary bodies • It would do this by: • Identifying data providers • Listing the inventories of available data • Providing a common user interface • Creating output files of uniform format • Packaging data in structures independent of the original • Delivering required metadata • Maintaining statistics on data use and facilitating dataset citations • Developing essential services to add value to basic datasets • Achievement of these goals requires agreements to open datasets, standards for metadata description, means to populate the metadata, a portal for data access, and means for assignment of credit

  30. Advantages of Participation? • I don’t see what they are! • If you have the resources to distribute your data via web why should you participate? • You can use the government mandates of public access to big data sets without PI participation, but demand that your supporting data guarantee co-authorship • Help me on this one!!!

  31. Key architecture elements of a Great Observatory for space physics can then be framed within the broad categories of: • (1) services grid [service oriented architecture (SOA), including Web Services] • (2) knowledge grid [ontology inference layer, unified schema, data mining] • (3) computation grid • (4) sensor grid [e.g., multi-node, dynamically adaptive, observing systems or "sensor webs"] • Complementing this with tools for end-to-end web-enabled publishing of and access to data, metadata, software and science results then brings the Great Observatory to every scientist's desktop SM24A-03: Key Architecture Elements of a Great Observatory for Space Physics T E Eastman, K D Borne

  32. The data environment of the Great Observatory requires many multi-disciplinary data processing services to be fully functional • SM24A-06: Connecting Virtual Observatories with Grid Enabled Services • * T Narock, A Szabo, K Rash

  33. The core VITMO system is based upon a set of services: centralized browse and query/retrieval of distributed resources, access to data reader software and other tools, and integration of current data with data from previous missions and long-term data sets. The VITMO will allow vastly improved complex data search and location capabilities allowing multidisciplinary and multisatellite studies to be performed. The VITMO approach is easily extensible to future data sets and will be able to tie into Virtual Observatories in other domains as either a peer node or a service. The VITMO will also organize tools, whether plotting, subsetting, or analysis tools, by the type of data they are to be applied to as well as the types of operations that are to be performed. Relevant tools and models will be presented to the user through a tabbed browser interface. This interface is generated dynamically based on the metadata in the VITMO catalog that describes the data, tools, and models available through it. • SM23A-06 • TI: A Virtual Observatory for the Ionosphere-Mesosphere-Thermosphere Community • AU: * Morrison, D

  34. HR: 16:30h • AN: SM24A-02 • TI: Data Set Creation Using the Open Source Software Development Model • AU: * Weigel, R S • EM: rweigel@gmu.edu • AF: George Mason University, 4400 University Drive, Fairfax, VA 22030 United States • AB: The cycle of data use by a researcher generally includes exploration, database construction, data reduction, algorithm and analysis code development, documentation, informal exchange, and archiving. While some of these tasks are best facilitated and managed by large data centers, other aspects may be simplified by the use of the informal practices and tools that are an integral part of open source software development. We present our experience in the development and use of a space science time series data set that employs some of these practices. The data set has over 1000 time series from a diverse set of sources that include CISM, DMI, FMI, ISGI, NSSDC, NGDC, and SEC, along with data sets from individual scientists. Based on this experience, we propose a system that would enable the open source development model to gain wide use by individual researchers.

  35. Example of Changes • C:\My Documents\Meetings\AGU Meetings\AGU2006SPR\G8_K0_MAG_16368.txt • E:\My Work\Datafile\GOES\GOES08\2000\G08200001.txt • Eliminate > 100 header lines: dd-mm-yyyy hh:mm:ss.ms nT nT nT • Convert the time columns: 01-01-2000 00:00:30.000 → [2000 01 01 00 00 30] • 01-01-2000 10:10:30.000 -1.00000E+31 -1.00000E+31 -1.00000E+31 • -1.00000E+31 → NaN
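
The changes listed on this slide (dropping the header, splitting the time stamp into numeric fields, replacing the fill flag with NaN) can be sketched in a few lines. The function names are hypothetical, and treating the `dd-mm-yyyy` column-header line as the end of the header is an assumption based on the example shown, not a documented property of the file.

```python
import math
from datetime import datetime

FILL_VALUE = -1.0e31  # fill flag as it appears in the example record


def strip_header(lines, marker="dd-mm-yyyy"):
    """Return the lines after the column-header line (assumed header end)."""
    for index, line in enumerate(lines):
        if line.strip().startswith(marker):
            return lines[index + 1:]
    return lines  # marker not found: leave the input untouched


def convert_record(line):
    """Convert one data line to ([year, month, day, hour, minute, second], values)."""
    parts = line.split()
    stamp = datetime.strptime(parts[0] + " " + parts[1], "%d-%m-%Y %H:%M:%S.%f")
    time_fields = [stamp.year, stamp.month, stamp.day,
                   stamp.hour, stamp.minute, stamp.second]
    values = []
    for field in parts[2:]:
        value = float(field)
        values.append(math.nan if value <= FILL_VALUE else value)
    return time_fields, values
```

Applied to the second example record, `convert_record` returns the time fields `[2000, 1, 1, 10, 10, 30]` and three NaN values.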

  36. Google Can Find Data Providers
