Bridging the Digital Divide: eGY and Virtual Observatories Barbara J. Thompson Solar Physics Branch and IHY/eGY Team NASA Goddard Space Flight Center with Bob Bentley, Rick Bogart, Alisdair Davey, Craig DeForest, Joe Gurman, Neal Hurlburt, Vladimir Papitashvili, Aaron Roberts, Adam Szabo, C. Alex Young, and Dominic Zarro http://egy.org - VO working group and enabling activities http://lwsde.gsfc.nasa.gov - results from a broad community VO discussion & workshop
Why don’t people use all existing data? • Don’t know it exists • Don’t know how to obtain the data • Data permission issues • Don’t know enough about it to be able to analyze it • Don’t have access to or knowledge of the software • Takes too much effort to analyze it all • Data format or software incompatibility All of these are clearly related to the objectives of the eGY. Virtual observatories will play a major role in helping us cope with most of these issues.
What is a VO? More importantly, what is the difference between a VO and a data system? • A data system’s primary concern is storage and service of data, and the accessibility / user interface can vary greatly between systems. • A virtual observatory does not necessarily store data, but provides a single access point to available data & models, using a standard interface.
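What is a VO? [Diagram comparing two approaches. Data System Approach: the user sends a search/query to Data System I and to Data System II separately and receives a result (data products, model services) from each. VO Approach: users send a streamlined query to the VO; the user query is translated to a distributed query/search across data systems, products, and model services. Users in turn contribute new products, catalogable results, analysis tools, and good ideas.]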
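The VO approach described above, in which one user query is translated to a distributed query/search, might be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any existing VO: the names `Query`, `DataSystem`, and `VirtualObservatory`, the instrument label, and the toy catalog records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Query:
    """A uniform query as the user poses it to the VO (hypothetical fields)."""
    instrument: str
    year: int


@dataclass
class DataSystem:
    """One independent archive with its own native search function."""
    name: str
    search: Callable[[Query], List[str]]


class VirtualObservatory:
    """Single access point: stores no data itself, only forwards queries."""

    def __init__(self, systems: List[DataSystem]) -> None:
        self.systems = systems

    def search(self, query: Query) -> Dict[str, List[str]]:
        # Send the same query to every registered data system and collect
        # the results under the name of the system that returned them.
        return {s.name: s.search(query) for s in self.systems}


# Two toy archives with different internal layouts (made-up records).
_archive_a = {("imager", 2005): ["a_0001.fits", "a_0002.fits"]}
_archive_b = [{"inst": "imager", "yr": 2005, "file": "b_0017.fits"}]

system_a = DataSystem(
    "Data System I",
    lambda q: _archive_a.get((q.instrument, q.year), []),
)
system_b = DataSystem(
    "Data System II",
    lambda q: [r["file"] for r in _archive_b
               if r["inst"] == q.instrument and r["yr"] == q.year],
)

vo = VirtualObservatory([system_a, system_b])
results = vo.search(Query("imager", 2005))
```

Each archive keeps its own storage layout; only the small `search` adapter has to speak the VO's common query form, which is the sense in which a data provider can "just join a VO."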
How a VO works How a VO works and the services a VO provides are largely determined by the community it serves. Still, many VO’s contain the following features: • An interface & tools that make it easy to locate and retrieve data from catalogs, archives, and databases • Interoperability: data services that can be used regardless of the client’s computing platform, operating system, and software capabilities • Tools or access to tools for data analysis, modeling, simulation, and visualisation • Tools to compare observations with results obtained from models, simulations, and theory • Access to data in near real-time when necessary, as well as archived and historical data. Different modes of implementation, such as socket-based programming and the “pull vs. push” concept, depend on user needs & preferences, global access issues, & the complexity of the data/models involved.
Key Features of a (Successful) VO • Universal accessibility: web browser interface or a standard, easily implemented platform that can quickly be integrated with multiple user analysis environments • Easy to join • Provides not only data & models, but is able to provide information & access to tools, products & services enabling the science • Doesn’t reinvent any wheels - uses what’s available and adds features and versatility • Grass-rootsy, ground-up approach: few features have to be hard-wired or there from the beginning. New features and services can be added in an organic, community-based way • Adaptive - most VO’s are continuously modified and improved • Reverse-compatible: because most VO’s are constantly being updated, improved, modified, upgraded… • Less emphasis on formats, more emphasis on catalogability, accessibility & retrievability • Community-supported: allows ideas and advances from the entire scientific community to be rapidly ingested for global use • Focus remains on the fundamentals: enabling the user to locate, query & access data, models, tools & services
VO Data Provider’s Responsibilities and Requirements (there aren’t many!): • Data must be accessible and retrievable in some agreed-upon standard way • The data’s search interface must be compatible with the VO, or the data provider must provide metadata which can be queried • Ideally, the data provider will also make analysis software available that is compatible and takes advantage of the VO interface • Data provider must still take responsibility to ensure intelligent analysis
What are the advantages of VO’s? Ease of use / Accessibility: • Enables greater access to data, including for researchers in developing nations. This is good for scientists around the globe, and more and more project reviews are taking into consideration the breadth of the “user base.” • VO’s (should) talk to other VO’s. Compatibility with other data environments isn’t as much of an issue once you’ve joined a VO. • Enables the use of multiple types of data – data format issues become more transparent Cost and Time: • Cheap for the data producers – to serve data, you needn’t set up a big data system. Just join a VO. • Saves data retrieval time for the user of the data • Saves analysis time – most VO’s are also able to provide information about and access to higher-level data products and results produced by other users
What are the advantages of VO’s? Enables Science: • You don’t look for the data, it finds you. Scientists will use more sources of data in analysis. • Reproducibility/verification of results • Data products and higher levels of processing can be served as well, regardless of the source • Forms a foundation and interface for virtual analysis activities – VO’s are only the beginning! • A VO also can provide a versatile interface to electronic analysis activities, such as virtual “workflows,” open-source software environments, and Virtual Analysis Environments (VAE)
What a VO Won’t Do • VO’s will never be able to remove the need for an active human role in data analysis. However, they will allow humans to do it with much greater efficiency. • Data mining • Intelligent agents (AI/neural nets) • Data provider must still take responsibility to ensure intelligent analysis
Why do we need VO’s? • They save time & money • Broker / matchmaker between data providers and scientists - “you don’t have to find the data, it finds you.” • Enables global e-Science: A VO can provide a versatile interface to electronic analysis activities, such as virtual “workflows,” open-source software environments, and Virtual Analysis Environments (VAE) • Can play a major role in enabling science in developing nations http://egy.org - VO working group and enabling activities http://lwsde.gsfc.nasa.gov - results from a broad community VO discussion & workshop
The Solar Physics Division of the AAS and the electronic Geophysical Year invite you to a Showcase of Virtual Observatories Room 237, Morial Convention Center May 24, 2005 2:00 - 4:00 PM during the eGY poster session Tuesday afternoon All virtual observatory initiatives are encouraged to participate. Room 237 will also serve as the “SPD Tutorial Facility” throughout this meeting. Please stop by to view the schedule of events.
We’re not downhearted (yet) • The capacity and capacity per unit price of disk storage have steadily increased; the doubling time is ~7 months • Network-attached RAID (NAS) servers are becoming simpler and cheaper • Simply storing the data is not a problem in the foreseeable future Data source: Rev. C of Seagate SCSI disk drive product manuals (i.e., first OEM-quantity release)
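To make the ~7-month doubling time concrete, the implied growth factor over any interval is 2^(t/T), where T is the doubling time. The short sketch below just evaluates that formula; the function name `growth_factor` and the sample intervals are illustrative assumptions, and the 7-month figure is the approximate value quoted on the slide.

```python
DOUBLING_TIME_MONTHS = 7.0  # approximate value quoted on the slide


def growth_factor(months: float, doubling_time: float = DOUBLING_TIME_MONTHS) -> float:
    """Return the multiplicative growth after `months` of steady doubling."""
    return 2.0 ** (months / doubling_time)


# Growth implied over one year and over five years at this rate.
one_year = growth_factor(12)
five_years = growth_factor(60)
print(f"1 year: x{one_year:.1f}, 5 years: x{five_years:.0f}")
```

At this rate capacity roughly triples each year, which is the basis for the claim that raw storage is not the limiting problem.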
But how do we find and use all those data? • Solar data searches tend to be for multiple wavelength/entendu data sources for the same time period (i.e. not RA and DEC or other position or object) • Current archives are available on the Web at many sites, with heterogeneous search capabilities • Most but not all data of current interest are in FITS format • SolarSoft tree (in IDL™) offers ground-based, multiple space-based observatory support, as well as lots of “generic” functionality (“the wheel” that you don’t want to have to reinvent)
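Because solar searches are keyed on time period rather than sky position, the core operation is an interval-overlap test applied across several sources at once. The following is a minimal sketch of that idea, not code from SolarSoft or any archive; the `Observation` record, the `overlapping` helper, and the catalog entries are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class Observation(NamedTuple):
    source: str       # hypothetical archive or instrument label
    start: datetime
    end: datetime


def overlapping(obs: List[Observation], start: datetime,
                end: datetime) -> Dict[str, List[Observation]]:
    """Group by source every observation whose interval overlaps [start, end]."""
    hits: Dict[str, List[Observation]] = {}
    for o in obs:
        # Two closed intervals overlap when each starts before the other ends.
        if o.start <= end and start <= o.end:
            hits.setdefault(o.source, []).append(o)
    return hits


# Made-up catalog entries from two archives.
catalog = [
    Observation("archive_a", datetime(2005, 5, 24, 10, 0), datetime(2005, 5, 24, 10, 30)),
    Observation("archive_b", datetime(2005, 5, 24, 10, 20), datetime(2005, 5, 24, 11, 0)),
    Observation("archive_b", datetime(2005, 5, 25, 9, 0), datetime(2005, 5, 25, 9, 5)),
]

found = overlapping(catalog, datetime(2005, 5, 24, 10, 15), datetime(2005, 5, 24, 10, 45))
```

Here `found` holds one matching observation from each archive for the requested window, while the entry from the following day is excluded.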
Biggering and biggering • Solar data set sizes are growing at an impressive rate • Data sets that are “only” several Tbyte in size will be dwarfed in 5 - 6 years Data point sizes represent the data rate; the ordinate represents the total data volume from the source.
Toward a Virtual Solar Observatory • Three parts: • distributed archives • metadata “broker” facility • Web-based front end • Can have different implementations • XML • Gnutella • &c. • Will be low-cost • No more than $1.2M over next four years • We must be really smart: • Same model adopted by NVO, PDS; EGSO examining • …. or maybe there’s exactly one obvious way to do this
Volodya: I like your VO definition a lot, and thanks for the comments. They'll be a great help! I completely understand your comments about the CDAWs. The CDAWs had a great deal of difficulty generalizing their activities for individual users, while VO's start from that approach. We might have put the cart before the horse - it appears that the virtual analysis environments will be spawned by the VO's, and not the other way around. Can you think of any examples (outside of COSEC, my best example so far) of analysis environments extending from VO's? Your "pull" data concept enables you to store data for future use, and it can also store analyzed data and products as they are produced. I want to do a bit of prognosticating at the end of the talk, because I think the VO's will develop far beyond a streamlined data access system as they enable online analysis and joint analysis projects. Perhaps it's not too late for the CDAWs.