A SCALABLE SYSTEM FOR ONLINE ACCESS TO NATIONAL AND LOCAL REPOSITORIES OF HYDROLOGIC TIME SERIES

Ilya Zaslavsky, Reza Wahadj, David Valentine, Blair Jennings (San Diego Supercomputer Center, UCSD), David Maidment (CRWR, UT-Austin), and other HIS development partners

Presentation Transcript


  1. A SCALABLE SYSTEM FOR ONLINE ACCESS TO NATIONAL AND LOCAL REPOSITORIES OF HYDROLOGIC TIME SERIES
     Ilya Zaslavsky, Reza Wahadj, David Valentine, Blair Jennings (San Diego Supercomputer Center, UCSD), David Maidment (CRWR, UT-Austin), and other HIS development partners from UT-Austin, Utah State U, Drexel U, Duke U

  2. The Grid is becoming the backbone for collaborative science and data sharing.
     CI (cyberinfrastructure) is about RE-USING data and research resources!

  3. CI Vision for Hydrologic Science
     • Leverage ongoing cyberinfrastructure projects:
       • Geosciences Network (GEON)
       • Share data between Earth disciplines
       • Secure access to Grid resources, single sign-on authentication/authorization, distributed data management, data publication, search, information integration, knowledge management, scientific workflows, archiving
     • Integrate with common COTS (commercial off-the-shelf) software:
       • Excel, ArcGIS, Matlab…
       • and Fortran… mostly on Windows…
     • Interesting survey of CUAHSI partners by David Tarboton!

  4. HIS User Assessment (Chapter 4 in Status Report)
     Which of the four HIS goals is most important to you?
     • Data Access
     • Observatory support
     • Science
     • Education

  5. Tuning to unique features of hydrology
     • Hydrologic observations:
       • Reliance on federally-organized data collection (NWIS, STORET, Ameriflux, etc.) with huge and complex nomenclatures
         → simplifying access to federal repositories
         → relatively lower emphasis on data ownership
       • Handling time in both UTC and local
       • Various spatial offsets
       • Multiple data types: time series, fields, spatial data
     • Integrative discipline:
       • Interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…
     • Community:
       • Organized by “natural boundaries”
         → natural object hierarchy
         → networks of relatively autonomous self-managed data nodes
       • Partnership with public sector water management ontologies
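The requirement to handle time in both UTC and local time can be illustrated with a minimal Python sketch. It assumes each observation arrives as a local timestamp plus a fixed UTC offset in hours; the function name `to_utc_and_local` and the sample values are hypothetical and are not taken from the HIS code base.

```python
from datetime import datetime, timedelta, timezone


def to_utc_and_local(local_naive: datetime, utc_offset_hours: float):
    """Return (utc, local) timezone-aware datetimes for one observation.

    Assumption: the station reports local time with a fixed UTC offset
    (no daylight-saving rules are applied here).
    """
    station_tz = timezone(timedelta(hours=utc_offset_hours))
    local = local_naive.replace(tzinfo=station_tz)   # attach the offset
    utc = local.astimezone(timezone.utc)             # convert to UTC
    return utc, local


# Illustrative values only: 08:30 local time at a station six hours behind UTC.
utc, local = to_utc_and_local(datetime(2006, 3, 1, 8, 30), -6)
print(utc.isoformat())    # 2006-03-01T14:30:00+00:00
print(local.isoformat())  # 2006-03-01T08:30:00-06:00
```

Keeping both values lets searches across repositories compare on UTC while displays stay in the station's local time.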

  6. Problems
     • Microsoft and .NET vs Linux and J2EE
     • Open source vs proprietary
     • Free vs not free
     Open architecture, web services, well-defined interfaces

  7. Main Components
     • Web services for accessing hydrologic repositories
     • Hydrologic Observations Data Model
     • Hydrologic Data Access System + Time Series Viewer
     • Collection of CUAHSI nodes
     [Diagram: data sources (NASA, Storet, Ameriflux, Unidata, NCDC, NCAR, NWIS) connected through CUAHSI Web Services to client applications (Excel, Visual Basic, ArcGIS, C/C++, Matlab, Fortran, Access, SAS)]

  8. [Architecture diagram; component numbers as labeled in the figure]
     • Service consumers: Matlab, ArcGIS, Excel, Fortran/C/VB/Java codes, Web browser portal
     • (10) Web services registry and related services
     • (9) User registration/authentication/authorization
     • (8) Application services: analysis, mapping, charting, models, workflow, integration
     • (7) Ontology source and services
     • (6) Data registration/Search/Query rewriting & orchestration
     • (5) Hosted data services
     • (4) Sensor data filtering
     • (3) Sensor management services
     • (2) Resource drivers
     • (1) Core grid services: monitoring nodes, scheduling, data transfer, replication, collection management, …
     • Servers shown: ArcGIS Server, R Server, Conversion engine, Certificate authority
     • External data resources (NWIS, STORET, …, NAWQA) with registry metadata
     • Data Nodes, each fed by Sensors

  9. Some operational services: http://www.cuahsi.org/his/
     • Data Sources: NASA, Storet, Ameriflux, Unidata, NCDC, NCAR, NWIS
     • Extract, Transform, Load through CUAHSI Web Services
     • Applications: Excel, Visual Basic, ArcGIS, C/C++, Matlab, Fortran, Access, SAS

  10. Database Sizes (From Jon Goodall, Duke U.)

      Agency   Records        Stations       Time range
      USGS     250 million    1.5 million    100 years
      EPA      200 million    800,000        100 years
      NWS      ?              19,000         100 years

  11. Language for Data Representation

      Agency   Station Latitude, Longitude            Time of Measurement   Unique Identifier for an Observation Station
      USGS     dec_lat_va, dec_long_va                dv_dt                 site_no
      EPA      Station Latitude, Station Longitude    Activity Start        Station ID
      NWS      LATITUDE, LONGITUDE                    YEAR,MO,DA,TIME       COOPID

      Lots of semantic differences in parameter names, methods, etc.
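One way to picture how these naming differences might be reconciled is a per-agency field map, sketched below in Python. The agency field names come from the slide; the common keys (`site_id`, `latitude`, `longitude`, `time`), the function `harmonize`, the sample records, and the assumption that the NWS `TIME` field holds an hour of day are all hypothetical.

```python
from datetime import datetime

# Agency-specific field names are from the slide; the common keys are hypothetical.
FIELD_MAP = {
    "USGS": {"site_id": "site_no", "latitude": "dec_lat_va",
             "longitude": "dec_long_va", "time": "dv_dt"},
    "EPA": {"site_id": "Station ID", "latitude": "Station Latitude",
            "longitude": "Station Longitude", "time": "Activity Start"},
    # NWS splits time over YEAR, MO, DA, TIME, so it is assembled separately.
    "NWS": {"site_id": "COOPID", "latitude": "LATITUDE",
            "longitude": "LONGITUDE"},
}


def harmonize(agency: str, record: dict) -> dict:
    """Rename one agency record's fields to the common (hypothetical) keys."""
    mapping = FIELD_MAP[agency]
    out = {common: record[native] for common, native in mapping.items()}
    if agency == "NWS":
        # Assumption: TIME is an hour of day (0-23); the real encoding may differ.
        out["time"] = datetime(int(record["YEAR"]), int(record["MO"]),
                               int(record["DA"]), int(record["TIME"])).isoformat()
    out["latitude"] = float(out["latitude"])
    out["longitude"] = float(out["longitude"])
    out["source"] = agency
    return out


# Illustrative records only; none of these values come from a real repository.
usgs = harmonize("USGS", {"site_no": "00000001", "dec_lat_va": "30.25",
                          "dec_long_va": "-97.75", "dv_dt": "2005-07-04"})
nws = harmonize("NWS", {"COOPID": "000001", "LATITUDE": "30.0",
                        "LONGITUDE": "-97.5", "YEAR": "2005", "MO": "7",
                        "DA": "4", "TIME": "13"})
print(usgs)
print(nws)
```

A lookup table like this handles only structural renaming; the semantic differences in parameter names and methods noted on the slide would still need a vocabulary or ontology layer.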

  12. Typical Example of Data Retrieval: National Water Information System (NWIS)

  13. Core Web Services

  14. CUAHSI Web Services: http://www.cuahsi.org/his/webservices.html
      NCEP North American Forecast Model, 12 km grid for continental US

  15. CUAHSI Point Hydrologic Observations Data Model (D. Tarboton, USU)
      • A relational database stored in Access, PostgreSQL, MS SQL Server, …
      • Stores observation data made at points
      • Consistent format for storage of observations from many different sources and of many different types:
        • Streamflow
        • Groundwater levels
        • Precipitation & Climate
        • Soil moisture data
        • Water Quality
        • Flux tower data
      • Community design requirements (22 reviewers)
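The idea of one consistent relational format for point observations of many types can be sketched with Python's built-in `sqlite3` module. This is not the actual HODM schema: the table names, column names, and sample rows below are hypothetical, chosen only to show sites, variables, and observations held in a single structure.

```python
import sqlite3

# Hypothetical three-table layout; the real HODM schema is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    site_id    INTEGER PRIMARY KEY,
    site_code  TEXT NOT NULL,
    source     TEXT NOT NULL,
    latitude   REAL,
    longitude  REAL
);
CREATE TABLE variables (
    variable_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    units       TEXT NOT NULL
);
CREATE TABLE observations (
    site_id     INTEGER NOT NULL REFERENCES sites(site_id),
    variable_id INTEGER NOT NULL REFERENCES variables(variable_id),
    time_utc    TEXT NOT NULL,
    value       REAL
);
""")

# Illustrative rows only.
conn.execute("INSERT INTO sites VALUES (1, 'demo-001', 'USGS', 30.0, -97.5)")
conn.executemany("INSERT INTO variables VALUES (?, ?, ?)",
                 [(1, "Streamflow", "m3/s"), (2, "Precipitation", "mm")])
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)",
                 [(1, 1, "2006-03-01T00:00:00Z", 12.5),
                  (1, 1, "2006-03-02T00:00:00Z", 14.0),
                  (1, 2, "2006-03-01T00:00:00Z", 3.2)])


def time_series(conn, site_code, variable_name):
    """Return [(time_utc, value), ...] for one site and variable, oldest first."""
    rows = conn.execute(
        """SELECT o.time_utc, o.value
           FROM observations o
           JOIN sites s ON s.site_id = o.site_id
           JOIN variables v ON v.variable_id = o.variable_id
           WHERE s.site_code = ? AND v.name = ?
           ORDER BY o.time_utc""",
        (site_code, variable_name))
    return rows.fetchall()


print(time_series(conn, "demo-001", "Streamflow"))
```

Because every variable type shares the same observations table, the same query serves streamflow, precipitation, or any other point measurement.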

  16. Schema

  17. Hydrologic Observations Data Model: independent of, but coupled to, geographic representation
      [Diagram: Arc Hydro feature classes linked to HODM through a coupling table]
      • Feature classes shown: MonitoringPoint, Waterbody, Watershed, HydroPoint
      • HydroNetwork: HydroEdge (ComplexEdgeFeature; subtypes Flowline, Shoreline) and HydroJunction (SimpleJunctionFeature)
      • HydroEdge attributes: HydroID, HydroCode, ReachCode, Name, LengthKm, LengthDown, FlowDir, FType, EdgeType, Enabled
      • HydroJunction attributes: HydroID, HydroCode, NextDownID, LengthDown, DrainArea, FType, Enabled, AncillaryRole
      • CouplingTable: WaterID (GUID), HydroID (Integer)

  18. Uses and tools for HODM
      • HODM is central to HIS infrastructure, but lacks tools
      • Testing HODM with two types of data: federal repositories and external databases (Panola). Personal and enterprise versions.
      • Mapping wizard: loading Excel observation data to an HODM database:
        • Can save mapping files for subsequent runs of similarly formatted spreadsheets
        • Local data analysis can be done: charts and stats
      • HDAS is an interface to HODM datasets, but shall not be the only one, so HODM is also exposed as Web services
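The saved-mapping idea behind the wizard can be sketched as follows. This is not the wizard itself: it assumes spreadsheets exported as CSV text (to stay within the Python standard library), and the column names, target field names, file name, and helper functions are all hypothetical.

```python
import csv
import io
import json
import os
import tempfile


def save_mapping(path, mapping):
    """Persist a {spreadsheet column: target field} mapping as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)


def load_rows(csv_text, mapping_path):
    """Apply a saved mapping to CSV text and return rows keyed by target field."""
    with open(mapping_path, encoding="utf-8") as f:
        mapping = json.load(f)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{target: row[column] for column, target in mapping.items()}
            for row in reader]


# Define the mapping once and save it (hypothetical column and field names).
mapping = {"Gauge": "site_code", "Date": "time", "Flow (cfs)": "value"}
path = os.path.join(tempfile.mkdtemp(), "mapping.json")
save_mapping(path, mapping)

# Reuse the same mapping file for two similarly formatted sheets.
sheet_a = "Gauge,Date,Flow (cfs)\nG1,2006-03-01,410\n"
sheet_b = "Gauge,Date,Flow (cfs)\nG2,2006-03-02,395\n"
print(load_rows(sheet_a, path))
print(load_rows(sheet_b, path))
```

Storing the mapping separately from the data is what lets later spreadsheets with the same layout be loaded without repeating the column-matching step.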

  19. Hydrologic Data Access System http://river.sdsc.edu/hdas/

  20. Hydrologic Data Access System

  21. Cross-platform design
      [Diagram: one central node, several remote nodes, and a GEON data node]
      • Central CUAHSI HIS Node (Windows): HODM Web Services, HDAS Web Service, Web Service proxies, IIS Web Server, ASP.Net, ArcGIS Technologies, SQL Server, Data
      • Remote CUAHSI HIS Nodes (Windows): same stack as the central node (HODM Web Services, HDAS Web Service, Web Service proxies, IIS Web Server, ASP.Net, ArcGIS Technologies, SQL Server, Data)
      • GEON Data Node (Linux): GEON Software Stack, Proxy, Apache Tomcat, Data

  22. HIS Scalability
      • Adding…
        …data types and datasets; processing models and services; servers; users and roles
        shall not create unmanageable bottlenecks that require system re-engineering
      • Designing for scalability:
        • Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
        • Using HODM as a common generic format for time series data, for ease of coding and uniform search interfaces
        • HDAS GUI design to abstract specifics of disparate repositories
        • Leveraging common CI components developed in GEON
      Need to work with agencies to remove web services bottleneck
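The point about a generic set of web service signatures can be illustrated with a small Python sketch: every repository is wrapped behind the same interface, so adding a repository does not change client code. The class and method names (`TimeSeriesService`, `get_sites`, `get_values`, `query_all`) and the in-memory data are hypothetical and are not the actual CUAHSI service definitions.

```python
from abc import ABC, abstractmethod


class TimeSeriesService(ABC):
    """Hypothetical common signature that each repository wrapper implements."""

    name: str

    @abstractmethod
    def get_sites(self):
        """Return a list of site codes."""

    @abstractmethod
    def get_values(self, site, variable, start, end):
        """Return [(iso_time, value), ...] with start <= iso_time <= end."""


class InMemoryService(TimeSeriesService):
    """Stand-in for a real repository; data is {(site, variable): [(time, value)]}."""

    def __init__(self, name, data):
        self.name = name
        self._data = data

    def get_sites(self):
        return sorted({site for site, _ in self._data})

    def get_values(self, site, variable, start, end):
        series = self._data.get((site, variable), [])
        # ISO-8601 strings of equal format compare correctly as text.
        return [(t, v) for t, v in series if start <= t <= end]


def query_all(services, variable, start, end):
    """Run the same query against every service through the shared interface."""
    results = {}
    for svc in services:
        for site in svc.get_sites():
            results[(svc.name, site)] = svc.get_values(site, variable, start, end)
    return results


# Illustrative data only.
repo_a = InMemoryService("A", {("s1", "flow"): [("2006-01-01", 1.0), ("2006-02-01", 2.0)]})
repo_b = InMemoryService("B", {("s9", "flow"): [("2006-01-15", 5.0)]})
print(query_all([repo_a, repo_b], "flow", "2006-01-10", "2006-12-31"))
```

In this arrangement a new repository only needs a new wrapper class; `query_all` and any client built on it stay unchanged, which is the scalability property the slide describes.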

  23. Future Work
      • Updating and standardizing web services; services against additional repositories
      • Adopting HODM for storing time series observations, and developing tools for loading data, querying, analyzing and visualizing data in HODM
      • Finalizing the Windows-based CUAHSI Node, and preparing it for distribution, along with documentation
      • “Digital Watershed” conceptualization