HYDROSEEK and HYDROTAGGER: A Search Engine for Hydrologists
GIS in Water Resources Lecture
M. Piasecki, November 2007
Department of Civil, Architectural & Environmental Engineering
Lecture
• Demo of HydroSeek
• What are the search criteria?
• Functionality of the Engine Interface
• Data Sources
• Common Sources
• Common Problems (Completeness, Syntax, Semantics)
• Ontologies
• Ontology details
• Concept-to-data variable tagging
• Architecture
• Flow Chart
• Technologies used
• Demo of HydroTagger
• Why the Tagging?
• Technologies
www.HydroSeek.org
HIS Goals
• Hydrologic Data Access System – better access to a large volume of high quality hydrologic data
• Support for Observatories – synthesizing hydrologic data for a region
• Advancement of Hydrologic Science – data modeling and advanced analysis
• Hydrologic Education – better data in the classroom, basin-focused teaching
Objective
What we are doing now …
[Figure: separate request/return exchanges with each data source: NAWQA, NAM-12, NWIS, NARR]
• Search multiple heterogeneous data sources simultaneously, regardless of semantic or structural differences between them
What we would like to do …
[Figure: a generic request goes to a Semantic Mediator, which issues GetValues calls to NAWQA, NWIS, NARR, and HODM]
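The mediator pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HydroSeek's implementation: the class `SemanticMediator`, the source names `SourceA`/`SourceB`/`SourceC`, the concept label `water_level`, and the returned numbers are all invented. Only the pattern (one generic request fanned out to per-source GetValues calls) comes from the slide, and the two variable names are a synonym pair quoted in this lecture.

```python
from typing import Callable, Dict, List, Tuple

# (source name, source-specific variable name, value)
Observation = Tuple[str, str, float]
Fetcher = Callable[[str], List[float]]


class SemanticMediator:
    """Hypothetical mediator: one generic request, many GetValues-style calls."""

    def __init__(self) -> None:
        self._sources: Dict[str, Tuple[Dict[str, str], Fetcher]] = {}

    def register(self, name: str, vocabulary: Dict[str, str], fetch: Fetcher) -> None:
        # vocabulary maps a shared concept to the source's own variable name
        self._sources[name] = (vocabulary, fetch)

    def get_values(self, concept: str) -> List[Observation]:
        results: List[Observation] = []
        for name, (vocabulary, fetch) in self._sources.items():
            local_name = vocabulary.get(concept)
            if local_name is None:
                continue  # this source does not report the concept
            results.extend((name, local_name, v) for v in fetch(local_name))
        return results


# Lambdas stand in for real web service calls; the values are made up.
mediator = SemanticMediator()
mediator.register("SourceA", {"water_level": "Elevation, water surface"}, lambda var: [1.2, 1.3])
mediator.register("SourceB", {"water_level": "stage height"}, lambda var: [4.0])
mediator.register("SourceC", {}, lambda var: [])

observations = mediator.get_values("water_level")
```

The caller asks for one concept and never sees that the two sources name the variable differently; a source that does not carry the concept is skipped.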
Data sources: USGS, EPA, CIMS, TCEQ, NADP
Spatial Coverage
• STORET has 758 sites in Texas, TCEQ has 8,407.
• STORET has 47,602 sites in Florida, NWIS has 27,906.
• NWIS has 121,545 sites in Minnesota, STORET has 22,260.
Data Availability
Temporal Coverage
[Figure: nitrogen data for the periods 1957-1977, 1977-2003, and 2003-2007]
Interface Problem
• NWIS: ~175 form elements on a single page
• STORET + NWIS + TCEQ + CIMS = ???
• A drop-down menu? ∞
• String search across the parameter list? How about synonyms? ‘Elevation, water surface’ vs. ‘stage height’
Completeness Problem: Metadata Catalog
• Better query performance
• Freedom
• Fewer errors
[Figure: Availability of geographic identifiers for stations in EPA STORET]
Heterogeneity Problem
• Syntax, e.g. date & time formats, Gregorian versus Julian
• Data format/structure, e.g. XML, HTML, tab/tilde/comma-separated text, gzipped tarballs…
• Semantics (more …)
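As a small illustration of the syntax problem, the sketch below normalizes several date spellings to ISO 8601. It assumes that "Julian" here means the year plus day-of-year form (e.g. `2006225`), which is an interpretation rather than something the slide states; the function name `parse_date` and the list of formats are likewise invented for the example.

```python
from datetime import date, datetime

# Candidate input formats, tried in order. "%Y%j" is year + day of year,
# which is what "Julian" is assumed to mean in this sketch.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%Y%j")


def parse_date(text: str) -> date:
    """Return a calendar date for any of the known input formats."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")


# Four spellings of the same day, as different repositories might report it.
samples = ["2006-08-13", "08/13/2006", "13 August 2006", "2006225"]
normalized = [parse_date(s).isoformat() for s in samples]
```

A real harvester would also need time zones and time-of-day handling, which this sketch leaves out.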
Issues with Semantics
• Hyponymy: parameters “Groundwater level”, “Stream stage”, “Reservoir level” versus “Water level”
• Pseudo-hyponymy due to lack of metadata: parameter “Manganese, 6N hydrochloric acid extracted, recoverable, dry weight, milligrams per kilogram” versus “Manganese, milligrams per kilogram”
• Synonymy: ‘Total Kjeldahl Nitrogen’ vs. ‘Ammonia+Organic Nitrogen’
Search Strategy
Search → Fine-tune → Retrieve, rather than Search → Retrieve
This avoids the ‘high precision, low recall’ and ‘low precision, high recall’ problems.
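For readers unfamiliar with the two terms, precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both for a narrow and a broad query; the station identifiers are invented and the function name `precision_recall` is a placeholder.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query result."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up identifiers: four stations are actually relevant to the user.
relevant = {"s1", "s2", "s3", "s4"}

# A narrow exact-match query finds one relevant station and nothing else.
narrow = precision_recall({"s1"}, relevant)

# A broad query finds all four, plus four irrelevant stations.
broad = precision_recall({"s1", "s2", "s3", "s4", "x1", "x2", "x3", "x4"}, relevant)
```

Here `narrow` is high precision with low recall and `broad` is the reverse, the two failure modes the fine-tune step is meant to avoid.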
Layered Ontology Model
[Figure: ontology layers labeled Core, Navigation, Compound]
Knowledge Base (OWL Ontologies)
• Supports classification of search results
• Entities in the ontology are associated with measured variables in a relational database
• Helps solve semantic heterogeneity issues between data repositories
Examples:
‘Escherichia coli’ = ‘E. coli’
‘E. coli’ is-a ‘Indicator Organism’
‘Copper’ is-a ‘Micronutrient’
‘Micronutrient’ is-a ‘Nutrient’
‘Copper’ isMeasuredIn ‘Medium’
‘Medium’ = {Water, Soil…}
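The relationships on this slide can be mimicked with two dictionaries to show how synonym and is-a links support classification. This is only a sketch: HydroSeek uses OWL ontologies, whereas the plain-Python structures and the helper names `canonical`, `ancestors`, and `is_a` below are stand-ins invented for illustration. The terms themselves are the slide's.

```python
from typing import Dict, List

# Equivalent names map to one canonical term.
SYNONYMS: Dict[str, str] = {"Escherichia coli": "E. coli"}

# Each term points to its direct parent concept.
IS_A: Dict[str, str] = {
    "E. coli": "Indicator Organism",
    "Copper": "Micronutrient",
    "Micronutrient": "Nutrient",
}


def canonical(term: str) -> str:
    """Resolve a synonym to its canonical term (unknown terms pass through)."""
    return SYNONYMS.get(term, term)


def ancestors(term: str) -> List[str]:
    """Follow is-a links upward, nearest parent first."""
    chain: List[str] = []
    current = canonical(term)
    while current in IS_A:
        current = IS_A[current]
        if current in chain:
            break  # guard against an accidental cycle in the table
        chain.append(current)
    return chain


def is_a(term: str, concept: str) -> bool:
    """True if the term falls under the concept, directly or transitively."""
    return concept in ancestors(term)
```

With these tables, a search for ‘Nutrient’ can match a variable tagged ‘Copper’ because the is-a chain is followed transitively, and ‘Escherichia coli’ resolves to the same concept as ‘E. coli’.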
Point Observations Information Model
http://www.cuahsi.org/his/webservices.html
Levels, with an example and the related web service methods:
• Data Source: USGS
• Network: Streamflow gages (GetSites)
• Sites: Neuse River near Clayton, NC (GetSiteInfo)
• Variables: Discharge, stage (daily or instantaneous) (GetVariables, GetVariableInfo)
• Values: 206 cfs, 13 August 2006 (GetValues); {Value, Time, Qualifier, Offset}
Definitions:
• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
• An offset allows specification of measurements at various depths in water
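The hierarchy defined above maps naturally onto nested records. The Python dataclasses below are a minimal sketch of that structure, not the actual CUAHSI schema: the class and field names are assumptions chosen to mirror the slide's wording, and site coordinates are omitted because the slide gives none. The sample data is the slide's own example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Value:
    """One observation: {Value, Time, Qualifier, Offset}."""
    value: float
    time: date
    qualifier: Optional[str] = None  # symbol with extra information, if any
    offset: Optional[float] = None   # e.g. depth in water, if any


@dataclass
class Variable:
    name: str
    units: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


# The example from the slide: 206 cfs on 13 August 2006.
discharge = Variable("Discharge", "cfs", [Value(206.0, date(2006, 8, 13))])
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])
```

Walking the structure from `usgs` down to a `Value` follows the same path as the GetSites, GetSiteInfo, GetVariableInfo, and GetValues calls.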
HydroSeek Web Services
[Figure: components shown include HydroSeek; EPA STORET, USGS Daily, USGS Realtime, CIMS, and TCEQ; WaterOneFlow and native services; the San Diego Supercomputer Center server, the Drexel server, and the Microsoft server (Virtual Earth map)]
• Most HydroSeek functions are available as web services (SOAP)
• Supports queries using Global Change Master Directory (GCMD) keywords
• Supports output in Geography Markup Language (GML) as well as WaterML
GetStations (request parameter: BoundingBox)
[Figure: sample request and response]
GetStationsByHU (request parameter: HUC_Code)
[Figure: sample request and response]
GetStationCatalogueFiltered
[Figure: sample request and response]
GetStationCatalogue
[Figure: sample request and response]
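To make the BoundingBox parameter concrete, the sketch below filters an in-memory station list by a latitude/longitude box. It does not call the real SOAP service; the `Station` and `BoundingBox` classes, the `get_stations` signature, and the station coordinates are all hypothetical, and the box is assumed not to cross the antimeridian.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Station:
    code: str
    lat: float
    lon: float


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Assumes west <= east, i.e. the box does not cross the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def get_stations(catalogue: Iterable[Station], box: BoundingBox) -> List[Station]:
    """Local stand-in for a GetStations-style query."""
    return [s for s in catalogue if box.contains(s.lat, s.lon)]


# Made-up stations and a made-up search box.
catalogue = [
    Station("A", 30.0, -97.0),
    Station("B", 45.0, -93.0),
    Station("C", 28.0, -82.0),
]
box = BoundingBox(south=25.0, west=-100.0, north=35.0, east=-80.0)
found = [s.code for s in get_stations(catalogue, box)]
```

A GetStationsByHU-style query would replace the geometric test with a lookup on the station's hydrologic unit code.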
Architecture Outline: Inside the CUAHSI HOD Module
• Allows searching multiple heterogeneous data sources simultaneously, regardless of semantic or structural differences between them
• Modular & extensible
The Database-Ontology Link
www.HydroTagger.org
The HydroSeek ODM needed an upgrade, i.e. additional tables:
1) MappingsApproved_Table
2) FrequentUpDates_Table
How does the Tagging work? Step 1
Users need to register on the web-site before they can use the HydroTagger. When registering, select the testbed site you are affiliated with. Each testbed site needs ONE administrator, who can then admit additional users for that specific testbed site. Please send an email identifying the designated tagger site administrator so we can promote that person to the role.
How does the Tagging work? Step 2
The “Sniffer” jumps into action and trawls through the testbed sites to find and identify new variable names (once a week, currently every Sunday night). It does so by using the regular web services published through the WSDL (no “hacking”!!!). It returns i) data updating information and ii) the variable names used, and compares these to those used by HydroSeek.
[Figure: WATERS Network Information System]
How does the Tagging work? Step 3
The Tagger now updates the HydroSeek catalogue (an amalgamation of all 10 testbed catalogues) with the newly found data entries. If it finds a new variable name (introduced during the data loading process using the Data-Loader), it puts it into a table and offers it up to the HydroTagger GUI for semantic tagging.
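The compare-and-queue logic of Steps 2 and 3 can be sketched as a set difference. This is an illustration under assumptions, not the Sniffer's actual code: the function names, the testbed labels, and the variable names are invented, and a plain Python list stands in for the database table that feeds the HydroTagger GUI.

```python
from typing import Dict, Iterable, List, Set


def find_new_variable_names(harvested: Dict[str, Iterable[str]], known: Set[str]) -> List[str]:
    """Names reported by any testbed site that the catalogue has not seen."""
    seen: Set[str] = set()
    for names in harvested.values():
        seen.update(name.strip() for name in names)
    return sorted(seen - known)


def weekly_run(harvested: Dict[str, Iterable[str]], known: Set[str],
               tagging_queue: List[str]) -> List[str]:
    """Queue each untagged name once; return the names found this run."""
    new_names = find_new_variable_names(harvested, known)
    for name in new_names:
        if name not in tagging_queue:
            tagging_queue.append(name)
    return new_names


# Made-up harvest: in practice these names would come from web service calls.
harvested = {
    "testbed_1": ["Discharge", "Nitrate"],
    "testbed_2": ["Discharge", "Turbidity "],
}
known = {"Discharge"}
queue: List[str] = []

first = weekly_run(harvested, known, queue)
second = weekly_run(harvested, known, queue)  # same data a week later
```

Running the job twice on the same harvest leaves the queue unchanged, so a name waits for tagging only once until someone maps it to a concept.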
Thank you… Questions?