Integration and Processing of Environmental Data: ADMIRE Technology

Integration and Processing of Environmental Data: ADMIRE Technology. Ondrej Habala, CRISIS seminar, 18.10.2011, ITMS 26240220060

Goals • Accelerate access to data and increase the benefits of exploiting it • Deliver consistent, easy-to-use technology for extracting information and knowledge • Cope with the complexity, distribution, change and heterogeneity of services, data and processes through an abstract view of data mining and integration • Give power to users and developers of data mining and integration processes

ADMIRE Architecture: Separation of Concerns

ADMIRE Architecture

ADMIRE’s High-Level Architecture

ADMIRE Gateways USMT

DISPEL – Data-Intensive Systems Process-Engineering Language • Targets data-intensive distributed systems • Acts as the meeting point between complex application requests and complex enactment systems • Benefit: method development, engineering and the evolution of supported practices can proceed independently on each side • Describes enactment requests for streaming-data workflow processes • "Process-engineering time": the process is transformed and optimized in preparation for the enactment period

DISPEL: Simple Example
String sql1 = "SELECT * FROM some_table";
String sql2 = "SELECT * FROM table2";
String resource = "128.18.128.255";
SQLQuery query = new SQLQuery;
// Creating streams of literals
|- sql1, sql2 -| => query.expression;
|- resource -| => query.resource;
Tee tee = new Tee;
// Creating connections
query.result => tee.connectInput;
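To make the dataflow in this DISPEL example concrete, here is a rough Python analogue using generators as streams. It is only a sketch under stated assumptions: `literal_stream`, `sql_query` and `demo_connect` are hypothetical helpers (not ADMIRE or OGSA-DAI APIs), the single resource value is assumed to apply to every query expression, and the address is resolved to a throwaway in-memory SQLite database. `itertools.tee` plays the role of the `Tee` PE.

```python
import itertools
import sqlite3
from typing import Callable, Iterable, Iterator, List, Tuple


def literal_stream(*values):
    """Rough analogue of DISPEL's |- v1, v2 -| : emit the given literals in order."""
    yield from values


def sql_query(expressions: Iterable[str], resources: Iterable[str],
              connect: Callable[[str], sqlite3.Connection]) -> Iterator[List[Tuple]]:
    """Hypothetical stand-in for the SQLQuery PE.

    Assumption: the single value on the resource stream is used for every
    expression; one list of result rows is emitted per expression.
    """
    conn = connect(next(iter(resources)))
    for expr in expressions:
        yield conn.execute(expr).fetchall()


def demo_connect(_address: str) -> sqlite3.Connection:
    """Hypothetical resolver: ignores the address and builds a small in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_table (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE table2 (id INTEGER, level REAL)")
    conn.executemany("INSERT INTO some_table VALUES (?, ?)", [(1, "a"), (2, "b")])
    conn.execute("INSERT INTO table2 VALUES (1, 2.5)")
    return conn


sql1 = "SELECT * FROM some_table"
sql2 = "SELECT * FROM table2"
resource = "128.18.128.255"

# Connect the literal streams to the query's inputs, then tee its output.
results = sql_query(literal_stream(sql1, sql2), literal_stream(resource), demo_connect)
branch_a, branch_b = itertools.tee(results, 2)
```

Both branches yield the same two result lists; `itertools.tee` buffers items so the branches can be consumed independently, similar in spirit to duplicating a stream with `Tee`.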

DISPEL – real use

Application Studies: Deploying ADMIRE Technology in the Environmental Domain

Flood Application: Data sets used in hydrological scenarios (FSKD 2010, Yantai, China, August 10-12)

Orava scenario • Legend • Green area – Orava (part of northern Slovakia) • Blue – Orava reservoir and local rivers • Red dots – hydrological measurement stations • Notes • We are interested only in the hydrological stations below the Orava reservoir • In our tests we use hydrological station 5830 (Tvrdosin)

ORAVA – data mining concept • Targets – water level and water temperature at a station below the reservoir • Predictors – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature • Figure legend: targets of data mining; values given in a schedule; values predicted by a meteorological model
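The predictor/target split above can be written down as a record type. This is a minimal sketch with hypothetical field names (the slides do not specify a schema); `to_xy` simply separates the six predictors from the two targets so a regression learner could be trained on them.

```python
from dataclasses import dataclass, fields
from typing import List, Sequence, Tuple


@dataclass
class OravaRecord:
    """One hourly record; all field names are hypothetical."""
    # Predictors
    rainfall_reservoir: float
    rainfall_station: float
    air_temp_reservoir: float
    air_temp_station: float
    reservoir_discharge: float
    reservoir_temp: float
    # Targets at the station below the reservoir
    water_level: float
    water_temp: float


TARGETS = ("water_level", "water_temp")
PREDICTORS = tuple(f.name for f in fields(OravaRecord) if f.name not in TARGETS)


def to_xy(records: Sequence[OravaRecord]) -> Tuple[List[List[float]], List[List[float]]]:
    """Split records into a predictor matrix X and a target matrix y."""
    X = [[getattr(r, name) for name in PREDICTORS] for r in records]
    y = [[getattr(r, name) for name in TARGETS] for r in records]
    return X, y
```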

ORAVA – data integration • Integration of data from: GRIB files; reservoirs • Inputs: time period of the experiment; reservoir ID; list of hydrological stations; geographic coordinates
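One way to picture the integration step is an hourly inner join driven by the listed inputs (time period, reservoir ID, station list). This sketch assumes the GRIB values have already been decoded and interpolated to each station's coordinates; the `integrate` function, the series layout and the `name@station` column naming are all hypothetical, not the scenario's actual PEs.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Series = Dict[datetime, Dict[str, float]]   # hourly timestamp -> named values


def integrate(meteo: Dict[str, Series], reservoirs: Dict[str, Series],
              reservoir_id: str, stations: List[str],
              start: datetime, end: datetime) -> List[Dict[str, object]]:
    """Inner-join hourly reservoir data with meteo data for each station in [start, end)."""
    reservoir = reservoirs[reservoir_id]
    rows = []
    t = start
    while t < end:
        # Keep an hour only if every source has a value for it
        if t in reservoir and all(t in meteo[s] for s in stations):
            row: Dict[str, object] = {"time": t, **reservoir[t]}
            for s in stations:
                for name, value in meteo[s][t].items():
                    row[f"{name}@{s}"] = value   # suffix columns with the station ID
            rows.append(row)
        t += timedelta(hours=1)
    return rows
```

Hours missing from any source are dropped here; a real workflow might instead interpolate such gaps during preprocessing.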

ORAVA – data sets

ORAVA Scenario: Integrated and preprocessed data (figure: integrated raw data and integrated preprocessed data, each plotted against time [hours])

Orava Scenario: Water temperature prediction

Orava Scenario: Water level prediction

Orava Scenario: Data integration workflow

Orava Scenario: Training workflow

Orava Scenario: Prediction workflow

Implementation Notes • We needed to write custom activities for certain data extraction tasks • In terms of workflow design, data integration was the most complex part of the scenario • Once all the PEs (processing elements) were in place, the data integration process was quite easy to write and modify in DISPEL • We used a composite PE to extract different types of quantities from meteorological GRIB files
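The idea of a composite PE (one element built from several sub-elements) can be sketched as a function that fans a single input out to one extractor per quantity. The message dictionaries below are a heavily simplified, hypothetical stand-in for decoded GRIB messages; no GRIB library is used, and `extractor`/`composite_pe` are illustrative names only.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A decoded GRIB message, heavily simplified; the keys are hypothetical.
Message = Dict[str, object]
Record = Tuple[object, str, object]


def extractor(quantity: str) -> Callable[[Iterable[Message]], Iterator[Record]]:
    """Build a sub-PE that keeps only the messages carrying one quantity."""
    def pe(messages: Iterable[Message]) -> Iterator[Record]:
        for m in messages:
            if m["quantity"] == quantity:
                yield (m["time"], quantity, m["value"])
    return pe


def composite_pe(quantities: List[str]) -> Callable[[Iterable[Message]], Dict[str, List[Record]]]:
    """Composite PE: one extractor per quantity, all fed from the same input."""
    parts = {q: extractor(q) for q in quantities}

    def pe(messages: Iterable[Message]) -> Dict[str, List[Record]]:
        cached = list(messages)   # materialise once so every sub-PE can read it
        return {q: list(part(cached)) for q, part in parts.items()}
    return pe
```

Adding a new quantity only means adding one more sub-extractor, which mirrors why a composite element kept the DISPEL workflow easy to modify.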


Orava Scenario: Portal

Radar Scenario: Very short-term rainfall prediction from weather radar data

Radar Scenario: Description • Very short-term rainfall prediction from weather radar data • Based on the movement of areas with higher air moisture content, and thus higher precipitation potential • Network of synoptic stations in Slovakia: 27 stations • Data used from the years 2007 and 2008 • Available variables: hourly values of rainfall, humidity, radar reflectivity, atmospheric pressure and temperature

Radar Scenario: Main predictors and target variables. Overview of the main predictors and target variables in the Radar scenario: green cells are predicted by the meteorological model, blue cells come from a model based on motion vectors, and yellow cells are the final targets of data mining.

Radar Scenario: Model attributes • Isotonic regression model • 10-fold cross-validation • Hydro-meteorological performance
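Isotonic regression fits the best least-squares step function that never decreases as the predictor grows. A minimal single-predictor sketch using the standard pool-adjacent-violators algorithm (PAVA) follows; it illustrates the model type named on the slide and is not the implementation used in the scenario. `fit_isotonic` and `predict_isotonic` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float]]   # (upper x of each block, fitted value of each block)


def fit_isotonic(x: Sequence[float], y: Sequence[float]) -> Model:
    """Least-squares non-decreasing step function via pool-adjacent-violators."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks: List[List[float]] = []          # each block: [sum_y, count, max_x]
    for i in order:
        if blocks and blocks[-1][2] == x[i]:
            # Tied x values must share one fitted value
            blocks[-1][0] += y[i]
            blocks[-1][1] += 1
        else:
            blocks.append([y[i], 1, x[i]])
        # Pool adjacent blocks while their means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, xmax = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = xmax
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def predict_isotonic(model: Model, x0: float) -> float:
    """Value of the first block whose upper bound is >= x0 (clamped at both ends)."""
    uppers, values = model
    i = bisect_left(uppers, x0)
    return values[min(i, len(values) - 1)]
```

For example, the points (1, 1), (2, 3), (3, 2), (4, 4) violate monotonicity between x = 2 and x = 3, so those two are pooled to their mean 2.5.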

RADAR model • Other models tested: neural networks, SMOreg, linear regression, ... • Reached a correlation coefficient between 0.35 and 0.42 • Validation: 10-fold cross-validation • Problems in model creation: • The process is significantly stochastic • Some input variables (e.g. humidity) themselves depend on the output, rainfall • The meteorological process is very sensitive • The radar reflectivity matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a given area, as opposed to data from synoptic stations
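The reported correlation coefficients (0.35 to 0.42) are, we assume, Pearson's r between predicted and observed rainfall. A minimal sketch of that computation:

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_o)
```

A value of 1 means a perfect linear relationship; values around 0.4 indicate only a moderate linear association, consistent with the stochastic nature of the process listed above.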

Radar Scenario: Training and forecast

Radar Scenario: Motion vector computation
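The slides do not say how motion vectors were computed. One common technique is block matching: try every shift within a search window and keep the one that best aligns two consecutive radar frames. The sketch below assumes that technique, a single global vector, and mean absolute difference as the cost; `motion_vector` is a hypothetical name.

```python
from typing import List, Tuple

Grid = List[List[float]]


def motion_vector(prev: Grid, curr: Grid, max_shift: int = 2) -> Tuple[int, int]:
    """Return (dy, dx) such that curr[y][x] best matches prev[y - dy][x - dx].

    Cost is the mean absolute difference over the overlapping area; the first
    shift with the strictly lowest cost wins.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost, n = 0.0, 0
            for y in range(h):
                for x in range(w):
                    sy, sx = y - dy, x - dx
                    if 0 <= sy < h and 0 <= sx < w:   # only compare overlapping cells
                        cost += abs(curr[y][x] - prev[sy][sx])
                        n += 1
            if n and cost / n < best_cost:
                best_cost, best = cost / n, (dy, dx)
    return best
```

Operational nowcasting usually computes vectors per sub-block of the radar image rather than one global shift; the single-vector version keeps the sketch short.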

SVP Scenario: Forecast of reservoir inflow based on temperature, precipitation and snow cover

SVP Scenario: Structure of data • Two steps of prediction: • Copy the previous values of snow quantity and inflow volume: $P(t) = S(t-1)$, $I(t) = F(t-1)$ • Apply the trained models, first the snow model and then the inflow model: $S(t) = f\big(P(t), R(t), E(t)\big)$, $F(t) = h\big(I(t), S(t), E(t), R(t)\big)$
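The two-step scheme can be rolled forward over a forecast horizon, feeding each step's outputs back in as the next step's "previous" values. In this sketch `f` and `h` are passed in as callables; `toy_snow` and `toy_inflow` are made-up stand-ins for the trained models, and we assume (the slide does not define them) that $R(t)$ is precipitation and $E(t)$ is temperature.

```python
from typing import Callable, List, Sequence, Tuple

SnowModel = Callable[[float, float, float], float]            # f(P, R, E)
InflowModel = Callable[[float, float, float, float], float]   # h(I, S, E, R)


def forecast(snow0: float, inflow0: float, R: Sequence[float], E: Sequence[float],
             f: SnowModel, h: InflowModel) -> List[Tuple[float, float]]:
    """Roll the two-step SVP scheme over the horizon; returns [(S(t), F(t)), ...]."""
    s_prev, f_prev = snow0, inflow0
    out = []
    for r_t, e_t in zip(R, E):
        # Step 1: copy the previous snow quantity and inflow volume
        p_t, i_t = s_prev, f_prev          # P(t) = S(t-1), I(t) = F(t-1)
        # Step 2: snow model first, then inflow model
        s_t = f(p_t, r_t, e_t)             # S(t) = f(P(t), R(t), E(t))
        f_t = h(i_t, s_t, e_t, r_t)        # F(t) = h(I(t), S(t), E(t), R(t))
        out.append((s_t, f_t))
        s_prev, f_prev = s_t, f_t
    return out


# Hypothetical toy models standing in for the trained ones
def toy_snow(p: float, r: float, e: float) -> float:
    return p + r if e < 0 else max(0.0, p - e)   # accumulate below 0 °C, melt otherwise


def toy_inflow(i: float, s: float, e: float, r: float) -> float:
    return 0.5 * i + r + 0.25 * s
```

Because the snow model runs first, the inflow model at step t can use the freshly predicted $S(t)$, exactly as in the equations above.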

SVP Scenario: Models and attributes • 10-fold cross-validation, 8760 records; models for inflow prediction • N-fold cross-validation, 8760 records; M5P decision tree model
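For 10-fold cross-validation over the 8760 records, each fold holds out 876 records for testing and trains on the remaining 7884. A minimal index-splitting sketch (contiguous folds, no shuffling; shuffled or stratified folds are common variants):

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)   # spread any remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

The model is trained k times, once per fold, and the reported score is the average over the k held-out folds.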

SVP Scenario: Data integration workflow

SVP Scenario: Model training workflow

SVP Scenario: Forecast workflow

ADMIRE Tools • Registry client GUI • Process Designer • SKSA (Semantic Knowledge Sharing Assistant) • Gateway Process Manager • DMI Model Visualizer

Registry client GUI • Read-only access to the ADMIRE Registry: list PEs and view their properties; search and sort PEs • Write access to the Registry is provided via DISPEL documents

Process Designer • Manage your DMI project (files and directories, i.e. the project structure) • Select elements from the Registry • View the canonical (DISPEL) representation of your DMI process in real time • View the properties of the chosen elements • Edit your DMI process graphically

Semantic Knowledge Sharing Assistant • Provides access to existing users' knowledge, automatically sorting and selecting it according to the user's current working context • Figure: the context the user works in (several reservoirs, one settlement) and knowledge previously entered by other users that may be useful in this context
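The slides do not describe how SKSA matches knowledge to a working context. As an illustration only, assume both the context and each knowledge item are described by sets of concept tags (the tags and function names below are hypothetical); items can then be selected and sorted by Jaccard similarity to the context.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a & b| / |a | b|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_knowledge(context: Set[str], items: List[Tuple[str, Set[str]]]) -> List[str]:
    """Return the texts of relevant items, most similar to the context first."""
    scored = [(jaccard(context, tags), text) for text, tags in items]
    relevant = [pair for pair in scored if pair[0] > 0]          # select
    relevant.sort(key=lambda pair: pair[0], reverse=True)        # sort (stable for ties)
    return [text for _, text in relevant]
```

For a context tagged with one reservoir and one settlement, a note tagged with both scores 1.0 and is listed before a note sharing only the reservoir; a note with no shared tag is dropped.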

Gateway Process Manager • Keeps track of running processes • Stop, pause or cancel a process • View the process's DISPEL source • Access the process's results (if available) in several ways, raw or visualized

DMI Model Visualizer: for data mining experts • Visualization of data mining models: reads a Weka classifier object, produces a PMML description of the model and shows the PMML as a graphical tree
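The last step of the visualizer, turning a PMML tree into a tree display, can be sketched with the standard library XML parser. This covers only that step (reading the Weka object and exporting PMML are not reproduced) and renders text instead of graphics. The sample document is trimmed for brevity: a valid PMML file also needs `Header`, `DataDictionary` and `MiningSchema` elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

PREDICATES = {"SimplePredicate", "CompoundPredicate", "SimpleSetPredicate", "True", "False"}
OPS = {"lessThan": "<", "lessOrEqual": "<=", "greaterThan": ">",
       "greaterOrEqual": ">=", "equal": "==", "notEqual": "!="}


def local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://www.dmg.org/PMML-4_1}Node' -> 'Node'."""
    return tag.rsplit("}", 1)[-1]


def describe(pred: Optional[ET.Element]) -> str:
    """Short text form of a node's predicate."""
    if pred is None:
        return "?"
    name = local(pred.tag)
    if name == "SimplePredicate":
        op = OPS.get(pred.get("operator", ""), pred.get("operator", "?"))
        return f"{pred.get('field')} {op} {pred.get('value', '')}".rstrip()
    return name   # True/False/compound predicates are shown by element name only


def render_tree(pmml_text: str) -> str:
    """Render the first TreeModel in a PMML document as an indented text tree."""
    root = ET.fromstring(pmml_text)
    model = next(e for e in root.iter() if local(e.tag) == "TreeModel")
    lines: List[str] = []

    def walk(node: ET.Element, depth: int) -> None:
        pred = next((c for c in node if local(c.tag) in PREDICATES), None)
        lines.append("  " * depth + f"[{describe(pred)}] score={node.get('score')}")
        for child in node:
            if local(child.tag) == "Node":
                walk(child, depth + 1)

    for top in (c for c in model if local(c.tag) == "Node"):
        walk(top, 0)
    return "\n".join(lines)


SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <TreeModel functionName="regression">
    <Node score="2.0"><True/>
      <Node score="1.0"><SimplePredicate field="rain" operator="lessOrEqual" value="3"/></Node>
      <Node score="5.0"><SimplePredicate field="rain" operator="greaterThan" value="3"/></Node>
    </Node>
  </TreeModel>
</PMML>"""
```

`render_tree(SAMPLE)` prints the root `[True]` node with its two children indented beneath it, one line per tree node.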

Custom Application Portal: for end users (domain experts)

Thank you for your attention
