
Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture

Peter Brezany, University of Vienna. Collecting data: satellites; laboratories (microscopes, MRI/CT scanners, ...); data repositories; business; analysis; experiments (high energy physics, ...).



Presentation Transcript


  1. Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture. Peter Brezany, University of Vienna

  2. Collecting Data: satellites; laboratories (microscopes, MRI/CT scanners, ...); data repositories; business; analysis; experiments (high energy physics, ...); computer simulations

  3. Motivation. Computational Grid: a new-generation infrastructure. Challenge: advanced analysis of data managed by the Grid. Typical data in modern Grid applications: files, file collections, relational and XML databases, virtual data, data objects. The data is often large and geographically distributed, and its complexity is increasing; some applications require special security precautions. Our research aims: Phase 1: knowledge discovery Grid system (GridMiner); Phase 2: intelligent Grid system (WisdomGrid)

  4. Outline: Motivation; Background and Related Work; Basic Concepts and GridMiner Architecture; Grid Data Integration System; Data Mining Layer; Implementation Issues and Experiments; Future Research; Conclusions

  5. Background and Related Work: basic Grid development (Globus 1), metacomputing; Data Grid (Globus 2, DataGrid of CERN, etc.); Semantic Grid (myGrid); Open Grid Service Architecture (Globus 3, OGSA-DAIS); parallel and distributed data mining and data warehousing; Knowledge Grid (GridMiner and work of others); Web Intelligence

  6. GridMiner Requirements: open architecture; data distribution, complexity, heterogeneity, and large data size; applying different kinds of analysis strategies; compatibility with existing Grid infrastructure; openness to tools and algorithms; scalability; Grid, network, and location transparency; security and data privacy; OLAP support

  7. GridMiner (Layered) Abstract Architecture (diagram labels): User Interface; Knowledge Grid; Data to Knowledge; Control; Information Grid; Computational & Data Grid. Built on K.G. Jeffery's proposal

  8. GridMiner Conceptual Architecture (diagram label: Job Control)

  9. Service Architecture Based on OGSA-DAIS

  10. Data Distribution Scenarios • Single data source • Federated data sources with different types of partitioning

  11. Example Vertical and horizontal distribution of the virtual data source
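
The two partitioning scenarios on slides 10 and 11 can be illustrated with a minimal Python sketch. It is not the GridMiner mediation service: the site names, the `pat_id` key, and the row values are hypothetical, and everything is held in memory. Horizontal partitions share a schema and are unioned; vertical partitions hold different attributes of the same records and are joined on a shared key.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_horizontal(*partitions: List[Row]) -> List[Row]:
    """Union rows from sources that share a schema but hold different records."""
    merged: List[Row] = []
    for part in partitions:
        merged.extend(dict(row) for row in part)
    return merged


def merge_vertical(key: str, *partitions: List[Row]) -> List[Row]:
    """Join sources that hold different attributes of the same records."""
    by_key: Dict[object, Row] = {}
    for part in partitions:
        for row in part:
            by_key.setdefault(row[key], {}).update(row)
    return [by_key[k] for k in sorted(by_key)]


# Hypothetical sites: A and B split patients by row, C contributes a column.
site_a = [{"pat_id": 1, "age": 34}]
site_b = [{"pat_id": 2, "age": 58}]
site_c = [{"pat_id": 1, "outcome": "good"}, {"pat_id": 2, "outcome": "poor"}]

# The "virtual data source" seen by a client is the combination of both merges.
virtual_source = merge_vertical("pat_id", merge_horizontal(site_a, site_b), site_c)
print(virtual_source)
```

A real mediator would push queries down to the sources instead of copying all rows, so this only shows the shape of the result.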

  12. Mapping Schema

  13. Grid Data Mediation Services

  14. Architecture of a Data Mining System

  15. Components of the Data Mining Layer: GridMiner Service Factory; GridMiner Service Registry; GridMiner Data Mining Service; GridMiner Preprocessing Service; GridMiner Presentation Service; GridMiner Orchestration Service

  16. Centralized Data Mining

  17. Parallel and Distributed Data Mining

  18. GridMiner Orchestration Service

  19. GridMiner Job Specification Language

  20. Implementation Prototype: implementation of the Mediation Service for horizontal data partitioning; implementation of Data Mining Services for decision tree construction as OGSA-compliant Grid services, based on the Globus Toolkit 3 release. We use Weka, a freely available Java-based data mining system (data preprocessing and data mining tasks; main-memory oriented), and a home-grown Java implementation of the SPRINT algorithm (disk-oriented)
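
SPRINT chooses decision tree splits with the gini index, scanning a pre-sorted attribute list. The sketch below shows that split evaluation for one numeric attribute. It is a small in-memory illustration in Python, not the disk-oriented Java implementation the slide mentions, and the ages and outcomes are made-up data.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(counts: Counter, total: int) -> float:
    """Gini index 1 - sum(p_j^2) of a class histogram."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted gini) of the best split 'value <= threshold'."""
    # Attribute list of (value, class label) pairs, sorted once by value.
    attr_list = sorted(zip(values, labels))
    n = len(attr_list)
    below: Counter = Counter()          # class histogram of records already scanned
    above: Counter = Counter(labels)    # class histogram of records still ahead
    best_threshold, best_score = None, float("inf")
    for i in range(n - 1):
        value, label = attr_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        score = (n_below / n) * gini(below, n_below) \
            + ((n - n_below) / n) * gini(above, n - n_below)
        if score < best_score:
            best_threshold, best_score = (value + next_value) / 2, score
    return best_threshold, best_score


# Hypothetical training records: patient age and outcome class.
ages = [23, 35, 41, 52, 67, 70]
outcomes = ["good", "good", "good", "poor", "poor", "poor"]
threshold, score = best_numeric_split(ages, outcomes)
print(threshold, score)
```

Candidate thresholds are taken as midpoints between consecutive distinct values, and the split with the lowest weighted gini wins.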

  21. Experimental Environment. Test data suites: synthetic data (generated by an extended version of the IBM Quest Synthetic Data Generation Code); TBI (Traumatic Brain Injury) databases. Grid testbed: Vienna, CERN, Dublin, Zagreb, Cracow. Goals in the first phases: verifying model accuracy; measuring the overhead of the service layers

  22. Extending the Functionality

  23. OLAM

  24. Example: Mining Patterns for Data Classification and Associations. Query 1: use database dat1, dat2 / mine classifications / analyze patient_outcome / using g_parsimony / display as tree. Query 2: use database DBs attributes / mine associations / using method_attributes / display as rules
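
To show how a statement in this query language breaks into clauses, here is a hedged Python sketch that splits the classification query from the slide on its clause keywords. The grammar is an assumption inferred from the single example (with spaces restored between tokens); the slides do not define it, and `parse_statement` is a hypothetical helper, not part of GridMiner.

```python
import re
from typing import Dict

# Clause keywords as they appear in the slide's example; the grammar is assumed.
KEYWORDS = ["use database", "mine", "analyze", "using", "display as"]
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")


def parse_statement(text: str) -> Dict[str, str]:
    """Split one statement into {clause keyword: argument text}."""
    # re.split with a capturing group yields [prefix, kw1, arg1, kw2, arg2, ...].
    parts = _PATTERN.split(text.strip())
    clauses: Dict[str, str] = {}
    for keyword, argument in zip(parts[1::2], parts[2::2]):
        clauses[keyword] = argument.strip()
    return clauses


query = parse_statement(
    "use database dat1, dat2 mine classifications "
    "analyze patient_outcome using g_parsimony display as tree"
)
print(query)
```

This keyword splitting would misfire if an argument contained a keyword as a standalone word, so a real parser would need a proper grammar.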

  25. Workflow 1: Interactive Mode

  26. Workflow 2: Batch Mode

  27. Workflow 3: Hybrid Mode

  28. Execution Model Based on Static Workflow

  29. Execution Model Based on Dynamic Workflow

  30. Towards the Wisdom Grid (WG)

  31. WG Architecture (diagram labels): Domain Knowledge Agents; Knowledge Explorer Agent; Wisdom Grid; Agent Platform; Agent Grid Service; KB; External Knowledge Base; Knowledge Base Service; External Services; Knowledge Discovery Service; Grid; End User (personal) Agent

  32. Work-Flow (diagram labels): External Agents; End User Agent; Knowledge Agent; Knowledge Explorer Agent; Knowledge Base Service; Knowledge Discovery Service; Agent Service; Services ...; Knowledge Base

  33. Knowledge Discovery Service • Client for other services • Knowledge Discovery in Databases • GridMiner • data mining • on-line analytical processing (OLAP) • Web Mining • semantic web • Online libraries • Web/Grid Services • Knowledge Explorer Agent

  34. Knowledge Base Service / KB • KBS: search, query, expand the knowledge base • KB: database that stores particular data about real objects and the relations between these objects and their properties • Consists of ontologies and instances • Information about resources (location, query language) • on the Web • Web/Grid services, agents • references to the online database • Languages • XML/RDF/DAML-OIL/DAML-S/OWL

  35. Ontology example in the DAML-OIL language (Patient is a Human; a Human has an Age):
      <daml:Class rdf:ID="Human">
        <rdfs:subClassOf>
          <daml:Restriction daml:cardinality="1">
            <daml:onProperty rdf:resource="#Age"/>
          </daml:Restriction>
        </rdfs:subClassOf>
      </daml:Class>
      <daml:DatatypeProperty rdf:ID="Age">
        <rdfs:domain rdf:resource="#Human"/>
      </daml:DatatypeProperty>
      <daml:Class rdf:ID="Patient">
        <rdfs:subClassOf rdf:resource="#Human"/>
      </daml:Class>
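
The consequence of this ontology, that a Patient has an Age because Patient is a subclass of Human, can be sketched in a few lines of Python. This is a toy in-memory model, not a DAML+OIL parser or reasoner; the dictionaries simply restate the slide's two classes and one property.

```python
from typing import Dict, Optional, Set

# Toy in-memory restatement of the slide's ontology.
SUBCLASS_OF: Dict[str, Optional[str]] = {"Human": None, "Patient": "Human"}
PROPERTIES: Dict[str, Set[str]] = {"Human": {"Age"}, "Patient": set()}


def properties_of(cls: str) -> Set[str]:
    """Collect properties declared on a class and on all of its ancestors."""
    found: Set[str] = set()
    current: Optional[str] = cls
    while current is not None:
        found |= PROPERTIES.get(current, set())
        current = SUBCLASS_OF.get(current)
    return found


print(properties_of("Patient"))  # Patient inherits Age from Human
```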

  36. Knowledge Base example (diagram). Nodes: Human, Patient, Temperature, Value, Attribute, Tables, Database; edges labeled "is" and "has"; instance data: attribute:PAT_ID, table:PATIENTS, jdbc://foo/hospital

  37. Semantic mediator: distributed heterogeneous databases; different database schemas; different query languages; different names of attributes/tables… but the same semantics! WG enables semantics mediation at a higher level

  38. Semantic mediator (cont.) (diagram labels): Patient is Human; Human has Age; Human has Blood Type; samePropertyAs links to the columns AGE, PAT_AGE, PAT_BLOOD_TYPE and BT; Database in Hospital X; Database in Hospital Z
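
A minimal Python sketch of the mediation idea: one request phrased in ontology terms is rewritten into a query per hospital by following samePropertyAs links to local column names. Which column belongs to which hospital cannot be read from the transcript, so the assignment below is an assumption, as are the `PATIENTS` table name for both sources and the use of SQL strings.

```python
from typing import Dict, List

# samePropertyAs links from ontology properties to local column names.
# The hospital-to-column assignment is an assumption for illustration.
SAME_PROPERTY_AS: Dict[str, Dict[str, str]] = {
    "hospital_x": {"Age": "AGE", "BloodType": "BT"},
    "hospital_z": {"Age": "PAT_AGE", "BloodType": "PAT_BLOOD_TYPE"},
}


def rewrite_query(properties: List[str], table: str = "PATIENTS") -> Dict[str, str]:
    """Translate one ontology-level request into a SQL string per source."""
    queries: Dict[str, str] = {}
    for source, mapping in SAME_PROPERTY_AS.items():
        # Alias each local column back to the shared ontology property name.
        columns = ", ".join(f"{mapping[p]} AS {p}" for p in properties)
        queries[source] = f"SELECT {columns} FROM {table}"
    return queries


for source, sql in rewrite_query(["Age", "BloodType"]).items():
    print(source, "->", sql)
```

Aliasing every column to the ontology name means the results from both hospitals come back with the same attribute names and can be combined directly.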

  39. Distributed Knowledge Base (diagram labels): uri:fooY#Human; uri:fooX#Patient; uri:fooX#Ill_Person; uri:fooZ#Temperature; node types: class, property; relations: is subclass, has property, is same class as

  40. Agent Grid Service • Gives the system the ability to communicate with the outside world in standard languages • FIPA Standards • ACL – Agent Communication Language • KQML – Knowledge Query and Manipulation Language • Agent Platform (JADE, FIPA-OS) • Agents • Domain Knowledge Agent • Knowledge Explorer Agent • End-user Agent (personal)

  41. Querying • End-user agent • with own ontology – subset of ontology • Merging of ontologies • without own ontology • Negotiating about domain of interest • Queries created from ontology • Templates: <Patient rdf:ID="ID001"> <Temperature/> </Patient>

  42. Answers: mined knowledge (GridMiner): decision trees/rules (clinical pathways), association rules; instances of domain ontology: particular data; references: links to Web sites, information about other knowledge providers

  43. GridMiner Case Study - Medical Application (diagram labels): Semantic Web/Grid; Knowledge Explorer Agent; Knowledge Agent; Knowledge Discovery Service; Test set; Training set; Knowledge Base; Hospital Databases; End User (personal) Agent; resources. Q: Outcome? + data about patient's condition. A: probability of survival + references to the diagnoses

  44. Conclusions and Future Work. Conclusions: application and extension of Grid technology to knowledge discovery, an important but non-traditional Grid application domain; introduction of a new Grid Data Mediation Service. Future work: performance evaluation on large synthetic data volumes; coupling of the Data Mining services architecture with the OLAP services architecture; development of a knowledge-discovery-oriented Grid Workflow Language and the appropriate Workflow Engine; application of GridMiner to a real medical application (management of patients with severe traumatic brain injuries); development of the Wisdom Grid
