Data Mining in Spatial Data Sets

Data Mining in Spatial Data Sets. Hemant Kumar Jerath, B.Tech. MS Project Student Mangalore University Advisors: Dr. B.K Mohan & Dr.(Mrs.).P. Venkatachalam CSRE, IIT Bombay. Contents. Data Management System Data Mining- Concepts, Algorithms & Tasks Data Warehouse

Presentation Transcript


  1. Data Mining in Spatial Data Sets Hemant Kumar Jerath,B.Tech. MS Project Student Mangalore University Advisors: Dr. B.K Mohan & Dr.(Mrs.).P. Venkatachalam CSRE, IIT Bombay

  2. Contents • Data Management System • Data Mining-Concepts, Algorithms & Tasks • Data Warehouse • OLAP(On-line Analytical Processing) • Knowledge Discovery Process • Spatial Data Warehouse & OLAP • Spatial Data Mining – Concept & Definition • Case Studies - Data Mining Software • Spatial Data Mining- Software Architecture

  3. OUTPUT/Knowledge Explicit/Trivial Knowledge SQL QUERY INTERFACE OLAP Data Base Management System Data warehouse

  4. Data mining techniques offer a way to explore implicit knowledge. DBMS vs. data mining? DBMS: SQL-driven exploration. Data mining: automatic exploration.

  5. Data Mining Definition: Data mining is the analysis of (often large) observational data sets to find implicit relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. [Hand et al.]

  6. Keywords in Definition • large data sets • observational data: as opposed to experimental data • relationships and summaries: referred to as models and patterns • e.g. linear equations, rules, clusters, graphs, tree structures and recurrent patterns in time series.

  7. Data Tombs DATA MINING Golden Nuggets Implicit Knowledge Transform your data to critical knowledge

  8. Data Mining: a confluence of multiple disciplines • Machine Learning • Statistics • Artificial Intelligence • Information Theory

  9. Knowledge Discovery Process(KDD) Phase of real discovery

  10. Data Preprocessing • Data Cleaning • Missing values • Noisy data • Binning • Clustering • Combined computer and human interaction • Regression • Inconsistent data • Data Integration and Transformation • Data Integration • Data Transformation

  11. …Continued • Data Transformation • Smoothing • Aggregation • Generalization • Normalization • Attribute Construction • Data Reduction • Data Cube aggregation • Dimension reduction • Data Compression • Numerosity reduction • Discretization and concept hierarchy generation
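
Two of the preprocessing steps listed above, smoothing noisy data by binning and normalization, can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function names, the equal-frequency binning scheme, and the sample values are assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into consecutive bins of `bin_size`,
    and replace each value by the mean of its bin (equal-frequency binning).
    The result is returned in sorted order, not in the input order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale the values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# Made-up sample attribute values, for illustration only.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)
normalized = min_max_normalize(prices)
```

With bins of three values, each price is replaced by the mean of its bin, and the normalized list runs from 0.0 to 1.0.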

  12. Data Warehouse Definition: A data warehouse is a subject-oriented, integrated (heterogeneous sources), time-variant and non-volatile collection of data in support of management's decision-making process. [W. H. Inmon]

  13. STAR SCHEMA

  14. SNOWFLAKE SCHEMA

  15. Data Cube Technology [address, time, item] cell<Canada, Q1, TV>

  16. OLAP Operations • Roll up (drill up): summarize data by climbing up a hierarchy or by dimension reduction • Drill down (roll down): the reverse of roll-up, moving from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions • Slice and dice: project and select • Pivot (rotate): reorient the cube for visualization, e.g. 3D to a series of 2D planes • Other operations • Drill across: involving (across) more than one fact table • Drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
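
The OLAP operations above can be sketched on a tiny in-memory fact table. This is only an illustration under stated assumptions: the rows and numbers are made up, the dimensions follow the [address, time, item] cube of slide 15 with address split into a hypothetical city-to-country hierarchy, and the helper names (`aggregate`, `slice_rows`, `dice_rows`, `pivot`) are not from any OLAP product.

```python
from collections import defaultdict

# Made-up fact rows: (country, city, quarter, item, units_sold).
COLUMNS = ("country", "city", "quarter", "item")
FACTS = [
    ("Canada", "Toronto", "Q1", "TV", 10),
    ("Canada", "Vancouver", "Q1", "TV", 5),
    ("Canada", "Toronto", "Q2", "TV", 7),
    ("Canada", "Toronto", "Q1", "Phone", 3),
    ("USA", "Chicago", "Q1", "TV", 8),
    ("USA", "Chicago", "Q2", "Phone", 4),
]


def aggregate(facts, dims):
    """Sum the measure over all rows, grouped by the chosen dimensions."""
    indexes = [COLUMNS.index(d) for d in dims]
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[i] for i in indexes)] += row[-1]
    return dict(cube)


def slice_rows(facts, dim, value):
    """Slice: select on a single dimension."""
    i = COLUMNS.index(dim)
    return [row for row in facts if row[i] == value]


def dice_rows(facts, conditions):
    """Dice: select on several dimensions; `conditions` maps dim -> allowed values."""
    return [
        row for row in facts
        if all(row[COLUMNS.index(d)] in allowed for d, allowed in conditions.items())
    ]


def pivot(cube_2d):
    """Pivot: turn a two-dimensional cube into a row -> {column: value} table."""
    table = defaultdict(dict)
    for (row_key, col_key), value in cube_2d.items():
        table[row_key][col_key] = value
    return dict(table)


detailed = aggregate(FACTS, ("city", "quarter", "item"))      # lowest level
rolled_up = aggregate(FACTS, ("country", "quarter", "item"))  # roll up: city -> country
by_country = aggregate(FACTS, ("country",))                   # roll up by dimension reduction
q1_rows = slice_rows(FACTS, "quarter", "Q1")
canada_tv = dice_rows(FACTS, {"country": {"Canada"}, "item": {"TV"}})
quarter_by_item = pivot(aggregate(FACTS, ("quarter", "item")))
```

Here `rolled_up[("Canada", "Q1", "TV")]` is 15, the sum of the Toronto and Vancouver cells; drilling down means going back from `rolled_up` to `detailed`.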

  17. Drill Down Operation Roll Up Operation

  18. Mining technology today • Data warehouse: extract data via ODBC • Preprocessing utilities: sampling, attribute transformation • Mining operations: scalable algorithms for association, classification, clustering, sequence mining • Visualization tools • Vendors (IDC 1999): SAS 29%, SPSS 13.5%, IBM 6%

  19. Data Mining Algorithms Definition: A data mining algorithm is a well-defined procedure that takes data as input and produces output in the form of models or patterns.

  20. Data Mining Algorithms Reductionist approach: A data mining algorithm can be thought of as a 'tuple' consisting of: {model structure, score function, search method, data management techniques}

  21. * Data Management Technique

  22. So eventually, we can generate a potentially infinite number of algorithms by combining different: • model structures • score functions • search methods • and data management techniques
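
The "tuple" view of an algorithm can be made concrete with a small sketch in which each component is a separate, swappable function. The particular choices here (a linear model, a sum-of-squared-errors score, exhaustive grid search, and plain in-memory data) are assumptions picked for illustration; the slides do not prescribe them.

```python
from itertools import product


def linear_model(params, x):
    """Model structure: y = a * x + b."""
    a, b = params
    return a * x + b


def squared_error(model, params, data):
    """Score function: sum of squared errors of the model on the data."""
    return sum((y - model(params, x)) ** 2 for x, y in data)


def grid_search(model, score, data, candidates):
    """Search method: try every candidate parameter setting, keep the best."""
    return min(candidates, key=lambda params: score(model, params, data))


# Data management technique: here simply a small list held in memory.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]  # made-up points on y = 2x + 1

# Candidate (a, b) pairs with integer values from -3 to 3.
candidates = list(product(range(-3, 4), range(-3, 4)))
best_params = grid_search(linear_model, squared_error, data, candidates)
```

Swapping any one component (for example, absolute error for the score, or a random search for the search method) yields a different algorithm, which is the combinatorial point the slide makes.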

  23. Data Mining Task-Taxonomy • Prediction: use of some variables to predict unknown or future values of other variables • Classification, regression and deviation detection • Description: find human-interpretable patterns that describe the data • Clustering, association rule discovery, sequential rule discovery

  24. Data Mining Task-Taxonomy • Verification Model: affirm or negate the hypothesis (an iterative process of progressive refinement of the hypothesis) • Discovery-Driven Model: the system automatically finds the information

  25. Mining operations • Classification: regression, classification trees, neural networks, Bayesian learning, nearest neighbor, radial basis functions, support vector machines, meta learning methods (bagging, boosting) • Clustering: hierarchical, EM, density based • Sequence mining: time series similarity, temporal patterns • Item set mining: association rules, causality • Sequential classification: graphical models, hidden Markov models

  26. Mining Tasks • Discovery of association rules: X => Y (s%, c%), where s is the support and c is the confidence of the rule
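
Support and confidence for a rule X => Y can be computed directly from a list of transactions. The sketch below uses the standard definitions (support as the fraction of transactions containing X and Y together, confidence as support(X and Y) divided by support(X)); the transactions and function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0  # the rule never applies
    return support(transactions, set(x) | set(y)) / support_x


# Made-up market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})         # 2 of 4 transactions
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 bread transactions
```

For the rule bread => milk, the support is 50% and the confidence is about 67%.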

  27. ......Continued • Clustering • Criteria: i. available similarity ii. set function (optimizing technique) • Land use: finding similar areas of land use in an earth observation database • City planning: identifying groups of houses according to their house type, value and geographic location

  28. ......Continued • Classification • Finding rules to partition data into disjoint groups

  29. Classification • Given old data about customers and payments, predict a new applicant's loan eligibility. • Figure labels: previous customers (Age, Salary, Profession, Location, Customer type); Classifier; Decision rules (Salary > 5 L, Prof. = Exec); Good/bad; New applicant's data
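
The loan example can be sketched as a one-rule classifier that learns a salary threshold from previous customers and applies it to a new applicant. This is a deliberately simplified stand-in for the classifier in the figure: the customer records, the salary unit (lakhs), and the function names are assumptions, and only the salary attribute is used.

```python
def learn_salary_threshold(customers):
    """Choose the threshold t that minimizes training errors for the rule
    'salary > t => good, otherwise bad', trying each observed salary as t."""
    best_threshold, best_errors = None, None
    for t in sorted({c["salary"] for c in customers}):
        errors = sum((c["salary"] > t) != (c["label"] == "good") for c in customers)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def classify(applicant, threshold):
    """Apply the learned decision rule to a new applicant."""
    return "good" if applicant["salary"] > threshold else "bad"


# Made-up previous customers; salary in lakhs.
previous_customers = [
    {"salary": 2.0, "label": "bad"},
    {"salary": 4.0, "label": "bad"},
    {"salary": 5.0, "label": "bad"},
    {"salary": 6.5, "label": "good"},
    {"salary": 9.0, "label": "good"},
]

threshold = learn_salary_threshold(previous_customers)
prediction = classify({"salary": 7.0}, threshold)
```

On this toy data the learned rule is "salary > 5.0", which mirrors the "Salary > 5 L" decision rule shown on the slide.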

  30. Classification vs. Clustering • Clustering: the methods generate the class labels. [descriptive] • Classification: class labels are allocated to the data through a classifier. [predictive]

  31. Frequent Episodes • Sequences of events that occur frequently • Mainly used for temporal data.

  32. Deviation detection • Identification of outliers
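
One simple way to identify outliers is to flag values that lie more than a chosen number of standard deviations from the mean. The slide does not name a method, so the z-score rule, the threshold of 2.0, and the sample readings below are all assumptions.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up readings with one obviously deviating value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = z_score_outliers(readings)
```

Here only the reading 50 is flagged; the remaining values all lie within one standard deviation of the mean.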

  33. Sequence Mining • The sequence in which association rules occur.

  34. Spatial Data Mining

  35. Spatial Data Mining Definition: Spatial data mining is the extraction of implicit knowledge, spatial relationships, or other interesting patterns not explicitly stored in the databases.

  36. What is the difference between Data Mining and spatial data mining? • Data Mining: • non-spatial attribute • Spatial Data Mining: • Integration of both spatial and non-spatial dimension in various KDD algorithms • Spatial attribute (use of thematic maps) • Non-spatial attribute (relational database)

  37. Spatial Data Models • Raster Model: pixel data sets • Vector Model: point, line, polygon objects

  38. Fundamental Operations on Vector Data Sets • Spatial relations with neighbors are an important aspect of spatial data mining • distance between points • area of an object (a polygon) • length of a chain or polygon • intersection or union of objects • mutual position of objects (they can intersect, overlap or touch)
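
Three of the operations listed above (distance, area and length) can be sketched for planar coordinates. This assumes simple, non-self-intersecting polygons given as lists of (x, y) vertices in a flat Cartesian plane; real GIS data would also need map projections, and the function names are illustrative.

```python
import math


def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Length of the closed boundary of a polygon."""
    n = len(vertices)
    return sum(distance(vertices[i], vertices[(i + 1) % n]) for i in range(n))


# A made-up 4 x 3 rectangle.
parcel = [(0, 0), (4, 0), (4, 3), (0, 3)]
area = polygon_area(parcel)            # 12.0
perimeter = polygon_perimeter(parcel)  # 14.0
diagonal = distance((0, 0), (4, 3))    # 5.0
```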

  39. DATA MINING ARC SDE SOLAP SPATIAL AND NON-SPATIAL DATAWAREHOUSE Attribute data Shape files

  40. Spatial Warehouse and OLAP Definition: The spatial data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of both spatial and non-spatial data in support of management's decision-making process.

  41. SOLAP and SDW-Issues • Spatial Data format • Structure specific • Vendor specific • OLAP processing • Spatial indexing • Accessing methods

  42. Construction of Spatial Warehouse and OLAP • Spatial data Cube Model • Use of spatial dimensions in the cube. • Star/Snowflake Model

  43. Star Model of a spatial data warehouse: BC_weather

  44. Concept Hierarchies (figure: a hierarchy rooted at Agriculture, with nodes Cash Crop, Grains, Vegetation, Rice, Wheat, Fruits, Mango, Kiwi, Kale, Tomato, Jasmine, Basmati)

  45. The hierarchy of topological relations (figure nodes: G_close_to, Not_disjoint, Close_to, Intersects, Inside, Contains, Equal, Adjacent_to, Covers)

  46. Modeling Dimensions: Spatial Data Cube • Non-spatial dimension • temperature and precipitation, with generalizations such as hot, wet • Spatial to non-spatial dimension • pacific_northwest, big_state • Spatial to spatial dimension

  47. What can we measure in a spatial data cube? • Numerical measure • e.g. monthly revenue of a region; a roll-up may give the total revenue of the region • Spatial measure • a collection of pointers to spatial objects • generalization (roll-up): regions with the same temperature and precipitation are grouped together.
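
The roll-up of a spatial measure can be sketched as grouping region identifiers (standing in for pointers to spatial objects) by their generalized temperature and precipitation. The thresholds, category names other than "hot" and "wet", region identifiers and readings are assumptions chosen for illustration.

```python
from collections import defaultdict


def generalize_temperature(celsius):
    """Map a numeric temperature to a concept-hierarchy level (assumed cut-offs)."""
    if celsius >= 25:
        return "hot"
    if celsius >= 10:
        return "mild"
    return "cold"


def generalize_precipitation(mm):
    """Map a numeric precipitation value to 'wet' or 'dry' (assumed cut-off)."""
    return "wet" if mm >= 100 else "dry"


def roll_up_regions(regions):
    """Group region ids by generalized (temperature, precipitation).
    Each cell's spatial measure is the collection of region ids in it."""
    cells = defaultdict(list)
    for region_id, celsius, mm in regions:
        key = (generalize_temperature(celsius), generalize_precipitation(mm))
        cells[key].append(region_id)
    return dict(cells)


# Made-up regions: (region id, temperature in Celsius, precipitation in mm).
regions = [("R1", 28, 120), ("R2", 30, 150), ("R3", 12, 20), ("R4", 27, 10)]
cells = roll_up_regions(regions)
```

Regions R1 and R2 fall into the same ("hot", "wet") cell; in a spatial warehouse their geometries could then be merged into a single generalized region.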

  48. Spatial Data Mining: A Database Approach (Martin Ester, Hans-Peter Kriegel, Jörg Sander) • Step I: discover centers based on some non-spatial attribute [clustering, descriptive mining] • Step II: determine the (theoretical) trend of some non-spatial attribute. • Step III: discover the deviations from the theoretical trend • Step IV: explain the deviations by spatial objects, e.g. the presence of some infrastructure.
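
Steps II and III above can be sketched as fitting a trend of a non-spatial attribute against distance from a center and then flagging objects that deviate from it. This is one possible reading, not the authors' algorithm: the linear least-squares trend, the fixed tolerance, the object records and the function names are all assumptions, and the center is taken as given rather than discovered by clustering.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.
    Assumes the xs are not all identical."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_deviations(center, objects, tolerance):
    """Step II: fit the trend of `value` against distance from `center`.
    Step III: return objects whose residual exceeds `tolerance`."""
    distances = [math.dist(center, obj["location"]) for obj in objects]
    values = [obj["value"] for obj in objects]
    slope, intercept = fit_line(distances, values)
    flagged = []
    for obj, d in zip(objects, distances):
        residual = obj["value"] - (slope * d + intercept)
        if abs(residual) > tolerance:
            flagged.append((obj["name"], residual))
    return slope, flagged


# Made-up objects: an attribute that mostly falls with distance from the center.
objects = [
    {"name": "A", "location": (1, 0), "value": 90},
    {"name": "B", "location": (2, 0), "value": 80},
    {"name": "C", "location": (3, 0), "value": 70},
    {"name": "D", "location": (4, 0), "value": 95},
    {"name": "E", "location": (5, 0), "value": 50},
]
slope, flagged = find_deviations((0, 0), objects, tolerance=15)
```

The fitted trend is decreasing, and only object D lies well above it. Step IV would then look for nearby spatial objects, such as infrastructure, that could explain why D departs from the trend.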
