Spatial-enabled Mining in Oracle

Ravi Kothuri, Spatial Technologies, Oracle USA

Presentation Transcript


  1. Spatial-enabled Mining in Oracle
     Ravi Kothuri, Spatial Technologies, Oracle USA

  2. Oracle10g Spatial
     Oracle Spatial: Store, Analyze and Visualize Spatial Data
     • Spatial Data Types: Vector (feature/topological), Raster, Network types, Versioning
     • Spatial Relationships
     • Route Computation
     • Raster Manipulation
     • Visualization: Mapviewer
     Scalability & Seamless Integration for Spatial Data

  3. Oracle Spatial: Future Projects
     • 3-D
       • Extensions to SDO_GEOMETRY
         • Composite Surface and Composite/Multi-Solid
         • Support for different operators: Anyinteract, Filter, NN, Within_distance
       • Scalable storage and management of PointCloud data: partitioning and visibility query (LOD)
       • TIN generation: need to experiment with a variety of approaches
     • Intelligent Map Caching, WFS,…

  4. Oracle Data Mining
     • Preprocessing and data cleanup: a number of transformations and normalization functions
       • Binning, Spatial Binning,…
     • Data Mining Functions:
       • Classification: Decision Trees, Adaptive Bayes,…
       • Clustering: KMeans, KModes, Oracle-specific
         • Spatial: BIRCH+Agglomerative Clustering
       • Association Rules: Apriori
       • Regression: SVM with linear kernel and more…
     Robust Framework for Mining Data in Oracle

  5. Spatial Data Mining
     • Spatial data mining: result patterns have a spatial component
       • Clustering
       • Colocation of data items
     • Spatial-enabled mining: include spatial information in data mining
       • Information is implicit (not materialized)
       • What information to materialize?
         • Spatial correlation with target data (e.g., habitats of birds)
         • Spatial auto-correlation in regression
           • Target variable: Y = a·X + ρ·W·Y
           • where ρ is the spatial autocorrelation and W is the neighborhood matrix
           • First step: materialize target variable estimates
       • How to incorporate spatial auto-correlation?
         • Materialize spatial information and estimates as additional attributes
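
The W Y term in the regression above can be illustrated with a small sketch. It assumes W is a row-normalized adjacency matrix, which is one common choice and not something the slides specify; the function names and the three-region example are hypothetical.

```python
def row_normalize(weights):
    """Scale each row of a square weight matrix so it sums to 1.
    Rows that are all zero (regions with no neighbors) stay zero."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized


def spatial_lag(weights, y):
    """Return W*y: for each region, the weighted average of its neighbors' y values."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]


# Hypothetical example: three regions in a line (0 - 1 - 2).
# Region 1 borders both of the others; regions 0 and 2 border only region 1.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
W = row_normalize(adjacency)
y = [10.0, 20.0, 30.0]          # illustrative target values, not real data
lag = spatial_lag(W, y)         # one "neighborhood estimate" per region
print(lag)
```

Each entry of `lag` could then be stored as an extra attribute next to the original row, which is the "materialize target variable estimates" step named on the slide.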

  6. Materializing Neighborhood Influence
     • Compute a weighted sum of interesting information (target variable, other attributes) from neighbors
     • E.g., if you are estimating CRIME for a region/point T, take a “distance-based” weighted sum of the crime of its neighbors
     • Additionally, you can estimate the population within a 10-mile radius (broken down by race), etc.
     • Oracle Spatial provides specific functions to compute such neighborhood-based estimates
     Example: target T with neighbors A and B
     C(T) = [ C(A)/d(A,T) + C(B)/d(B,T) ] / [ 1/d(A,T) + 1/d(B,T) ]
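
The formula for C(T) generalizes to any number of neighbors. The following is a minimal sketch in plain Python, not Oracle Spatial's own function; the name `idw_estimate`, the planar coordinates, and the sample values are assumptions made for illustration.

```python
import math


def idw_estimate(target, neighbors):
    """Inverse-distance weighted average of neighbor values at `target`.

    target    -- (x, y) of the location T being estimated
    neighbors -- list of ((x, y), value) pairs, e.g. crime counts of A, B, ...
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in neighbors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            # A neighbor at the same spot as T: use its value directly
            # (an assumption; the slides do not cover this case).
            return value
        numerator += value / d        # C(N) / d(N, T)
        denominator += 1.0 / d        # 1 / d(N, T)
    if denominator == 0:
        raise ValueError("at least one neighbor is required")
    return numerator / denominator


# Hypothetical values: A is 1 unit from T with crime 30, B is 2 units away with crime 60.
T = (0.0, 0.0)
estimate = idw_estimate(T, [((1.0, 0.0), 30.0), ((2.0, 0.0), 60.0)])
print(estimate)  # (30/1 + 60/2) / (1/1 + 1/2) = 60 / 1.5 = 40.0
```

Closer neighbors dominate the estimate: A contributes with weight 1 and B with weight 0.5, so the result sits nearer to A's value than a plain average would.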

  7. Spatial-enabled Mining
     Table → Neighborhood Estimates (e.g., population within 2 miles, crime in neighborhood,…) → Augmented Table → Oracle Data Mining → Mining Results
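
As a rough illustration of the augmentation step on this slide, the sketch below adds one neighborhood column to a small in-memory table. It is an assumption-laden stand-in for the database workflow: the column names (`x`, `y`, `crime`, `nbr_crime`), the unweighted mean, and the sample rows are all hypothetical.

```python
import math


def augment_with_neighbor_mean(rows, k, value_key="crime", new_key="nbr_crime"):
    """Return copies of `rows` with one extra column: the mean of `value_key`
    over the k nearest OTHER rows. The row itself is excluded so that its own
    target value does not leak into its neighborhood estimate."""
    augmented = []
    for i, row in enumerate(rows):
        here = (row["x"], row["y"])
        others = [r for j, r in enumerate(rows) if j != i]
        others.sort(key=lambda r: math.dist(here, (r["x"], r["y"])))
        nearest = others[:k]
        estimate = (
            sum(r[value_key] for r in nearest) / len(nearest) if nearest else None
        )
        augmented.append({**row, new_key: estimate})
    return augmented


# Hypothetical base table; ids, coordinates and crime values are illustrative only.
table = [
    {"id": "A", "x": 0.0, "y": 0.0, "crime": 10.0},
    {"id": "B", "x": 1.0, "y": 0.0, "crime": 20.0},
    {"id": "C", "x": 3.0, "y": 0.0, "crime": 60.0},
]
augmented = augment_with_neighbor_mean(table, k=1)
for row in augmented:
    print(row["id"], row["crime"], row["nbr_crime"])
```

The augmented rows keep every original attribute, so the mining step can be run with or without the new column and the two results compared.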

  8. Spatial-enabled Mining
     • Mapviewer
     • ODM applications: Classification, Regression, Association Rules,…
     • Spatial Analysis (building blocks): Spatial Binning, Spatial Estimates, Clustering for polygons (BIRCH+agglomerative)

  9. Case Study for Spatial-enabled Mining: How helpful are these estimates?
     • Test on a specific dataset
       • US Census block groups for CA (21K)
       • Crime data for US block groups (from a partner company)
       • Crime rate is the number of crimes per 1,000 population
     • Separate the data into TRAINING data and TEST data
     • Compute data mining models using the TRAINING data

  10. Evaluation
     • Predict crime for TEST regions with and without spatial estimates using ODM’s mining functions
       • Test regions: 450 locations in the San Francisco area
     • Classification (Adaptive Bayes Network)
       • Create bins or “classes” of the data and results
       • Measures how well the model predicts the “class” for new test regions
     • Regression (Support Vector Machines)
       • Predict the exact value of the crime rate
     • Estimates for spatial neighborhood
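
The two evaluation measures used in this case study, classification accuracy over binned values and root-mean-square error for regression, can be sketched as follows. The bin boundaries and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def to_class(value, boundaries):
    """Map a numeric value to a bin index: the number of boundaries it reaches.
    With boundaries [25, 50], values < 25 fall in class 0, 25..49.9 in class 1,
    and 50 or more in class 2."""
    return sum(1 for b in boundaries if value >= b)


def accuracy(actual_classes, predicted_classes):
    """Fraction of test regions whose predicted class equals the actual class."""
    hits = sum(1 for a, p in zip(actual_classes, predicted_classes) if a == p)
    return hits / len(actual_classes)


def rmse(actual, predicted):
    """Root-mean-square error between predicted and actual values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative crime rates for three test regions (not real data).
actual = [10.0, 40.0, 70.0]
predicted = [13.0, 52.0, 70.0]
boundaries = [25, 50]            # hypothetical bin edges

actual_cls = [to_class(v, boundaries) for v in actual]        # [0, 1, 2]
predicted_cls = [to_class(v, boundaries) for v in predicted]  # [0, 2, 2]

print(accuracy(actual_cls, predicted_cls))  # 2 of 3 classes match
print(rmse(actual, predicted))              # sqrt((9 + 144 + 0) / 3) = sqrt(51)
```

Running both measures once on predictions made without the neighborhood columns and once with them gives the kind of before/after comparison the slide describes.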

  11. Spatial Neighborhood
     • How do you define the neighborhood?
       • Buffer around the test location? Quarter mile to 10 miles
       • Nearest neighbors? 2 to 20
     • Compute spatial estimates for crime
       • Can also be done for population (white, asian, black, hispanic,..)
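
The two neighborhood definitions on this slide, a distance buffer and the k nearest neighbors, can be compared with a small sketch. It uses straight-line distance on planar coordinates, which is a simplifying assumption (real block-group data would need geodetic distances), and the function names and points are hypothetical.

```python
import math


def within_distance(target, points, radius):
    """Buffer definition: every point no farther than `radius` from the target."""
    return [p for p in points if math.dist(target, p) <= radius]


def nearest_neighbors(target, points, k):
    """k-NN definition: the k points closest to the target, nearest first."""
    return sorted(points, key=lambda p: math.dist(target, p))[:k]


# Hypothetical locations; distances from the origin are 1, 2, 5 and 10 units.
target = (0.0, 0.0)
points = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (6.0, 8.0)]

print(within_distance(target, points, radius=5.0))   # the three points within 5 units
print(nearest_neighbors(target, points, k=2))        # the two closest points
```

The buffer returns a varying number of neighbors depending on local density, while k-NN always returns exactly k (if available) but may reach arbitrarily far, which is one reason to try a range of radii and k values as the slide suggests.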

  12. Some Results
     • Classification:
       • Accuracy increases from 62% to 89% with 7 nearest neighbors
     • Regression:
       • Root-mean-square error between predicted and actual values improves from ~25 to 8 (5-7 neighbors)
     • Detailed results in a white paper on http://technet.oracle.com/products/spatial
     • Visualize the results with Mapviewer

  13. Summary of the case study
     • Adding neighborhood influence to the data
       • Improves classification accuracy from 62% to 89%
       • Best neighborhood for this case study: 5-7 neighbors or 2-mile distance
     • Details, additions: white paper on OTN
       • http://technet.oracle.com/products/spatial
     • Recommendation for businesses: spatial-enable the data
       • Always geocode customer/business locations
       • Materialize demographic information from the spatial neighborhood
       • Test the data and perform mining tasks

  14. More research needed…
     • Current case study:
       • SVM without spatial estimates, although worse than with them, is still good: which attributes are helping?
     • Colocation mining
       • “Co-location” of items as opposed to “co-occurrence” in a transaction
       • E.g., which sets of items are colocated and what are the implications (interesting patterns)?
       • One approach: identify items that co-occur within “tiled” regions
       • Needs tighter integration with association rule mining
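
The "tiled regions" approach to colocation can be sketched as follows: assign each item to a square grid tile, treat each tile as a transaction, and count how often pairs of item types share a tile. This is only one simple reading of the idea on the slide; the function name, tile size, and item data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def colocation_counts(items, tile_size):
    """Count, for every pair of item types, the number of tiles containing both.

    items     -- list of (item_type, x, y) tuples
    tile_size -- side length of the square grid tiles
    """
    # Step 1: group item types by the tile their coordinates fall into.
    tiles = defaultdict(set)
    for item_type, x, y in items:
        tile = (int(x // tile_size), int(y // tile_size))
        tiles[tile].add(item_type)

    # Step 2: each tile acts like a transaction; count co-occurring type pairs.
    counts = defaultdict(int)
    for types in tiles.values():
        for pair in combinations(sorted(types), 2):
            counts[pair] += 1
    return dict(counts)


# Illustrative items (types and coordinates are made up).
items = [
    ("bank", 0.5, 0.5), ("cafe", 0.7, 0.2),                      # tile (0, 0)
    ("bank", 3.1, 3.2), ("cafe", 3.9, 3.5), ("gym", 3.3, 3.8),   # tile (3, 3)
    ("gym", 8.0, 8.0),                                           # tile (8, 8), alone
]
print(colocation_counts(items, tile_size=1.0))
```

The per-tile sets are in the same shape as market-basket transactions, so they could be passed to an association rule miner such as Apriori. A known weakness of fixed tiles is that two items sitting just across a tile boundary are never counted together.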
