Database Implementation of a Model-Free Classifier

University of Athens. ADBIS 2007. Database Implementation of a Model-Free Classifier. Konstantinos Morfonios. Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work.

Presentation Transcript


  1. University of Athens ADBIS 2007 Database Implementation of a Model-Free Classifier Konstantinos Morfonios

  2. Introduction Motivation LOCUS Parallel Execution Experimental Evaluation Conclusions & Future Work

  4. Introduction: Classification. x = <x1, x2, …, xD>, ω = f(x), with two classes ω1 and ω2 (each shown as a plot symbol on the slide)

  5. Introduction. Training tuples: <x1,1, x1,2, …, x1,D, ω1>, <x2,1, x2,2, …, x2,D, ω2>, <x3,1, x3,2, …, x3,D, ω1>, <x4,1, x4,2, …, x4,D, ω1>, … Vectors to classify: x1 = <x1, x2, …, xD>, x2 = <x1, x2, …, xD>. “Lazy” (Nearest Neighbors) vs. “Eager” (Decision Trees): (+) Faster decisions (-) Large/complex datasets (-) Dynamic datasets (-) Dynamic models

  6. Introduction Motivation LOCUS Parallel Execution Experimental Evaluation Conclusions & Future Work

  8. Motivation • Large/complex datasets

  9. Motivation

  10. Motivation • Large/complex datasets • Dynamic datasets

  11. Motivation

  12. Motivation • Large/complex datasets • Dynamic datasets • Dynamic models

  13. Motivation

  14. Motivation • Large/complex datasets • Dynamic datasets • Dynamic models Lazy (model-free)

  15. Motivation • Large/complex datasets • Dynamic datasets • Dynamic models Lazy (model-free), disk-based Nearest Neighbors

  16. Motivation. Nearest Neighbors suffers from the “curse of dimensionality” • Not reliable [Beyer et al., ICDT 1999] • Not indexable [Shaft et al., ICDT 2005] ⇒ LOCUS (Lazy Optimal Classifier of Unlimited Scalability)

  17. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Category?

  18. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy

  19. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy • Scaling?

  20. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy • Based on simple SQL queries

  21. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy • Based on simple SQL queries • Accuracy?

  22. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy • Based on simple SQL queries • Converges to optimal Bayes Classifier

  23. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy • Based on simple SQL queries • Converges to optimal Bayes Classifier • Other features?

  24. Motivation LOCUS (Lazy Optimal Classifier of Unlimited Scalability) • Lazy • Based on simple SQL queries • Converges to optimal Bayes Classifier • Parallelizable

  25. Introduction Motivation LOCUS Parallel Execution Experimental Evaluation Conclusions & Future Work

  27. LOCUS: Example. x = <f1, f2> with f1 ∈ [0, 20], f2 ∈ [0, 10]; two classes, ω1 and ω2 (scatter plot, axes f1 and f2)

  28. LOCUS. Ideally: dense space (scatter plot, axes f1 and f2)

  29. LOCUS. Ideally: dense space. ω(<7, 4>) = ?

  30. LOCUS. ω(<7, 4>) = (answer shown as a plot symbol)

  31. LOCUS. Reality: • Many features • Large domains ⇒ Sparse space

  32. LOCUS. Reality: • Many features • Large domains ⇒ Sparse space. ω(<7, 4>) = ?

  33. LOCUS. With 3-NN: ω1: 2, ω2: 1 ⇒ ω(<7, 4>) = ?

  34. LOCUS. With 3-NN: ω1: 2, ω2: 1 ⇒ ω(<7, 4>) = ω1 (majority class)
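
The 3-NN vote on slides 33 and 34 can be sketched as follows. This is only an illustration: the training points and the labels "w1"/"w2" (standing in for ω1/ω2) are made up so that the three nearest neighbors of <7, 4> split 2 to 1, as on the slide.

```python
from collections import Counter
import math

def knn_classify(points, query, k=3):
    """Majority vote among the k training points closest to `query`.

    `points` is a list of ((f1, f2), label) pairs.
    """
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], dict(votes)

# Hypothetical training points (not taken from the slides).
train = [
    ((6, 5), "w1"), ((8, 3), "w1"), ((9, 6), "w2"),
    ((15, 9), "w2"), ((1, 1), "w1"),
]
label, votes = knn_classify(train, (7, 4))
print(label, votes)
```

Every query has to rank distances to the training points, which is the step the talk argues becomes unreliable and hard to index in high dimensions.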

  35. LOCUS. With LOCUS: ω(<7, 4>) = ?

  36. LOCUS. With LOCUS: ω1: 7, ω2: 3 ⇒ ω(<7, 4>) = ?

  37. LOCUS. With LOCUS: ω1: 7, ω2: 3 ⇒ ω(<7, 4>) = ω1 (majority class)

  38. LOCUS. With LOCUS: disk-based implementation

  39. 2δ2 2δ1 LOCUS SELECT ω, count(*) FROM R WHERE f1≥x1-δ1 AND f1≤x1+δ1 AND f2≥x2-δ2 AND f2≤x2+δ2 GROUP BY ω R(f1, f2, ω) ω1: 7  ω(<7, 4>) = ω2: 3 <x1, x2>

  40. LOCUS. R(f1, f2, ω). SELECT ω, count(*) FROM R WHERE f1 ≥ x1-δ1 AND f1 ≤ x1+δ1 AND f2 ≥ x2-δ2 AND f2 ≤ x2+δ2 GROUP BY ω. What if R is large? Classical optimization techniques for a well-known type of aggregate queries: • Indexing • Materialized views • Presorting
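
Of the three techniques listed on this slide, indexing is the easiest to try out. The sketch below again assumes SQLite via `sqlite3`; the index name `idx_r_box` and the choice of a composite index on (f1, f2, w) are illustrative assumptions, not the configuration used in the talk. It asks the planner how it would execute the box query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")
# Hypothetical composite index; including w lets the query be answered
# from the index alone.
conn.execute("CREATE INDEX idx_r_box ON R (f1, f2, w)")

query = (
    "SELECT w, COUNT(*) FROM R "
    "WHERE f1 >= ? AND f1 <= ? AND f2 >= ? AND f2 <= ? "
    "GROUP BY w"
)
# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (5, 9, 2, 6)).fetchall()
plan_text = " | ".join(str(row[-1]) for row in plan)
print(plan_text)
```

The exact plan wording differs between SQLite versions. SQLite has no materialized views, so that option would need a different engine.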

  41. LOCUS. R(f1, f2, ω). SELECT ω, count(*) FROM R WHERE f1 ≥ x1-δ1 AND f1 ≤ x1+δ1 AND f2 ≥ x2-δ2 AND f2 ≤ x2+δ2 GROUP BY ω. Method reliability? LOCUS converges to the optimal Bayes classifier as the size of the dataset increases (proof in the paper)
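
An informal way to see why such a guarantee is plausible is the standard argument for window-based estimators. This is not the proof from the paper, and the symbols n_i (count of class ω_i in the box), B_δ(x) (the box around x), and N (dataset size) are introduced here only for illustration.

```latex
\begin{align*}
n_i &= \bigl|\{\, t \in R : t \in B_\delta(x),\ \omega(t) = \omega_i \,\}\bigr|
  && \text{(per-class counts returned by the query)} \\
\hat{P}(\omega_i \mid x) &= \frac{n_i}{\sum_j n_j}
  && \text{(estimated posterior)} \\
\hat{\omega}(x) &= \arg\max_i n_i = \arg\max_i \hat{P}(\omega_i \mid x)
  && \text{(box-count decision)} \\
\omega^{*}(x) &= \arg\max_i P(\omega_i \mid x)
  && \text{(Bayes decision)}
\end{align*}
```

If N grows while the box shrinks slowly enough that the number of tuples inside it keeps growing, the estimated posteriors approach the true ones and the two decisions tend to agree. The precise conditions are those stated in the paper.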

  42. LOCUS. R(f1, f2, ω). SELECT ω, count(*) FROM R WHERE f1 ≥ x1-δ1 AND f1 ≤ x1+δ1 AND f2 ≥ x2-δ2 AND f2 ≤ x2+δ2 GROUP BY ω. What if a feature, say f2, is categorical? (e.g. sex)

  43. LOCUS. R(f1, f2, ω). SELECT ω, count(*) FROM R WHERE f1 ≥ x1-δ1 AND f1 ≤ x1+δ1 AND f2 = x2 GROUP BY ω. What if a feature, say f2, is categorical? (e.g. sex) Not a problem, since generally in practice: • Combinations of categorical and numeric features • Categorical features have small domains. Hence, they do not contribute to sparsity
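
The switch from a range predicate to an equality predicate can be sketched with a small query builder. The helper `build_locus_query`, the column name `w` (for ω), and the sample rows are hypothetical. Column and table names are interpolated into the SQL text, so they must come from trusted code, and at least one feature must be supplied.

```python
import sqlite3

def build_locus_query(table, numeric, categorical, class_col="w"):
    """Build the class-count query: ranges for numeric features, equality for categorical.

    numeric:     {column: (value, delta)}
    categorical: {column: value}
    """
    conds, params = [], []
    for col, (x, delta) in numeric.items():
        conds.append(f"{col} >= ? AND {col} <= ?")
        params += [x - delta, x + delta]
    for col, value in categorical.items():
        conds.append(f"{col} = ?")
        params.append(value)
    sql = (f"SELECT {class_col}, COUNT(*) FROM {table} "
           f"WHERE {' AND '.join(conds)} GROUP BY {class_col}")
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 TEXT, w TEXT)")
# Hypothetical tuples: f1 numeric, f2 categorical.
conn.executemany("INSERT INTO R VALUES (?, ?, ?)", [
    (6, "F", "w1"), (8, "F", "w1"), (7, "M", "w2"),
    (9, "F", "w2"), (20, "F", "w2"),
])
sql, params = build_locus_query("R", {"f1": (7, 2)}, {"f2": "F"})
counts = dict(conn.execute(sql, params).fetchall())
print(sql)
print(counts)
```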

  44. Introduction Motivation LOCUS Parallel Execution Experimental Evaluation Conclusions & Future Work

  46. Parallel Execution. R = R1 ∪ R2 ∪ R3 ∪ R4; the same SELECT runs on each partition R1, R2, R3, R4

  47. Parallel Execution. Count: distributive function. Per-partition results over R1–R4: (ω1: 7, ω2: 1), (ω1: 5, ω2: 2), (ω1: 6, ω2: 0), (ω1: 5, ω2: 1). Running sums: (12, 3), (18, 3), (23, 4). Total: ω1: 23, ω2: 4

  48. Parallel Execution. Each of R1–R4 runs the SELECT locally and returns only its per-class counts: (ω1: 7, ω2: 1), (ω1: 5, ω2: 2), (ω1: 6, ω2: 0), (ω1: 5, ω2: 1), merged as (12, 3), (18, 3), (23, 4). • Small network traffic • Load balancing • Lightweight operations on the main server
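
Because count is distributive, the main server only needs to add up the small per-partition results. A minimal sketch, using the partial counts shown on slides 47 and 48 (the labels "w1"/"w2" stand in for ω1/ω2, and `merge_counts` is a hypothetical helper):

```python
from collections import Counter

def merge_counts(partials):
    """Sum the per-class counts returned by each partition."""
    total = Counter()
    for partial in partials:
        total.update(partial)   # adds counts key by key
    return total

# Per-partition results as listed on the slides.
partials = [
    {"w1": 7, "w2": 1},
    {"w1": 5, "w2": 2},
    {"w1": 6, "w2": 0},
    {"w1": 5, "w2": 1},
]
total = merge_counts(partials)
label = max(total, key=total.get)
print(dict(total), label)
```

Each partition ships one number per class no matter how many tuples it scanned, which is where the small network traffic and the light load on the main server come from.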

  49. Introduction Motivation LOCUS Parallel Execution Experimental Evaluation Conclusions & Future Work
