
Prediction Cubes


Presentation Transcript


  1. Prediction Cubes Bee-Chung Chen, Lei Chen, Yi Lin and Raghu Ramakrishnan University of Wisconsin - Madison

  2. Big Picture • Subset analysis: Use “models” to identify interesting subsets • Cube space: Dimension hierarchies • Combination of dimension-attribute values defines a candidate subset (like regular OLAP) • Measure of interest for a subset • Decision/prediction behavior within the subset • Big difference from regular OLAP!!

  3. Example (1/5): Traditional OLAP • Goal: Look for patterns of unusually high numbers of applications • Cell value: Number of loan applications • [Figure: roll-up/drill-down between a coarse [Country, Year] cube (e.g., CA: 100 in '04, 90 in '03; USA: 80 in '04, 90 in '03) and a fine-grained [State, Month] cube]

  4. Example (2/5): Decision Analysis • Goal: Analyze a bank's loan approval process w.r.t. two dimensions: Location and Time • Model: e.g., a decision tree • Data table D:

     Location | Time    | Race  | Sex | … | Approval
     AL, USA  | Dec, 04 | White | M   | … | Yes
     …        | …       | …     | …   | … | …
     WY, USA  | Dec, 04 | Black | F   | … | No

  5. Example (3/5): Questions of Interest • Goal: Analyze a bank's loan decision process with respect to two dimensions: Location and Time • Target: Find discriminatory loan decisions • Questions: • Are there locations and times when the decision making was similar to a set of discriminatory decision examples (or similar to a given discriminatory decision model)? • Are there locations and times during which approvals depended highly on Race or Sex?

  6. Example (4/5): Prediction Cube • Build a model using data from WI in Dec., 1985 • Evaluate that model • Measure in a cell: • accuracy of the model • predictiveness of Race, measured based on that model • similarity between that model and a given model • [Figure: a cell's data feeding into a model, e.g., a decision tree]

  7. Example (5/5): Prediction Cube • Cell value: Predictiveness of Race • [Figure: roll-up/drill-down between a fine-grained [State, Month] cube, where, e.g., WY shows unusually high values (0.9), and a coarse [Country, Year] cube (CA: 0.3 in '04, 0.2 in '03; USA: 0.2 in '04, 0.3 in '03)]

  8. Model-Based Subset Analysis • Given: A data table D with schema [Z, X, Y] • Z: Dimension attributes, e.g., {Location, Time} • X: Predictor attributes, e.g., {Race, Sex, …} • Y: Class-label attribute, e.g., Approval • Data table D:

     Location | Time    | Race  | Sex | … | Approval
     AL, USA  | Dec, 04 | White | M   | … | Yes
     …        | …       | …     | …   | … | …
     WY, USA  | Dec, 04 | Black | F   | … | No

  9. Model-Based Subset Analysis • Z: Dimension, X: Predictor, Y: Class • Goal: To understand the relationship between X and Y on different subsets σZ(D) of the data D • Relationship: p(Y | X, σZ(D)) • Approach: • build a model h(X; σZ(D)) ≈ p(Y | X, σZ(D)) • evaluate h(X; σZ(D)) by accuracy, model similarity, or predictiveness • [Figure: the subset σ[USA, Dec 04](D) highlighted within data table D]
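To make the setup concrete, here is a minimal Python sketch (not from the paper) of partitioning D by its dimension attributes Z and fitting one model per subset; fit is a hypothetical caller-supplied training routine.

    from collections import defaultdict

    def models_per_subset(D, Z, fit):
        # D: list of dict rows; Z: dimension attribute names;
        # fit(rows) -> classifier. Returns {z-tuple: h(X; sigma_z(D))}.
        subsets = defaultdict(list)
        for row in D:
            key = tuple(row[z] for z in Z)   # e.g., ("WI", "Dec, 04")
            subsets[key].append(row)
        # One model per subset, approximating p(Y | X, sigma_z(D))
        return {key: fit(rows) for key, rows in subsets.items()}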

  10. Outline • Motivating example • Definition of prediction cubes • Efficient prediction cube materialization • Experimental results • Conclusion

  11. Prediction Cubes • User interface: OLAP data cubes • Dimensions, hierarchies, roll up and drill down • Values in the cells: • Accuracy • Similarity • Predictiveness

  12. Prediction Cubes • Three types of prediction cubes: • Test-set accuracy cube • Model-similarity cube • Predictiveness cube

  13. Test-Set Accuracy Cube • Given: data table D and a labeled test set Δ (schema [Race, Sex, …, Approval]) • For each cell at the current level (e.g., [Country, Month]): build a model on that cell's subset of D, make predictions on Δ, and store the accuracy as the cell value • E.g., the decision model of USA during Dec 04 has high accuracy when applied to Δ • [Figure: data table D, the accuracy cube at level [Country, Month] (e.g., CA: 0.4, 0.2, 0.3, 0.6, 0.5 across months; USA: 0.2, 0.3, …, 0.9), and the test set Δ with the model's predictions]

  14. Model-Similarity Cube • Given: data table D, a model h0(X), and a test set Δ without labels • For each cell: build a model on that cell's data, compare its predictions on Δ with those of h0(X), and store the similarity as the cell value • E.g., the loan decision process in USA during Dec 04 is similar to a discriminatory decision model • [Figure: data table D, the similarity cube at level [Country, Month], the test set Δ, and the reference model h0(X)]

  15. Predictiveness Cube • Given: data table D, attributes V of interest, and a test set Δ without labels • For each cell: build models h(X) and h(X − V) on that cell's data and store the predictiveness of V on Δ as the cell value • E.g., Race is an important factor in loan approval decisions in USA during Dec 04 • [Figure: data table D, the two models' predictions on Δ, and the predictiveness cube at level [Country, Month]]
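A minimal sketch of filling the finest-level cells of a predictiveness cube, under assumed interfaces (none of this is the paper's code): cells maps (location, time) to that cell's training rows, fit(rows, attrs) returns a classifier callable, and attrs/V are sets of attribute names.

    def predictiveness_cube(cells, test_x, attrs, V, fit):
        cube = {}
        for cell, rows in cells.items():
            h_full = fit(rows, attrs)       # h(X)
            h_drop = fit(rows, attrs - V)   # h(X - V)
            # Cell value: fraction of test examples labeled differently
            cube[cell] = sum(h_full(x) != h_drop(x) for x in test_x) / len(test_x)
        return cube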

  16. Outline • Motivating example • Definition of prediction cubes • Efficient prediction cube materialization • Experimental results • Conclusion

  17. Roll Up and Drill Down • [Figure: rolling up from a fine-grained [State, Month] cube (e.g., WY: 0.9, 0.2, 0.1 across months) to a coarse [Country, Year] cube (CA: 0.3 in '04, 0.2 in '03; USA: 0.2 in '04, 0.3 in '03), and drilling back down]

  18. Full Materialization • [Figure: cube tables materialized at every level, e.g., [All, All], [All, Year], [Country, All] and [Country, Year]]

  19. Bottom-Up Data Cube Computation • Cell values: numbers of loan applications

  20. Functions on Sets • Bottom-up computable functions: functions that can be computed using only summary information • Distributive function: α(X) = F({α(X1), …, α(Xn)}) • X = X1 ∪ … ∪ Xn and Xi ∩ Xj = ∅ • E.g., Count(X) = Sum({Count(X1), …, Count(Xn)}) • Algebraic function: α(X) = F({G(X1), …, G(Xn)}) • G(Xi) returns a fixed-length vector of values • E.g., Avg(X) = F({G(X1), …, G(Xn)}) • G(Xi) = [Sum(Xi), Count(Xi)] • F({[s1, c1], …, [sn, cn]}) = Sum({si}) / Sum({ci})
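A minimal sketch of the two cases on toy data (not from the slides): Count combines final per-partition values directly, while Avg needs the fixed-length summary G(Xi) = [Sum(Xi), Count(Xi)].

    # Disjoint partitions X1, X2, X3 of a set X (toy data)
    X_parts = [[3, 5], [2], [7, 1, 4]]

    # Distributive: Count(X) = Sum({Count(Xi)})
    count = sum(len(p) for p in X_parts)

    # Algebraic: compute G(Xi) per partition, then F combines the vectors
    G = [(sum(p), len(p)) for p in X_parts]
    avg = sum(s for s, _ in G) / sum(c for _, c in G)

    print(count, avg)   # 6 3.666...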

  21. Scoring Function • Conceptually, a machine-learning model h(X; S) is a scoring function Score(y, x; S) that gives each class y a score on test example x • h(x; S) = argmax_y Score(y, x; S) • Score(y, x; S) ≈ p(y | x, S) • S: a set of training examples • [Figure: a model trained on S (rows like (AL, USA; Dec, 85; White; M; …; Yes) … (WY, USA; Dec, 85; Black; F; …; No)) scoring a test example, e.g., [Yes: 80%, No: 20%]]
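A tiny sketch of the argmax view, with a deliberately trivial stand-in for Score (the class frequency in S, ignoring x); everything here is illustrative.

    from collections import Counter

    def score(y, x, S):
        # Toy Score(y, x; S): the class frequency in S (ignores x)
        counts = Counter(label for _, label in S)   # S: (x, y) pairs
        return counts[y] / len(S)

    def h(x, S, classes=("Yes", "No")):
        # h(x; S) = argmax_y Score(y, x; S)
        return max(classes, key=lambda y: score(y, x, S))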

  22. Bottom-up Score Computation • Key observations: • Observation 1: Having the scores for each test example is sufficient to compute the value of a cell • The details depend on what each cell means (i.e., the type of prediction cube), but they are straightforward • Observation 2: Fixing the class label y and the test example x, Score(y, x; S) is a function of the set S of training examples; if it is distributive or algebraic, the bottom-up data cube technique can be applied directly

  23. Algorithm • Input: the dataset D and the test set Δ • For each finest-grained cell, which contains data bi(D): • build a model on bi(D) • for each x ∈ Δ and each class y, compute: • Score(y, x; bi(D)), if the scoring function is distributive • G(y, x; bi(D)), if it is algebraic • Use the standard data cube computation technique to compute the scores in a bottom-up manner (by Observation 2) • Compute the cell values from the scores (by Observation 1)
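A sketch of this flow, assuming a distributively decomposable scoring function whose combining function F is a plain sum (illustrative; the real F depends on the model).

    def materialize_scores(base_subsets, test_set, classes, score_fn):
        # Step 1: score every (y, x) pair on each finest-grained cell b_i(D).
        # base_subsets: {i: b_i(D)}; score_fn(y, x, b) -> Score(y, x; b)
        base = {i: {(y, j): score_fn(y, x, b)
                    for j, x in enumerate(test_set) for y in classes}
                for i, b in base_subsets.items()}

        # Step 2 (Observation 2): a coarser cell S combines stored base-cell
        # scores with F (here: sum) instead of rebuilding any model.
        def scores_for(S):
            S = list(S)
            return {key: sum(base[i][key] for i in S) for key in base[S[0]]}

        return scores_for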

  24. Machine-Learning Models • Naïve Bayes: scoring function is algebraic • Kernel-density-based classifier: scoring function is distributive • Decision tree, random forest: neither distributive nor algebraic • PBE: probability-based ensemble (new) • Makes any machine-learning model distributive, at the cost of an approximation
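To see why Naïve Bayes scoring is algebraic, here is a sketch in which the per-cell summary G is a vector of counts over a fixed attribute/class domain (hence fixed-length) and F merges summaries by summation; the row layout and the add-one smoothing are assumptions, not the paper's formulation.

    from collections import Counter

    def nb_summary(subset, attrs):
        # G(b_i): class counts and per-(attribute, value, class) counts
        g = Counter()
        for row in subset:
            y = row["y"]
            g[("class", y)] += 1
            for a in attrs:
                g[("attr", a, row[a], y)] += 1
        return g

    def nb_score(y, x, summaries, attrs):
        # F: merge the per-cell summaries by summation, then score as usual
        g = sum(summaries, Counter())
        n_y = g[("class", y)]
        n = sum(v for k, v in g.items() if k[0] == "class")
        score = n_y / n                                         # class prior
        for a in attrs:
            score *= (g[("attr", a, x[a], y)] + 1) / (n_y + 1)  # add-one smoothing
        return score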

  25. Probability-Based Ensemble • [Figure: a decision tree trained directly on [WA, 85] vs. the PBE version of a decision tree on [WA, 85], an ensemble of decision trees each trained on a finest-grained cell]

  26. Outline • Motivating example • Definition of prediction cubes • Efficient prediction cube materialization • Experimental results • Conclusion

  27. Experiments • Quality of PBE on 8 UCI datasets: the quality of the PBE version of a model is slightly worse (0 ~ 6%) than the quality of the same model trained directly on the whole training data • Efficiency of the bottom-up score computation technique • Case study on demographic data • [Figure: PBE built from finest-grained cells vs. a single model trained directly on the [WA, 1985] subset]

  28. Efficiency of the Bottom-up Score Computation • Machine-learning models: • J48: J48 decision tree • RF: random forest • NB: Naïve Bayes • KDC: kernel-density-based classifier • Bottom-up method (PBE-J48, PBE-RF, NB, KDC) vs. exhaustive method (J48ex, RFex, NBex, KDCex)

  29. Synthetic Dataset • Dimensions: Z1, Z2 and Z3 • Decision rule: a Boolean function of Z1, Z2 and Z3

  30. Efficiency Comparison • [Chart: execution time (sec) vs. number of records, comparing the exhaustive method with bottom-up score computation]

  31. Conclusion • Exploratory data analysis paradigm: • Models built on subsets • Subsets defined by dimension hierarchies • Meaningful subsets • Precomputation • Interactive analysis

  32. Questions

  33. Test-Set-Based Model Evaluation • Given a set-aside test set Δ of schema [X, Y]: • Accuracy of h(X): the percentage of Δ that is correctly classified • Similarity between h1(X) and h2(X): the percentage of Δ given the same class labels by h1(X) and h2(X) • Predictiveness of V ⊆ X (based on h(X)): the difference between h(X) and h(X − V) measured on Δ, i.e., the percentage of Δ predicted differently by h(X) and h(X − V)
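A compact sketch of the three measures, assuming models are callables that return a class label for each test example (a hypothetical interface).

    def accuracy(h, test):
        # Percentage of the labeled test set classified correctly
        return sum(h(x) == y for x, y in test) / len(test)

    def similarity(h1, h2, test_x):
        # Percentage of (unlabeled) test examples given the same label
        return sum(h1(x) == h2(x) for x in test_x) / len(test_x)

    def predictiveness(h_full, h_minus_v, test_x):
        # Fraction of test examples predicted differently when V is removed
        return 1 - similarity(h_full, h_minus_v, test_x)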

  34. Model Accuracy • Test-set accuracy (TS-accuracy): given a set-aside test set Δ with schema [X, Y], accuracy(h(X; D) | Δ) = (1/|Δ|) Σ_{(x, y) ∈ Δ} I(h(x; D) = y) • |Δ|: the number of examples in Δ • I(γ) = 1 if γ is true; otherwise, I(γ) = 0 • Alternative: cross-validation accuracy • This will not be discussed further!!

  35. Model Similarity • Prediction similarity (or distance): given a set-aside test set Δ with schema X, similarity(h1(X), h2(X)) = (1/|Δ|) Σ_{x ∈ Δ} I(h1(x) = h2(x)) • distance(h1(X), h2(X)) = 1 – similarity(h1(X), h2(X)) • Similarity between ph1(Y | X) and ph2(Y | X): KL-distance = (1/|Δ|) Σ_{x ∈ Δ} KL(ph1(Y | x) ‖ ph2(Y | x)) • phi(Y | X): class probability estimated by hi(X)
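A sketch of the KL-based distance, assuming each model exposes its class-probability estimates as a dict keyed by class (a hypothetical interface).

    import math

    def kl_distance(p_h1, p_h2, test_x, classes):
        # Average KL(p_h1(Y | x) || p_h2(Y | x)) over the unlabeled test set
        total = 0.0
        for x in test_x:
            p1, p2 = p_h1(x), p_h2(x)
            total += sum(p1[y] * math.log(p1[y] / p2[y])
                         for y in classes if p1[y] > 0)
        return total / len(test_x)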

  36. Attribute Predictiveness • Predictiveness of V ⊆ X (based on h(X)): • PD-predictiveness: distance(h(X), h(X – V)) • KL-predictiveness: KL-distance(h(X), h(X – V)) • Alternative: accuracy(h(X)) – accuracy(h(X – V)) • This will not be discussed further!!

  37. Target Patterns • Find a subset σ(D) such that h(X; σ(D)) has high prediction accuracy on a test set Δ • E.g., the loan decision process in 2003's WI is similar to a set Δ of discriminatory decision examples • Find a subset σ(D) such that h(X; σ(D)) is similar to a given model h0(X) • E.g., the loan decision process in 2003's WI is similar to a discriminatory decision model h0(X) • Find a subset σ(D) on which V is predictive • E.g., Race is an important factor in loan approval decisions in 2003's WI

  38. Test-Set Accuracy • We would like to discover: the loan decision process in 2003's WI is similar to a set of problematic decision examples • Given: • data table D: the loan decision dataset • test set Δ: the set of problematic decision examples • Goal: find a subset σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) has high prediction accuracy on Δ

  39. Model Similarity • We would like to discover: the loan decision process in 2003's WI is similar to a problematic decision model • Given: • data table D: the loan decision dataset • model h0(X): the problematic decision model • Goal: find a subset σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) is similar to h0(X)

  40. Attribute Predictiveness • We would like to discover: Race is an important factor in loan approval decisions in 2003's WI • Given: • data table D: the loan decision dataset • attribute V of interest: Race • Goal: find a subset σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) is very different from h(X – V; σ_{Loc,Time}(D))

  41. Dimension and Level

  42. Example: Full Materialization • [Figure: the lattice of levels from [All, All] down to [City, Month]]

  43. Bottom-Up Score Computation • Base cells: the finest-grained cells in a cube • Base subsets bi(D): the finest-grained data subsets • The subset of data records in a base cell is a base subset • Properties: • D = ∪i bi(D) and bi(D) ∩ bj(D) = ∅ • any subset σ_S(D) of D that corresponds to a cube cell is the union of some base subsets • Notation: σ_S(D) = bi(D) ∪ bj(D) ∪ bk(D), where S = {i, j, k}

  44. Bottom-Up Score Computation • [Figure: the domain lattice] • Scores: Score(y, x; σ_S(D)) = F({Score(y, x; bi(D)) : i ∈ S}) • Data subset: σ_S(D) = ∪_{i∈S} bi(D)

  45. Decomposable Scoring Function • Let σ_S(D) = ∪_{i∈S} bi(D), where bi(D) is a base (finest-grained) subset • Distributively decomposable scoring function: • Score(y, x; σ_S(D)) = F({Score(y, x; bi(D)) : i ∈ S}) • F is a distributive aggregate function • Algebraically decomposable scoring function: • Score(y, x; σ_S(D)) = F({G(y, x; bi(D)) : i ∈ S}) • F is an algebraic aggregate function • G(y, x; bi(D)) returns a fixed-length vector of values
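As an example of a distributively decomposable score, a sketch of a kernel-density-based classifier score where F is plain summation (toy 1-D Gaussian kernel; the data layout is an assumption).

    import math

    def kdc_partial(y, x, base_subset, bandwidth=1.0):
        # Score contribution of one base subset b_i(D): the kernel sum at x
        # over its training examples (xi, yi) of class y
        return sum(math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
                   for xi, yi in base_subset if yi == y)

    def kdc_score(y, x, base_subsets, S):
        # F is summation, so the score over sigma_S(D) is distributive
        return sum(kdc_partial(y, x, base_subsets[i]) for i in S)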

  46. Probability-Based Ensemble • Scoring function: Score_PBE(y, x; σ_S(D)) = Σ_{i∈S} h(y | x; bi(D)) · g(bi | x) • h(y | x; bi(D)): model h's estimate of p(y | x, bi(D)) • g(bi | x): a model that predicts the probability that x belongs to base subset bi(D)
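A one-function sketch of this combination; h and g are hypothetical callables standing in for the per-cell class-probability model and the cell-membership model.

    def pbe_score(y, x, S, h, g):
        # Score_PBE(y, x; sigma_S(D)) = sum over i in S of
        #   h(y | x; b_i(D)) * g(b_i | x)
        # h(i, y, x): class-probability estimate of the model on b_i(D)
        # g(i, x): estimated probability that x falls in base subset b_i(D)
        return sum(h(i, y, x) * g(i, x) for i in S)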

  47. Optimality of PBE • Score_PBE(y, x; σ_S(D)) = c · p(y | x, x ∈ σ_S(D)) when the bi(D)'s partition σ_S(D)

  48. Efficiency Comparison

  49. Where the Time Is Spent

  50. Accuracy of PBE • Goal: • To compare PBE with the gold standard • PBE: A set of J48s/RFs each of which is trained on a small partition of the whole dataset • Gold standard: A J48/RF trained on the whole data • To understand how the number of base classifiers in a PBE affects the accuracy of the PBE • Datasets: • Eight UCI datasets
