
Scorpion: Explaining Away Outliers in Aggregate Queries



Presentation Transcript


  1. Scorpion: Explaining Away Outliers in Aggregate Queries. Eugene Wu and Sam Madden, MIT. (Image source: http://springfieldpunx.blogspot.com/2010/11/mortal-kombat-ninjas-scorpion.html)

  2. Diagram: Table, Split, Visualize, Aggregate

  3–6. SELECT sum(cost) FROM expenses GROUP BY country [Chart: Expenses by country (USA, China, Italy)]

  7. Given: outlier and normal results. Goal: understand why. SELECT sum(cost) FROM expenses GROUP BY country [Chart: Expenses by country (USA, China, Italy)]

  8. Given: outlier and normal results. What input properties caused the outliers? Which properties most caused the outliers? Which caused the outliers but didn't affect the normal outputs? SELECT sum(cost) FROM expenses GROUP BY country [Chart: Expenses by country (USA, China, Italy)]
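To make the running example concrete, here is a minimal sketch that runs the slides' query against an in-memory SQLite table. The rows in `ROWS`, the `description` column (standing in for the slides' Desc attribute) and the helper `group_totals` are hypothetical; the query also selects `country` so that each total can be labeled. With this toy data, Italy's total stands out as the outlier.

```python
import sqlite3

# Hypothetical toy rows: (country, cost, description). The slides do not show
# the real expenses table; `description` stands in for their Desc attribute.
ROWS = [
    ("USA", 100.0, "travel"),
    ("USA", 120.0, "meals"),
    ("China", 90.0, "travel"),
    ("China", 110.0, "meals"),
    ("Italy", 95.0, "travel"),
    ("Italy", 400.0, "toilets"),  # one large row that inflates Italy's total
    ("Italy", 105.0, "meals"),
]


def group_totals(rows):
    """Load rows into an in-memory SQLite table, run
    SELECT country, sum(cost) FROM expenses GROUP BY country,
    and return the result as {country: total}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE expenses (country TEXT, cost REAL, description TEXT)"
        )
        conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT country, sum(cost) FROM expenses GROUP BY country"
        )
        return dict(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(group_totals(ROWS))
```

On this data `group_totals(ROWS)` yields 220.0 for USA, 200.0 for China and 600.0 for Italy.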

  9. Can’t Touch This

  10. Provenance Data!

  11–13. Provenance: SELECT SUM(cost) FROM sam’s bank account ($$$)

  14. Provenance: SELECT SUM(cost) FROM sam’s bank account ($$$). “Darn! Ya caught me.”

  15. Provenance (image source: http://weknowmemes.com/2012/04/whats-the-point/)

  16. Provenance: SELECT SUM(cost) FROM sam’s bank account. Filter the inputs for the “most influential” ones.
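One way to read “filter for most influential” is leave-one-out ranking: take the provenance of an output (the input rows that fed it) and score each row by how much the output changes when that row alone is removed. The sketch below assumes a hypothetical `(country, cost, description)` row schema; `provenance` and `most_influential` are illustrative names, and leave-one-out scoring is a stand-in chosen here, not necessarily what the talk uses.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical row schema: (country, cost, description).
Row = Tuple[str, float, str]


def provenance(rows: Sequence[Row], country: str) -> List[Row]:
    """Input rows that contributed to the SUM(cost) output for `country`."""
    return [r for r in rows if r[0] == country]


def most_influential(
    rows: Sequence[Row],
    country: str,
    agg: Callable[[List[float]], float] = sum,
    k: int = 1,
) -> List[Tuple[float, Row]]:
    """Score each provenance row by leave-one-out: the change in the
    aggregate when that single row is removed. Return the k rows with
    the largest absolute change, as (change, row) pairs."""
    prov = provenance(rows, country)
    full = agg([r[1] for r in prov])
    scored = []
    for i, row in enumerate(prov):
        rest = [r[1] for j, r in enumerate(prov) if j != i]
        scored.append((full - agg(rest), row))
    scored.sort(key=lambda pair: abs(pair[0]), reverse=True)
    return scored[:k]
```

For SUM, a row's leave-one-out score is just its own cost, so this ranking is trivial; the `agg` parameter is there to show that the same loop works for other aggregates, where single-row scores are less obvious. Scoring rows one at a time also cannot describe a whole set of rows that only matters jointly, which is the gap the later slides address with predicates.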

  17. Provenance

  18–19. Provenance, faceting (figure source: http://www.perceptualedge.com/articles/Whitepapers/Three_Blind_Men.pdf)

  20. Provenance, faceting. Dimensionality :( Dealing with multiple outliers? (Figure source: http://www.perceptualedge.com/articles/Whitepapers/Three_Blind_Men.pdf)

  21. Provenance, faceting

  22. Provenance, faceting, Scorpion!

  23. Given: outlier and normal results. Goal: understand why. [Chart: Expenses by country (USA, China, Italy)]

  24. Given: outlier and normal results. Find predicates correlated with the outliers, e.g. Desc = “toilets”. [Chart: Expenses by country (USA, China, Italy)]

  25–27. Given: outlier and normal results. Find predicates correlated with the outliers (e.g. Desc = “toilets”) such that removing the inputs that match the predicate “fixes” the outliers and maintains the normal results. [Chart: Expenses by country (USA, China, Italy)]
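The goal on these slides can be checked mechanically for one candidate predicate: compute T – p(T) by dropping the rows the predicate selects, rerun the per-country SUM, and compare each group before and after. A minimal sketch, assuming hypothetical `(country, cost, description)` rows; `totals`, `remove_predicate` and `effect` are illustrative helpers, not names from the talk.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row schema: (country, cost, description).
Row = Tuple[str, float, str]
Predicate = Callable[[Row], bool]


def totals(rows: Iterable[Row]) -> Dict[str, float]:
    """SUM(cost) GROUP BY country, computed in plain Python."""
    out: Dict[str, float] = defaultdict(float)
    for country, cost, _desc in rows:
        out[country] += cost
    return dict(out)


def remove_predicate(rows: Iterable[Row], pred: Predicate) -> List[Row]:
    """T - p(T): keep only the rows the predicate does not select."""
    return [r for r in rows if not pred(r)]


def effect(rows: List[Row], pred: Predicate) -> Dict[str, float]:
    """Per-country change in the output (before - after) when p(T) is removed.
    A country whose rows are all removed is treated as having total 0."""
    before = totals(rows)
    after = totals(remove_predicate(rows, pred))
    return {g: before[g] - after.get(g, 0.0) for g in before}
```

With hypothetical Italy rows that include one 400.0 `toilets` expense, `effect(rows, lambda r: r[2] == "toilets")` reports a 400.0 drop for Italy and 0.0 for the other countries: the outlier is “fixed” while the normal results are untouched. A predicate such as `country = 'Italy'` would also remove the outlier, but only by deleting the whole group, which is why the next slides weigh the change against how many tuples a predicate removes.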

  28. Outline: formalize “influence” as a metric; predicate search heuristics; some results

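  29. T (the input table)

  30. T and the predicate Desc = “toilet”; p(T) is the subset of T that satisfies it

  31. T, p(T)

  32. T, p(T), and T – p(T), the inputs that remain after removing p(T)

  33–34. p(T)

  35. Δoutput: the change in the output when p(T) is removed

  36. Δoutput, p(T), and |p(T)|, the number of tuples in p(T)

  37. Influence metric: $\frac{\Delta\text{output}}{|p(T)|}$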
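The influence metric divides the change in the output by the number of removed tuples. A minimal sketch, assuming T holds the input rows of a single output, the aggregate is a SUM over the cost column, and Δoutput = agg(T) − agg(T – p(T)); `influence` is a hypothetical helper, and returning 0 for an empty p(T) is an assumption made here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical row schema: (country, cost, description).
Row = Tuple[str, float, str]


def influence(
    T: Sequence[Row],
    pred: Callable[[Row], bool],
    agg: Callable[[List[float]], float] = sum,
) -> float:
    """Influence of predicate p on one aggregate output over inputs T:
    (agg(T) - agg(T - p(T))) / |p(T)|."""
    p_T = [r for r in T if pred(r)]
    if not p_T:
        return 0.0  # assumption: a predicate that selects nothing has no influence
    rest = [r for r in T if not pred(r)]
    delta_output = agg([r[1] for r in T]) - agg([r[1] for r in rest])
    return delta_output / len(p_T)
```

On hypothetical Italy rows costing 95, 400 (`toilets`) and 105, the predicate description = 'toilets' scores 400 / 1 = 400, while cost > 90 removes every row for a larger Δoutput of 600 but scores only 600 / 3 = 200. Dividing by |p(T)| is what favors the more specific explanation.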

  38. Sensitivity analysis: $\frac{\Delta f(x)}{\Delta x}$, compared with the influence metric $\frac{\Delta\text{output}}{|p(T)|}$
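Written out, the analogy treats removing p(T) as a perturbation of the input: the change in the aggregate plays the role of Δf(x) and the number of removed tuples plays the role of Δx. Expanding Δoutput as agg(T) − agg(T – p(T)) and the example numbers below (a hypothetical group with costs 95, 400 and 105, where p selects only the 400 row) are illustrative assumptions, not taken from the slides.

```latex
\frac{\Delta f(x)}{\Delta x}
\;\longleftrightarrow\;
\frac{\Delta\text{output}}{|p(T)|}
= \frac{\operatorname{agg}(T) - \operatorname{agg}\bigl(T - p(T)\bigr)}{|p(T)|}

\text{e.g. } \frac{(95 + 400 + 105) - (95 + 105)}{1} = 400
```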

  39. Multiple outputs: an outlier can be too high or too low (“high vs low”), and removing p(T) can also change normal outputs. Terms: $\frac{\Delta\text{output}}{|p(T)|}$ and $\frac{\Delta\text{normal}}{|p(T)|}$

  40. Adding the direction $v$ (whether the outlier is too high or too low): $\frac{\Delta\text{output}\cdot v}{|p(T)|}$

  41. Adding an exponent $c$ on the predicate size: $\frac{\Delta\text{output}\cdot v}{|p(T)|^{c}}$

  42. Rewarding change to outliers and penalizing change to normal outputs: $\frac{\Delta\text{outlier}\cdot v}{|p(T)|^{c}} - \frac{\Delta\text{normal}}{|p(T)|^{c}}$

  43. Over all outputs: $\operatorname{mean}_{\text{outlier}} \frac{\Delta\text{outlier}\cdot v}{|p(T)|^{c}} - \max_{\text{normal}} \frac{\Delta\text{normal}}{|p(T)|^{c}}$
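The combined metric on the last slide can be sketched as follows, under several assumptions the slides do not spell out: each output has its own input group, |p(T)| is counted per group, v is +1 for an outlier that is too high and −1 for one that is too low, a group in which p selects nothing contributes 0, and the normal term uses the absolute change so that pushing a normal result in either direction is penalized. The value of c is not given on the slides; 0.5 in the usage note is arbitrary. All names (`group_influence`, `combined_influence`, the row schema) are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable, List, Mapping, Sequence, Tuple

# Hypothetical row schema: (country, cost, description).
Row = Tuple[str, float, str]
Predicate = Callable[[Row], bool]


def group_influence(
    group: Sequence[Row],
    pred: Predicate,
    c: float,
    agg: Callable[[List[float]], float] = sum,
) -> float:
    """(agg(g) - agg(g - p(g))) / |p(g)|**c for one output's input group g.
    Returns 0.0 when p selects no rows in this group (an assumption)."""
    selected = [r for r in group if pred(r)]
    if not selected:
        return 0.0
    remaining = [r[1] for r in group if not pred(r)]
    delta = agg([r[1] for r in group]) - agg(remaining)
    return delta / len(selected) ** c


def combined_influence(
    groups: Mapping[str, Sequence[Row]],
    outliers: Mapping[str, int],
    normals: Iterable[str],
    pred: Predicate,
    c: float,
    agg: Callable[[List[float]], float] = sum,
) -> float:
    """Mean over outliers of v * influence, minus max over normal outputs of
    |influence|. `outliers` maps each outlier group to v: +1 if it is too
    high, -1 if it is too low (assumed encoding). Needs at least one outlier."""
    outlier_term = mean(
        v * group_influence(groups[g], pred, c, agg) for g, v in outliers.items()
    )
    normal_term = max(
        (abs(group_influence(groups[g], pred, c, agg)) for g in normals),
        default=0.0,
    )
    return outlier_term - normal_term
```

With hypothetical groups for USA (100 travel, 120 meals), China (90 travel, 110 meals) and Italy (95 travel, 400 toilets, 105 meals), Italy marked too high and c = 0.5, the predicate description = 'toilets' scores 400 − 0 = 400, while description = 'travel' scores 95 − 100 = −5 because it shifts USA more than it fixes Italy. Raising c penalizes predicates that select many tuples more heavily; with c = 0 the size of p(T) is ignored.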
