1 / 39

High-Performance Calculations

High-Performance Calculations. Simple tricks to make some Tableau calculations execute hundreds of times faster. PRESENTED BY. Overview. Why do we need fast calculations? Real-life examples: Computing dates from Unix time stamps Computing dates from “yyyymmdd” columns

stacie
Télécharger la présentation

High-Performance Calculations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. High-Performance Calculations Simple tricks to make some Tableau calculations execute hundreds of times faster PRESENTED BY

  2. Overview • Why do we need fast calculations? • Real-life examples: • Computing dates from Unix time stamps • Computing dates from “yyyymmdd” columns • Displaying numbers as text in a viz • Computing a nested set (combined field) • Coming attractions • How version 8 speeds up some common calculations

  3. Your organisation can respond faster You can stay in a “flow” state longer Overview: The Need for Speed

  4. All calculations run against 1 billion rows Amplifies differences to human scale Range is from 6 seconds to 5 hours Overview: One…Billion…Rows!

  5. Computers Compute! A case study in calculation performance

  6. Customer Task: Convert a column of Unix timestamps to dates Timestamps are 64-bit integers Contain the number of milliseconds since 1970-01-01 Need to convert timestamps to dates for analysis Human-style units like years, months, days (“binning”) Unix Times: The Problem

  7. Meaning: Convert the number to a string Take the left 10 characters Change that to an integer (10s of seconds) Divide by 8640 to get days Add to the “zero date” Computing one billion values takes 3 hours and 45 minutes! Version 8 still takes about 30 minutes. DATE("1/1/1970") + INT( INT( LEFT( STR( [unix]),10 ) ) / 8640 ) Unix Times: Original Version

  8. Meaning: Convert the number to seconds by dividing Add those seconds to the zero date Remove the time part Computing one billion values takes 45 seconds That is 13,000x faster than version 7! Still 40x faster than version 8. DATE( DATEADD( 'seconds', INT([unix] / 1000), #1970-01-01# ) ) Unix Times: Numeric Version

  9. Need to look at each character Need to figure out how many characters there are Need to find space for the answer Need to copy each character …and so on Often takes 10-100 instructions per value Unix Times: Strings are Slow

  10. Computers are good at arithmetic They “compute”! Many arithmetic operations take only one instruction 2.66 GHz processor = 2.66 billion instructions / second The more arithmetic you use, the faster they will be Unix Times: Numbers are Fast

  11. These numbers are for the Tableau Data Engine Should also work for most analytic databases: Vertica, ParAccel, VectorWise, etc. May not help on slow databases MySQL, Text Files, Excel If you extract, it should help Unix Times: When Does it Help?

  12. Creating Dates Use date arithmetic instead of string parsing

  13. Customer Task: Convert a column of numbers to dates The numbers are in the form yyyymmdd Need to convert them to dates for analysis Time series Binning (weeks, quarters) Creating Dates: The Problem

  14. Answer taken from an old in-house training manual Meaning: Build a string in “mm/dd/yyyy” format Cast it to a date Problems: One billion values takes 5 hours Only works in the U.S. DATE( MID( STR( [yyyymmdd] ), 4, 2 ) + “/” + RIGHT( STR( [yyyymmdd] ), 2 ) + “/” + LEFT( STR( [yyyymmdd] ), 4 ) ) Creating Dates: U.S.A. Strings

  15. Meaning: Build a string in “yyyy-mm-dd” format Cast it to a date Works in any country Data engine tries this format first One billion values still takes 5 hours DATE( LEFT( STR( [yyyymmdd] ), 4 ) + “-” + MID( STR( [yyyymmdd] ), 4, 2 ) + “-” + RIGHT( STR( [yyyymmdd] ), 2 ) ) Creating Dates: ISO Strings

  16. Meaning: Get date parts using division (/) and remainder (%) Use date arithmetic to add the parts to a date constant Division gives real numbers, INT fixes this. One billion values takes 64 seconds 280x faster This is the difference between stretching your legs and coming back in the morning! DATEADD( 'day', [yyyymmdd] % 100 - 1, DATEADD( 'month', INT( ( [yyyymmdd] % 10000 ) / 100 ) - 1, DATEADD( 'year', INT( [yyyymmdd] / 10000 ) - 1900, #1900-01-01# ) ) ) Creating Dates: Date Arithmetic

  17. The original calculation has four concatenations Each one needs different amounts of memory Each one needs to copy the characters …and so on Changing a number to a string has similar problems Reading dates from text is tricky What country are we in? 5/3/1983 Creating Dates: Strings are Slow

  18. Numbers all have the same size Copying numbers is fast Date arithmetic is still arithmetic Not as simple as addition BUT Still only a few instructions Computers are good at arithmetic! Creating Dates: Numbers are Fast

  19. LOG How many digits are there? ABS / SIGN Remove / extract the sign of the number MIN / MAX Smallest / largest of two values Creating Dates: Useful Numeric Functions

  20. Presenting Numbers Move display formatting out of your calculations

  21. Customer Task: Data only has the day of the quarter User wants to group data by the week of a quarter Weeks should be labeled nicely Need a calculation to convert the day to the week Need to format it for display Presenting Numbers: The Problem

  22. Meaning: Check all 14 possible ranges one after another Label out of bounds values as “Other” Lots of typing means lots of mistakes A billion rows takes several minutes ~7 minutes in 7.0 ~4 minutes in 8.0 IF [Day Of Quarter] <  7 THEN "Week #1" ELSEIF [Day Of Quarter] < 14 THEN "Week #2" … ELSEIF [Day Of Quarter] < 91THEN "Week #13" ELSE "Other" END Presenting Numbers: If Then Else

  23. Solution 1: Use aliases Rewrite calculation to return numbers Create aliases for the values: “Week #1” etc. Only takes 36s on a billion rows 12x faster than 7.0 6x faster in 8.0 Problems: Typing aliases is still error prone Dialogue is slow because Tableau must find all the values INT( [DayOfQuarter] /7 ) + 1 Presenting Numbers: Aliases

  24. Solution 2: Use column formatting Rewrite the calculation to return numbers Apply number formatting to the column: “Week #”0 Still only takes 36s on a billion rows Viz updates live as you edit! Much easier to correct mistakes Formatting editor doesn’t need to run queries INT( [DayOfQuarter] /7 ) + 1 Presenting Numbers: Formatting

  25. Databases can format output for historical reasons Remember teletypes? Line printers? Database formatting has to be done on every row Grouping by string calculations can be much slower than grouping by numbers Need to compare entire strings instead of string identifiers Presenting Numbers: Strings are Slow

  26. Grouping by numbers only compares numbers 10x-100x faster than strings Grouping reduces the number of rows returned Aliases and formatting are applied after the query Changing the formatting in Tableau does not run queries Presenting Numbers: Presentation is Fast

  27. Combined Fields Use Sets instead of concatenated strings

  28. Customer Task: User wants to create a multi-column set from two or more string columns The user may want to change the column separator Combined Fields: The Problem

  29. Taken from in-house training as an alternative to nested fields (called “combined fields” in version 8) Meaning: Concatenate two strings together Use a separator string that can be changed Problems: One billion rows takes almost 9 minutes Changing the separator requires re-running the query [Month] + “, “ + [Weekday] Combined Fields: Concatenation

  30. Using “Combine Fields…” menu item, create a field that shows both fields with a user specified separator “Create Set…” in v7 Changing the separator does not run a new query Performance is extremely fast 6 seconds on one billion rows 90x faster Combined Fields: Set

  31. String concatenation is very hard to make fast Must build all the combinations from every row Grouping by calculated strings is slow Calculations don’t have string identifiers Combined Fields: Strings are Slow

  32. Unmodified string columns are really numbers One number per unique string Grouping by them is like grouping by numbers Tableau formats combined fields after the query Changing the formatting doesn’t run another query Combined Fields: Numbers are Fast

  33. Coming Attractions Some things we have made faster in the version 8 data engine

  34. Version 7 evaluated both sides of IFs Computed the ELSE side even when true Computed the THEN side even when false Especially bad when Tableau nested many of them Fixed! Coming Attractions: If/Then/Else

  35. Version 7 did not have a CASE statement Made us build huge if/then/else statements If the nesting was deep enough, we would crash Fixed! Also computes only the outputs it needs THEN “string” much faster too Coming Attractions: Case

  36. Version 7 strings were computed one at a time Version 8, many functions have been “chunked” Compute 1000 values at a time Converting to/from strings is much faster in Version 8 Coming Attractions: String Functions

  37. Version 7 computed values on only one processor Version 8 tries to spread calculations across processors If you have 4 cores, calculations can be 4x faster Coming Attractions: Parallel Execution

  38. Version 7 could not edit the column order or the separator Version 8 lets you edit the column order and the separator Coming Attractions: Combined Fields

  39. Questions?

More Related