1 / 20

Patterns and Best Practices in SSIS

Patterns and Best Practices in SSIS. or, how to keep your DBA happy with your crazy-ass ETL. It’s all about me. I’m a SpeakingMentor !. Obligatory LOTR reference…. Mantra. C# is *not* the only way to initialise a variable C# is *not* the only way to move files

smireles
Télécharger la présentation

Patterns and Best Practices in SSIS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Patterns and Best Practices in SSIS or, how to keep your DBA happy with your crazy-ass ETL

  2. It’s all about me... I’m a SpeakingMentor !

  3. Obligatory LOTR reference…

  4. Mantra • C# is *not* the only way to initialise a variable • C# is *not* the only way to move files • C# is *not* the only way to call a web service • C# scripts are opaque to the SSIS runtime • C#... <sigh>

  5. Not a pipeline • Data does not get passed to components (cough) • Components manipulate blocks of data (true)

  6. What you *think* happens…. Get data Replace nulls Conditional Split Merge Sort

  7. What actually happens… Get data Conditional Split Replace nulls Sort Merge

  8. SSIS not being a pipeline

  9. It’s all about speed… • There are 2 transformation types: • Synchronous – fastest (streaming and row-based) • Asynchronous – slower • And three ‘modes’: • Non-blocking • Semi-blocking • Full blocking

  10. Non-blocking synchronous streaming transforms • Audit • Cache Transform • Character Map • Conditional Split • Copy Column • Data Conversion • Derived Column • Multicast • Percent Sampling • Row Count • Lookup

  11. Non-blocking synchronous row transforms • DQS cleansing • Export Column • Import Column • OLEDB Command • Script Task • SCD • Lookup

  12. Wait, there’s two ‘Lookup’s ? • Lookups are non-blocking streaming transforms when the ‘Full Cache’ option is used for the lookup data • Using ‘Partial Cache’ or ‘No Cache’ options in the lookup make the Lookup a row-based transform, which is necessarily slower

  13. Semi-blocking asynchronous transforms • Data Mining • Merge • Merge Join • Pivot • Unpivot • Union All • Term Lookup

  14. Full blocking asynchronous transforms • Aggregate • Fuzzy Grouping • Fuzzy Lookup • Row Sampling • Sort • Term Extraction • Script Task

  15. Wait, there’s two ‘Script Task’s ?? • Script tasks are non-blocking when they’re using an outside resource (i.e. not the data that you’re working on) • They become blocking when they collect a dataset before sending it on to a destination • Set the ‘SynchronousInputID’ property on the output columns to ‘None’

  16. Large Data Sets • BufferTempStoragePath • BlobTempStoragePath • These can either be set in your package template, or injected into the .dtsx

  17. Really ? You can just inject stuff ? • Yes • Find and Replace default entries with your custom requirement… any decent text editor will do. BLOBTempStoragePath="F:\astDisk\WithLoadsOfSpace_temp" bufferTempStoragePath="F:\astDisk\WithLoadsOfSpace_temp”

  18. Things that make you go hmmm • Logging • Package Checkpoints • Location, location, location • Expressions • Parallel Operations • ‘IsSorted=True’ property • Auditing • Bad Data

  19. Time to dissect Here’s one I prepared earlier…

  20. Take-aways – You should know.. • Your data • Your environment • Transform types • Ingest / Egress of data • What you are trying to achieve • When to give up and use C#

More Related