1 / 45

BIN302 SSIS 2008: Tips and Tricks

BIN302 SSIS 2008: Tips and Tricks. Steve Swartz Group Program Manager Microsoft Corporation. Overview. Before You Begin Relationship to the Database Modeling Performance Development Smart Package Design Understanding the Data Flow Deployment General Issues Tuning and Tweaking.

judah
Télécharger la présentation

BIN302 SSIS 2008: Tips and Tricks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BIN302SSIS 2008: Tips and Tricks Steve Swartz Group Program Manager Microsoft Corporation

  2. Overview • Before You Begin • Relationship to the Database • Modeling Performance • Development • Smart Package Design • Understanding the Data Flow • Deployment • General Issues • Tuning and Tweaking

  3. TIPFactor Integration Code Away From App Code

  4. Relation To The Database • Administration Issues • Where Do I Store Configurations? • Where Do I Deploy Packages? • Where Do I Store Log Information? • Architectural Issues • Is Integration Inside the Data Tier? • Is Integration Separate from the Data Tier?

  5. Integration Inside the Data Tier Database Instance Load Database Upgrade Database Synchronize Databases Export Results

  6. Integration Separate From the Data Tier Database Instance Database Instance DataMart Load / Upgrade / Synchronize / Export Database(s)

  7. TIPModel Performance Early In the Design Cycle

  8. Performance Strategy - OVAL

  9. Implementing OVAL • Good Operations == Good package design • Volume, Application, Location call for a scientific approach • Measure • Hypothesize • Modify • Re-Measure

  10. Source -> Row Count Run DTEXEC and measure time taken Volume: Measure Speed per Connection Rows / sec = Row Count / TimeData Flow

  11. Application: The Right Tool for the Job Multiple data sources and destination Transformation requirements Workflow management Trickle feed Real-time ETL Single source and destination server BULK INSERT works just fine

  12. Location: Know Your Environment • Four main factors for performance • CPU • Memory • Disk I/O • Network • Design For Your Environment

  13. Location: Benchmark Package Mockups

  14. Location: Identify Bottlenecks

  15. TIPHappiness is More, Smaller Packages

  16. Think Modular • Some Languages Force Modularity on a Developer • SSIS is not one of those languages • So Modularity is a Discipline • Create a Package For Each Operation • Use Workflow To Run Individual Operations • Test, Do Perf Analysis Per Operation

  17. Think Parallel • You can add parallelism inside and outside of your packages • Perform tasks in parallel when possible • MaxConcurrentExecutables • Default is the number of cores + 2

  18. Load in Parallel • Partition data so it can be split up and loaded by multiple data flows • Load unrelated tables in parallel Customer Part Region Nation Part Supplier Supplier

  19. Think In-Memory • Performance drops drastically anytime you need to land a file on disk • Reasons for unintentionally writing to disk: • Running out of memory • Lookup caches can hog memory • Blocking transforms • Too many active buffers • BLOB columns

  20. TIPKeep The Flow In Data Flow

  21. How Does the Data Flow Work? • In memory ETL engine • Metadata set at design time • Data is moved through the pipeline in buffers

  22. Transforms and Buffer Copies • Logically works row by row • Row Count, Derived Column • Buffer is reused RowBased (synchronous) • Works with groups of rows • Merge, Merge Join, Union All • Data copied to new buffers Partially Blocking (asynchronous) • Needs all input rows before producing any output rows • Aggregate, Sort, Pivot Blocking (asynchronous)

  23. Chose the Right Transforms • Synchronous and Asynchronous • Beware of blocking transforms like Sort and Aggregate • Take advantage of your database engine • ORDER BY … • SELECT INTO …

  24. Threading • Pipeline is split into Execution Trees • Execution trees end where a buffer is created • Sources • Asynchronous transforms • Threading was improved in 2008 • 2005 – one thread per execution tree • 2008 – one thread per buffer

  25. The Data Pipeline Destination Source Transform Multicast Destination

  26. Designing Your Data Flow Orders Customers Sort Sort Merge Join Merge Join Products Sort

  27. TIPKnow Your Deployment OptionsChoose Wisely, Grasshopper

  28. Why Can’t I Share My Package? • Package protection level • Encrypt {all|sensitive} with User Key • Encrypt {all|sensitive} with Password • Don’t Save Sensitive • Server Storage

  29. How Do I Deploy My Packages?

  30. Deployment Utility

  31. Deployment – File System • Deploy files using an MSI, or copied to a share • XML configuration

  32. Deployment – SQL Server • Files are stored in MSDB • Execution typically done with SQL Agent • Make use of Proxy Accounts • Store configurations in SQL Server

  33. Deployment – Package Store • Deploys to the SSIS service • Front end to File and SQL storage

  34. What is the SSIS Service For? • Not required for execution • Tracks running packages • Component cache • Can manage multiple File or SQL locations • Remote connections to the service will require DCOM security changes

  35. TRICKSTuning and Tweaking

  36. Optimize Your Query • SELECT only columns you need • Consider using NOLOCK • Remove locking overhead • Improves the speed of large tables scan • Risky side-effects - can read dirty data

  37. Optimize Lookup Transformation • Optimize your query • Select the right Cache Mode • Use shared cache when it makes sense • Remember Miss Cache for Partial Cache mode • Check log for cache details Lookup processed 150000 rows The processing time was 28.797 seconds The cache used 84000000 bytes of memory

  38. Optimize Data Type Usage • General Tips • Be as narrow as possible: less memory allocation • Avoid excessive casting • Avoid LOB types as they get swapped to disk • Specific Tips • Money is implemented as integer (up to 30% faster than Decimal) • Float faster than Decimal • Throughput varies with destination • Typical values between 3 and 20 MB/sec per connection

  39. Tweak Your Pipeline Settings • Buffer size • Rows per buffer • Engine threads • Blob / Buffer temp path

  40. Optimize Destination • SQL Destination vs. OLEDB Destination • Use OLEDB with Fast Load • Use partitions and partition SWITCH • Dropping non-clustered indexes can improve performance

  41. Optimization: Tuning Results

  42. Call to Action -Join Microsoft and HP NDA’s Discussion Topic- Business Insight Customer Roundtable: An HP/Microsoft Converged Application Technical NDA Discussion Time & Date- Wednesday 7am-8:30am Location-Hotel Concorde. Transport will be provided from the Hotel Concorde to the Messe at 8:30am. Breakfast will be served. Limited space. • How to Register? Register at the HP booth P2 or on http://tinyurl.com/2a9w64m

  43. Session Evaluations Tell us what you think, and you could win! All evaluations submitted are automatically entered into a daily prize draw*  Sign-in to the Schedule Builder at http://europe.msteched.com/topic/list/ * Details of prize draw rules can be obtained from the Information Desk.

  44. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related