1 / 46

Data Integration at Microsoft: Technologies and Solution Patterns

SESSION CODE: BIE202. Data Integration at Microsoft: Technologies and Solution Patterns. Jeff Bernhardt SSIS Product Unit Manager Microsoft Corporation. An Ancient Story from India …. A Modern Story from Redmond…. Integration Services. BizTalk Server. SQL Server Service Broker.

kipling
Télécharger la présentation

Data Integration at Microsoft: Technologies and Solution Patterns

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SESSION CODE: BIE202 Data Integration at Microsoft:Technologies and Solution Patterns Jeff Bernhardt SSIS Product Unit Manager Microsoft Corporation

  2. An Ancient Story from India …

  3. A Modern Story from Redmond… Integration Services BizTalk Server SQL Server Service Broker SQL Server Replication Others …

  4. Data Integration: Two Globs of Ideas, Why and How Replication / Synchronization Data Management Streams - CEP Business Intelligence SOA Federated Views Data Warehousing Migration Message Oriented ETL – Bulk Move CDC Operational Consistency Data Quality Consolidation Master Data B2B Technology Choices Problem/Solution types

  5. DI Problem Types: Building a DW and Business Intelligence Text Files Reports My DW ETL My ODS or OLTP System • Alter the shape • Create a Star Schema (de-nomalized for analysis queries) • Surrogate Keys (in place of business keys) • Pre-Aggregations (to support some types of reporting) • Track History • Slowly Changing Dimensions (history of entities) • Manage Partitions (once a month, roll up details and archive) • Take changes from the store • React to Inserts/Updates/Deletes. • Could be a “full refresh” or incremental

  6. DI Problem Types: Data Consistency Between Applications New Custom AR system in SQL Old Accounts Receivable on SAP Create Consistency Create Consistency • A long running ‘bridge’ • Existing systems will be left in place and kept in synch. • Reacts to changes in either system. • Needs a way to react to changes or messages to minimize tax on App systems • The systems are different • Often different back ends. • Match schemas, tables, columns • Consistent data domains (like keys) • Detect and resolve duplicates • Create a consistent level of granularity • Aggregate • Allocate

  7. DI Problem Types: Migration or Consolidation New Custom AR system in SQL Once the design is set and tested, execute this Old Accounts Receivable on SAP Transfer all data and map the shape • Systems or Companies merged or acquired. • Bring the data together into the “new” place. • An integration system is design and built and tested to minimize the down time for the old system and make one smooth transition. • Match schemas • Consistent data domains (like keys) • Detect and resolve duplicates • May create a long running ‘bridge’ while the systems settle.

  8. DI Problem Types: Master Data Management Customers Support Customers Marketing Customers Sales Customers Accounting Customers • Creating ‘One Version of the Truth’ • Data residing in many sources where each source schema is fixed but different. Combined into one store with a consistent schema • Pivot / Unpivot • Type and domain mapping • Key generation • Ensure quality • Remove duplicates • Provide missing data • Hard matching to find duplicates • Bulk update and trickle changes • Changes to central store delivered back to operational system

  9. DI Problem Types: B2B (Inter-Enterprise Data Sharing) PartsAreUs EZ Buy Orders Internet / WAN Supplier’s System Order Fulfillment System WeShip Shipper’s System • Contracts • SLAs • Standardized formats • Long running transactions or business process • Loosely coupled • Coordination, message passing • A very specific perspective on Application Integration.

  10. Organizing the Problem/Solution Ideas • Data Warehouse and Business Intelligence • Data Consistency Between Applications • Data System Migration and Consolidation • Master Data Management • Inter Enterprise Data Acquisition and Sharing

  11. Technology Types: Bulk Movement Point B Point A ETL Text Files RDBMS RDBMS Text Files ELT XML XML • Move a sizeable set of rows from point A to point B • Often • Part of a scheduled process • Transform the shape of the data being moved • Combine many sources or split into many destinations • Two flavors • ETL (Extract Transform Load) • SSIS • AscentialDatastage (IBM) • ELT (Extract Load Transform) • Oracle Warehouse Builder • Bulk Insert

  12. Technology Types: Message Oriented Movement C B A Coordinator Line Of Business Application RDBMS D Event XML Text Files • Central ‘Coordinator’ • Guarantees receipt and delivery of messages. • Components are ‘at rest’ until activated by the coordinator or an external event. • Data delivered in packets along with the message. • Terms that might fit in this category: • CDC • Trickle Feed • SOA • Message Bus

  13. Technology Types: Replication and Synchronization Repl / Sync Agent • Maintaining equivalent copies of data in different locations • One master, many slaves • Multi-master • High Availability (live backups) • Similarity between systems • Most often table copies on the same brand of RDBMS • Heterogeneous possible • Attunity, Goldengate, etc. • Transformations: Little to none • Terms that might fit in this category: • CDC, Log mining • Merge Replication • Checksum tables

  14. Technology Types: Federated Views Reports View Provider • Answers queries directly from many source systems • View Provider may: • Optimize and execute the combined query (Joins, etc.) • Pushes query parts down to the source. • Provide unified security model • Provide unified metadata • Cache source data • Support Heterogeneous Sources

  15. Technology Types: Stream Processing Destination Source CEP Engine Event Processing Event Log Event • Monitor a stream of data, Create an event when • Temporal (time based) events occur • Running average or aggregate hits a limit • Interesting sequence of records is detected • Also called CEP (Complex Event Processing) • Different from the other Technology Types??? I Can’t tell yet.

  16. Technology Types: Data Management and Quality • A collection of services common to most Data Integration solutions • Shared semantic model • Metadata library • Manage hierarchies • Data artifact level security model • Data Quality • Profile to understand • Merge to resolve duplicates • Find approximate matches • Test and monitor quality. • Version management for data.

  17. Organizing the Technology Choice Ideas Bulk Movement • Message Oriented Movement Replication and Synchronization • Federated Views • Data Management and Quality • Stream Processing (CEP)

  18. Problems and Technologies • Data Warehouse and Business Intelligence • Data Consistency Between Applications • Data System Migration and Consolidation • Master Data Management • Inter Enterprise Data Acquisition and Sharing Bulk Movement 15% • Message Oriented Movement 10% Replication and Synchronization 60% • Federated Views • Data Management and Quality 15% • Stream Processing (CEP)

  19. Microsoft’s Offerings • Data Warehouse and Business Intelligence • Data Consistency Between Applications • Data System Migration and Consolidation • Master Data Management • Inter Enterprise Data Acquisition and Sharing Bulk Movement SSIS Service Broker • Message Oriented Movement BizTalk SQL Replication Replication and Synchronization Distributed Query • Federated Views Master Data Services • Data Management and Quality • Stream Processing (CEP) Stream Insights

  20. What Should You Use and When? It Depends On The Artifacts / Facets / Attributes / “the flavor”:

  21. A Brief Tour of the Microsoft Offerings • Integration Services • Service Broker • Replication • Distributed Query • BizTalk Server • StreamInsights

  22. Integration Services (SSIS) - Overview

  23. Integration Service – Problems / Technologies / Artifacts My ODS or OLTP System Text Files • Constructing a Data Warehouse • Migration / Consolidation • Bulk Movement • ETL My DW SSIS

  24. Integration Services – Customer Solution Data Mart (Reporting and Analytics) Inventory Management (Oracle) Attunity CDC for Oracle SSIS Package Lookups, load facts and dimensions, surrogate key generation, … SSIS Package Data Warehouse (SQL Server) SQL Server Source SSIS Package Lookups, slowly changing dimensions, address cleansing, … CRM (SQL Server) Flat File Source SSIS Package Data conversions, parsing, data quality, aggregations, … SSIS Package Manufacturing Data (Flat files) Staging DB Operational Database (Shop Floor Application)

  25. Service Broker - Overview

  26. Service Broker – Problems / Technologies / Artifacts Database 1 Database 2 • Consistency Between Applications • Master Data Management (?) • Message Oriented Movement conversation Service A Service B Queue A Queue B Database 1 Database 2

  27. Service Broker – Customer Solution • Bank, lost loan provisioning • Requires very fast processing and analysis of up to date data. SSIS Service Broker subset 1 (x rows) SSIS server 1 subset 2 (x rows) sproc server 2 … … Result Table Source Table 32 cores subset n (x rows) server n

  28. SQL Replication - Overview

  29. SQL Replication – Problems / Technologies / Artifacts • Data Warehousing • Data consistency between Applications • Migration / Consolidation • Replication and Synchronization Reporting and Staging Enterprisee Information Management

  30. SQL Replication – Customer Solution Corporate Offices Edcon • 1000 branch offices • One way replication of catalog data from hub to spoke • Catalog downloads are partitioned with complex Dynamic and Join filtering – Catalog/Pricing data per store • Uses merge Replication for downloads • Subscribers located in each store - multi-user server databases • Uses Service Broker for uploads of transacted data – requires guaranteed in order delivery LOB Systems SSIS (daily) ‘Central’ Service Broker Merge Replication Branch Office ‘Branch’ Transactional Replication Online Terminal

  31. Distributed Query - Overview

  32. Distributed Query – Problems / Technologies / Artifacts Access SQL • Business Intelligence • Master Data Management • Federated Views SQL

  33. BizTalk Server - Overview

  34. BizTalk Server – Problems / Technologies / Artifacts LOB App Bus OLTP • B2B • Data Consistency Between Applications • Message Oriented Movement XML Docs Orchestration Logic

  35. BizTalk Server – Customer Solution Emerging Practice: Loose Coupled Batch • BizTalk Coordinates process • Some traditional data flow through messages • Many SSIS Packages with complicated relationships and dependencies. • Messages control activation of SSIS pieces. • Messages deliver intermediate results or pointers to batch data. • Scale out SSIS execution Bus LOB App Orchestration Logic XML Docs OLTP SSIS Package SSIS Package DW

  36. StreamInsights - Overview

  37. StreamInsights – Problems / Technologies / Artifacts • Business Intelligence • Data Warehousing • Message Oriented Movement • Stream Processing Data Sources, Operations, Assets, Feeds, Sensors, Devices Input Data Streams Results Operational Data Store & Archive CEP Engine f(x) g(y) f'(x) h(x,y)

  38. StreamInsights– Customer Solution Telco • Detect Fraud callbacks • During regular processing of data warehouse facts • Custom SSIS Component encapsulates the CEP engine. • Events detected and sent for follow-up Switch Logs Switch Logs Switch Logs SSIS Fact Processing StreamInsight Component Fraud DW

  39. A Bigger Picture?

  40. Required Slide Resources Learning • Sessions On-Demand & Community • Microsoft Certification & Training Resources www.microsoft.com/teched www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet • http://microsoft.com/msdn

  41. Required Slide Complete an evaluation on CommNet and enter to win!

  42. Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st http://northamerica.msteched.com/registration You can also register at the North America 2011 kiosk located at registrationJoin us in Atlanta next year

  43. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related