1 / 32

Massive scale with Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition

Massive scale with Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition. Rushabh Mehta Managing Director (India) | Solid Quality Mentors rmehta@solidq.com. About me: Rushabh Mehta. Professional Association for SQL Server President Solid Quality Mentors (Solid Q )

silvain
Télécharger la présentation

Massive scale with Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Massive scale with Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition Rushabh Mehta Managing Director (India) | Solid Quality Mentors rmehta@solidq.com

  2. About me: Rushabh Mehta • Professional Association for SQL Server • President • Solid Quality Mentors (SolidQ) • Business Intelligence Mentor • Managing Director, India • SQL Server MVP • rmehta@solidq.com ◊ www.solidq.com ◊ @rushabhmehta

  3. Agenda • Microsoft Data Warehousing Overview • SMP v/s MPP Architecture • Microsoft Parallel Data Warehouse Architecture and Components

  4. Microsoft Data Warehousing Offerings

  5. Microsoft’s Commitment to DW and BI FastTrack2009 PDW2010 SQL 6.01995 SQL 6.51996 SQL 7.01998 SQL 20002000 SQL CE2000 64-bit2001 SQL 20052005 SQL 20082008 OLAP and ETL DW Scale Parallel Processing Partitioning Scale to 100s of TB Data Mining Data Profiling Power Pivot Pervasive Insight Managed Reporting Ad-hoc Reporting Compression Load Optimize Data Warehousing VS Integration Resource Governor Multiple sources KPIs • Gartner Leaders Quadrant for Business Intelligence, since 2008 • Gartner Leaders Quadrant for Data Warehouse, since 2008 • Leader in “The Forrester Wave: Enterprise Data Warehousing Platforms, Q1 2009” • Fastest growing of top 5 data warehouse vendors - IDC • Microsoft spends as a company $9.1 billion in research annually

  6. SQL Server Fast Track Data Warehouse Solution to help customers and partners accelerate their data warehouse deployments • A methodfor designing a cost-effective, balanced system for Data Warehouse workloads • Reference hardware configurationsdeveloped in conjunction with hardware partners using this method • Best practices for data layout, loading and management

  7. Fast Track Scope Supporting Systems BI Data Storage Systems Presentation Layer Systems Integration Services ETL Analysis Services Cubes Web Analytic Tools Data Path Reporting Services Presentation Data Presentation Data Subject AreaData Marts SharePoint Services Microsoft Office SharePoint SAN, Storage Array Data Warehouse PerformancePoint Services Data Staging,Bulk Loading Reference Architecture Scope (dashed)

  8. Fast Track Value Proposition Appliance-like time to value Reduces DBA effort; fewer indexes, much higher level of sequential I/O Choice of HW Platforms Dell, HP, Bull, EMC and IBM – more in future Low TCO Through Commodity hardware and value pricing; Lower storage costs. High Scale New reference architectures scale up to 48 TB (assuming 2.5x compression) Reduced Risk Validated by Microsoft; better choice of hardware; application of Best Practice 8

  9. SMP Architecture • SMP = Symmetric Multiprocessing • Two or more identical processors connected to single shared main memory and controlled by single OS instance • Any processor can work on any task • Easily move tasks between processors to balance workload efficiently • All SQL Server implementations up until now have been SMP

  10. MPP Architecture • MPP = Massively Parallel Processing • Uses many separate CPUs running in parallel to execute a single program • Each CPU has its own memory • Applications must be segmented, using high speed communications between nodes

  11. Parallel Data Warehouse Control Rack DataRack Control Rack Data Rack/s

  12. Parallel Data Warehouse Compute Nodes Storage Nodes SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Control Nodes Active / Passive Client Drivers Management Servers Support / Patching Compute Node + Storage Node PDW Node Dual Fiber Channel Dual Infiniband Landing Zone ETL Load Interface Backup Node Corporate Backup Solution Spare Compute Node Corporate Network Private Network

  13. Compute Nodes Compute Node SQL

  14. Architecture: Compute Server Node Hardware Options Dual Multi-Core Processors TempDB Workspace Enterprise Class DBMS CPU RAM DUAL 4Gb FC CPU Dual InfiniBand

  15. Data Layout • Replicated:A table structure that exists as a full copy within each discrete PDW Node. • Distributed: A table structure that is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the DBMS. • Ultra shared nothing: The ability to design a schema of both distributed and replicated tables to minimize data movement between nodes • Small sets of data can be more efficiently stored in full (replicated). • Certain set operations are more efficient against full sets of data (i.e., single node operations).

  16. Date Dim Item Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Data Layout DD DD DD DD DD DD ID ID ID ID ID ID SF 1 SF 1 SF 2 SF 4 SF 5 SF 3 SD SD SD SD SD SD PD MD PD PD PD PD Sales Fact Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold Promo Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End

  17. Parallel Data Warehouse Compute Nodes Storage Nodes SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Control Nodes Active / Passive Control Nodes Active / Passive Cluster Client Drivers Client Drivers Management Servers Support / Patching Dual Fiber Channel Dual Infiniband Landing Zone ETL Load Interface Backup Node Corporate Backup Solution Spare Compute Node Corporate Network Private Network

  18. Control Node & Client Drivers • Client connections always go through the control node • The Control Node contains no persistent user data • PDW ‘Secret Sauce’ • Processes SQL requests • Prepares execution plan • Orchestrates distributed execution • Local SQL Server to do final query plan processing / result aggregation • Client Drivers provided by DataDirect • ODBC, OLE-DB, JDBC and ADO.NET client drivers • Available drivers for 32 and 64 bits

  19. PDW Benefits – Massive Parallel Processing Control Rack DataRack Query 1 is standard T-SQL submitted to SQL Server on Control Node Query 1 Query is executed on all 10 Nodes Results are sent back to client ? ? ? ? ? ? ? ? ? ?

  20. PDW Benefits – Massive Parallel Processing Control Rack DataRack Multiple queries are simultaneously executed across all nodes. PDW supports querying while data is loading. ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Blazing fast performance by parallelizing queries on highly optimized ultra shared nothing nodes.

  21. Parallel Data Warehouse Compute Nodes Storage Nodes SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Control Nodes Active / Passive Client Drivers Management Servers Management Nodes Active / Passive Cluster Support / Patching Support / Patching Dual Fiber Channel Dual Infiniband Landing Zone ETL Load Interface Backup Node Corporate Backup Solution Spare Compute Node Corporate Network Private Network

  22. Management Node • Runs a separate domain controller (Active Directory) • Used for deploying patches to all nodes in the appliance • Holds images in case a node needs reimaging • High Availability using Active / Passive clustering

  23. Parallel Data Warehouse Compute Nodes Storage Nodes SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Control Nodes Active / Passive Client Drivers Management Servers Support / Patching Dual Fiber Channel Dual Infiniband Landing Zone Landing Zone ETL Load Interface ETL Load Interface Backup Node Corporate Backup Solution Spare Compute Node Corporate Network Private Network

  24. Landing Zone • Provides high capacity storage for data files from ETL processes • Integration services available on the landing zone • Connected to internal network • Available as sandbox for other applications and scripts that run on internal network • Source • Landing Zone Files • Data Loader • Compute Nodes

  25. Parallel Data Warehouse Compute Nodes Storage Nodes SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Control Nodes Active / Passive Client Drivers Management Servers Support / Patching Dual Fiber Channel Dual Infiniband Landing Zone Backup Node ETL Load Interface Backup Node Corporate Backup Solution Corporate Backup Solution Spare Compute Node Corporate Network Private Network

  26. Backup Node • Coordinated backup across the nodes • Database level backup • Full or differential • Metadata backup • Can restore to a larger appliance • Optional item – 1 size per config • Up to 524TB of capacity • Available in XS, S, M, L and XL

  27. PDW Software Architecture MS BI (AS, RS) Nexus Query Tool 3rd Party Tools (Client Access) Compute Node Compute Nodes Compute Nodes DMS JDBC OLE-DB ODBC ADO.NET IIS SQL Server Control Node User Data Admin Console DMS PDW Services Landing Zone DMS Loader Client SQL SSIS DSQL Core Engine Services DMS Manager SQL OS Backup Node SQL OS DMS SQL Server DW Schema DW Queue DW Configuration DW Authentication Management Node HPC AD Built by DWPU Existing MS software 3rd Party

  28. Conclusion • MPP architecture supports massive scale through increased parallelization and shared-nothing architecture • Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition brings massive scale wrapped in the simplicity of an appliance

  29. References • Microsoft Parallel Data Warehouse official site http://www.microsoft.com/pdw

  30. Feedback / QnA • Your Feedback is Important! Please take a few moments to fill out our online feedback form at: << Feedback URL – Ask your organizer for this in advance>> For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx Or email us at vtd@microsoft.com • Use the Question Manager on LiveMeeting to ask your questions now!

  31. Contact • SolidQ www.solidq.com • Email Address rmehta@solidq.com

More Related