1 / 42

Real-time Analytics with Informix Warehouse Accelerator

Real-time Analytics with Informix Warehouse Accelerator . Fred Ho – Informix Development. Agenda. Data Warehouse Industry Trends Informix Data Warehousing Informix Warehouse Accelerator 3 rd Generation Data Base Technology Overview of the Informix Warehouse Accelerator (IWA)

paco
Télécharger la présentation

Real-time Analytics with Informix Warehouse Accelerator

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Real-time Analytics with Informix Warehouse Accelerator • Fred Ho – Informix Development

  2. Agenda • Data Warehouse Industry Trends • Informix Data Warehousing • Informix Warehouse Accelerator • 3rd Generation Data Base Technology • Overview of the Informix Warehouse Accelerator (IWA) • Target Market • PoC Customer Experience • IWA Technologies • Loading IWA • Referenced Hardware & Software Configuration • Q&A

  3. Commercial Data Warehousing Technologies Over the Years • Britton Lee as 1st data base appliance • Red Brick as 1st DBMS for warehousing with Star Schema • Massively parallel systems (shared-nothing architectures), e.g. • Informix XPS, Teradata

  4. Data Warehousing Workload & Optimizations • Data Warehousing/OLAP workload are inherently more complex than OLTP transactions and reasons are well-documented. • Ways to overcome that include: • Building Indexes • Partitioning of data • Building cubes, MOLAP/ROLAP/HOLAP • Query tuning • Appliances that add a new layer of hardware to perform I/O for DBMS • Mixed-Workload always a challenge

  5. Data Warehousing Industry Trends

  6. State of Data Warehousing in 2011 DBMS Market in 2011: • DBMS market at the close of 2009 was approximately $21.2 billion (2010 data not yet available) • Data Warehouse DBMS market was approximately 35% of the DBMS market or $7.42 billion Key Findings: • Data warehouse DBMSs have evolved to a broader analytics infrastructure supporting operational analytics, corporate performance management and other new applications and uses. • Cost is driving interest in alternative architectures but performance optimization is driving multi-tiered data architectures and a variety of deployment options - notably a strong interest in in-memory data mart deployments.

  7. State of Data Warehousing, Cont’d Market Dynamics for 2011 • Today, smaller data warehouses, those less than 5 TB's of source system extracted data (SSED) are the only "data warehouse" for the entire organization and are commonly solving organizations' analytic needs. Gartner estimates that between 70% and 75% of all systems referred to as EDW are actually single business departments in nature. Analysis: • Optimization techniques such as summaries, aggregates and indexes are simply the result of performance restrictions inherent to normalized data and the way the RDBMS manages rows and columns.

  8. State of Data Warehouse, Cont’d A Glimpse Into the Future • Vendor solutions began to focus even more on the ability to isolate and prioritize workload types including strategies for dual warehouse deployments and mixing OLTP and OLAP on the same platform. • In-memory DBMS solutions provide a technology which enables OLTP/OLAP combined solutions. Organizations should increase their emphasis on financial viability during 2011 and even into 2012 as well as aligning their analytics strategies with vendor road maps when choosing a solution.

  9. Data Warehouse Trends for the CIO, 2011-2012 Data Warehouse Appliances: • DW appliances are not a new concept. Most vendors have developed an appliance offering or promote certified configurations. Main reason for consideration is simplicity. The Resurgence of Data Marts: • Data marts can be used to optimize DW by offloading part of the workload, returning greater performance to the warehousing environment Column-Store DBMSs • CIOs should be aware that their current DBMS vendor may offer a column-store solution. Don’t just buy a column-store-only DBMS because a column store was recommended by your team. In-Memory DBMSs • IMDBMS technology also introduces a higher probability that analytics and transactional systems can share the same database.

  10. Informix Warehouse Accelerator (IWA)

  11. Agenda • 3rd Generation Data Base Technology • Overview of the Informix Warehouse Accelerator (IWA) • Target Market • PoC Customer Experience • IWA Technologies • Loading IWA • Referenced Hardware & Software Configuration

  12. Third Generation of Database Technology According to IDC’s Article (Carl Olofson) – Feb. 2010 1st Generation: - Vendor proprietary databases of IMS, IDMS, Datacom 2nd Generation: - RDBMS for Open Systems, dependent on disk layout, limitations in scalability and disk I/O - Database tuning by adding updating stats, creating/dropping indexes, data partitioning, summary tables & cubes, force query plans, resource governing 3rd Generation: IDC Predicts that within 5 years: • Most data warehouses will be stored in a columnar fashion • Most OLTP database will either be augmented by an in-memory database (IMDB) or reside entirely in memory • Most large-scale database servers will achieve horizontal scalability through clustering

  13. Agenda • 3rd Generation Data Base Technology • Overview of the Informix Warehouse Accelerator (IWA) • Target Market • PoC Customer Experience • IWA Technologies • Setting Up IWA • Referenced Hardware & Software Configuration

  14. IWA requires you to do… No Query Tuning and Optmizer Hints No Database tuning No Index creation, reorganization No Update Statistics No Partioning/Fragementation No Storage Management/Page size configuration No Database/schema Changes No Application Changes No Summary Tables/Materialized Views No Buying more expensive Hardware No Change of expectations Power of Simplicity!

  15. Informix Warehouse Accelerator3rd Generation Database Technology is Here – Performance is Everything How is it different? • Performance: Unprecedented response times to enable 'train of thought' analysis frequently blocked by poor query performance. • Integration: Connects to IDS through deep integration providing transparency to all applications. • Self-managed workloads: queries are executed in the most efficient way • Transparency: applications connected to IDS, are entirely unaware of IWA • Simplified administration: appliance-like hands-free operations, eliminating many database tuning tasks What is it? The Informix Warehouse Accelerator (IWA) is a workload optimized, appliance-like, add-on, that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries, with unprecedented response times. Breakthrough Technology Enabling New Opportunities

  16. 7 7 1 1 6 6 2 2 5 5 3 3 4 4 Breakthrough technologies for performance Extreme Compression Required because RAM is the limiting factor. Row & Columnar Database Row format within IDS for transactional workloads and columnar data access via accelerator for OLAP queries. Multi-core and Vector Optimized Algorithms Avoiding locking or synchronization In Memory Database 3rd generation database technology avoids I/O. Compression allows huge databases to be completely memory resident Predicate evaluation on compressed data Often scans w/o decompression during evaluation Frequency Partitioning Enabler for the effective parallel access of the compressed data for scanning. Horizontal and Vertical Partition Elimination. Massive Parallelism All cores are used within cpus for queries

  17. Informix Warehouse Accelerator Configuration SQL Queries (from apps) TCP/IP Informix Warehouse Accelerator IDS SQL (via DRDA) Query Processor Query Router Results Compressed DB partition Data Warehouse Bulk Loader Informix Warehouse Accelerator: • Connects to IDS via TCP/IP & DRDA • Analyzes, compresses, and loads • Copy of (portion of) warehouse • Processes routed SQL query and returns answer to IDS IDS: • Routes SQL queries to accelerator • User need not change SQL or apps. • Can always run query in IDS, e.g., if • too short an est. execution time

  18. Characterized by: “Star” or “snowflake” schema: Region Brand City Store Product SALES Month Category Period Quarter Target Market: Business Intelligence (BI) Dimensions Fact Table Complex, ad hoc queries that typically • Look for trends, exceptions to make actionable business decisions • Touch large subset of the database (unlike OLTP) • Involve aggregation functions (e.g., COUNT, SUM, AVG,…) • The “Sweet Spot” for the IWA!

  19. Case Study #1: Major U.S. Shoe Retailer • Top 7 time-consuming queries in Retail BI and Warehouse: (Against 1 Billion rows Fact Table) Our Retail users will be really happy to see such a huge improvement in the queries processing timings. This IWA extension to IDS will really bring value to the Retail BI environment.

  20. Case Study #2: Datamart at a Government Agency • Microstrategy report was run, which generates • 667 SQL statements of which 537 were Select statements • Datamart for this report has 250 Tables and 30 GB Data size • Original report on XPS and Sun Sparc M9000 took 90 mins • With IDS 11.7 on Linux Intel box, it took 40 mins • With IWA, it took 67 seconds.

  21. Case Study #3: Public Sector Customer Dim 2.5 million Program Dim Payment Fact 630 million records Organization Dim Commodity Dim

  22. IWA Test Results

  23. Test Notes Fact table compressed from 110GB to 24GB in IWA Best performance gains on table and partition scans (300 times faster) Not much benefit for selects for < 9 records that can be retrieved from an Informix Index

  24. Agenda • 3rd Generation Data Base Technology • Overview of the Informix Warehouse Accelerator (IWA) • Target Market • Beta Customer Experience • IWA Technologies • Setting Up IWA • Referenced Hardware & Software Configuration

  25. Row Oriented Data StoreEach row stored sequentially • Optimized for record I/O • Fetch and decompress entire row, every time • Result – • Very efficient for transactional workloads • Not always efficient for analytical workloads If only few columns are required the complete row is still fetched and uncompressed

  26. Columnar Data Store Data is stored sequentially by column • Data is compressed sequentially for column: • Aids sequential scan • Slows random access If attributes are not required for a specific query execution, they are skipped completely.

  27. Number of Occurrences Rare values CommonValues Compression: Frequency Partitioning Trade Info (volume, product, origin country) Column Partitions Histogramon Origin Vol Prod Origin ChinaUSA GER,FRA,… Rest Origin Top 64 traded goods – 6 bit code Cell 1 Cell 3 Cell 4 Product Rest Cell 2 Cell 5 Cell 6 Histogram on Product Table partitioned into Cells • Field lengths vary between cells • Higher Frequencies  Shorter Codes (Approximate Huffman) • Field lengths fixed within cells

  28. Male John 08/10/06 Mango Dict Dict Dict HuffmanEncode HuffmanEncode HuffmanEncode Compression Process: Step 1 Input tuple Male, John, 08/10/06, Mango Column 1 Column 2 Column 3 Co-codetransform Type specifictransform Sat 2006 w35 Column1 & 2 Column3.A Column3.B Male/John/Sat Male/John w35/Mango p = 1/512 p = 1/8 p = 1/512 ColumnCode ColumnCode ColumnCode 101101011 001 01011101 10110101100101011101 TupleCode

  29. — — Compression Process: Step 2 Tuplecode 10110101110001011101 First tuple code 10110101110001011111 SortedTuplecodes 1 101101011100001100 1011010111000011101 PreviousTuplecode — 0000000000000000001 Delta 00000000000000000001 0000000000000000101 Dict HuffmanEncode 000 Delta Code 010 1110 Append CompressionBlock 10110101110001011101 101101011100010111010000101110 000 010 1110 Look Ma, no delimiters!

  30. A1 D1 G1 B1 E1 F1 C2 C1 H1 H2 C4 H4 A4 D4 G4 B3 B4 E4 E3 F4 F3 Result1 Operand Result2 Operand Operand Result3 Result4 Operand A3 A2 D2 D3 G3 G2 B2 E2 F2 C3 H3 Cell Block Bank β1 (32 bits) Bank β2 (32 bits) Bank β3 (16 bits) Register Stores Facilitate SIMD Parallelism • Access only the banks referenced in the query (like a column store): • SELECT SUM (T.G) • FROM T • WHERE T.A > 5 • GROUP BY T.D • Pack multiple rows from the same bank into the 128-bit register • Enablesyet another layer of parallelism: SIMD (Single-Instruction, Multiple-Data)! 32 bits 32 bits 32 bits 32 bits Vector Operation 128 bits

  31. 11111 0 1111 0 Simultaneous Evaluation of Equality Predicates State==‘CA’ && Quarter == ‘Q4’ • CPU operates on 128-bit units • Lots of fields fit in 128 bits • These fields are at fixed offsets • Apply predicates to all columns simultaneously! Translate value queryto Code query State==01001 && Quarter==1110 State Quarter Row … … … … & Mask == Selectionresult 01001 0 1110 0

  32. Agenda • 3rd Generation Data Base Technology • Overview of the Informix Warehouse Accelerator (IWA) • Target Market • Beta Customer Experience • IWA Technologies • Setting Up IWA • Referenced Hardware & Software Configuration

  33. Defining, What Data to Accelerate • A MART is a logical collection of tables which are related to each other. For example, all tables of a single star schema would belong to the same MART. • The administrator uses a rich client interface to define the tables which belong to a MART together with the information about their relationships. • IDS creates definitions for these MARTs in the own catalog. The related data is read from the IDS tables and transferred to IWA. • The IWA transforms the data into a highly compressed, scan optimized format which is kept locally (in memory) on the Accelerator IDS + IWA Coordinator Process Worker Processes Define

  34. Smart Analytics Optimizer Studio DB connections Accelerator

  35. Agenda • 3rd Generation Data Base Technology • Overview of the Informix Warehouse Accelerator (IWA) • Target Market • Beta Customer Experience • IWA Technologies • Loading on IWA • Referenced Hardware & Software Configuration

  36. IWA Referenced Hardware Configuration Options:

  37. Configuration Scenarios • Alternative 1: Install IWA on a separate Linux box • Alternative 2: Install Informix and IWA in the same symmetric multiprocessing system • Note: IWA requires Linux on Intel x64 (64-bit EM64T) Xenon Informix Warehouse Accelerator Informix Database Server 64-bit RHEL 5/SUSE 11 Solaris 10/AIX 6.1/HP-UX 11.31 64-bit RHEL 5/SUSE 11 Informix Database Server Informix Warehouse Accelerator 64-bit RHEL 5/SUSE 11

  38. IWA Software Processes BI Client OLTP Client 1 coordinator node per 4 worker nodes Informix Warehouse Accelerator coordinator node accelerated queries Informix Database Server TCP/IP & DRDA Manages the distribution tasks such as loading data and and query processing. Performs data compression and query processing with extreme parallelism Query Router worker node worker node worker node worker node highly compressed data in memory: 25% of fact table + copy of all dimension tables highly compressed data in memory: 25% of fact table + copy of all dimension tables highly compressed data in memory: 25% of fact table + copy of all dimension tables highly compressed data in memory: 25% of fact table + copy of all dimension tables DWADIR=$IWA_INSTALL_DIR/dwa/demo START_PORT=21020 NUM_NODES=5 (1 coord, 4 worker) WORKER_SHM=0.7 (70% of mem) COORDINATOR_SHM=0.05 (5% of mem) DRDA_INTERFACE="eth0" accelerator storage directory: compressed data on disk

  39. Summary: IWA Meeting the Customer Needs & Data Warehouse Trends Data Warehouse Appliances: • DW appliances are not a new concept. Most vendors have developed an appliance offering or promote certified configurations. Main reason for consideration is simplicity. The Resurgence of Data Marts: • Data marts can be used to optimize DW by offloading part of the workload, returning greater performance to the warehousing environment Column-Store DBMSs • CIOs should be aware that their current DBMS vendor may offer a column-store solution. Don’t just buy a column-store-only DBMS because a column store was recommended by your team. In-Memory DBMSs • IMDBMS technology also introduces a higher probability that analytics and transactional systems can share the same database.

  40. IWA Futures • Ease of Use • Tool for Data Mart definition and loading • OAT support of OnDWA utilities • Globalization • Additional locales support • Removing SQL Restrictions • Support all Join types • Inter-AQT joins • Subqueries • Dimension-table only queries • Scalability • Blade Server enablement • SDS Off-loading to IWA • Utilizing all cores within CPUs • Data Loading Options • Trickle Feeding • Partial Fact Table

  41. (Fred Ho – hof@us.ibm.com)

More Related