New Features - Informix 11.70.xC2 Technical Overview

New Features - Informix 11.70.xC2Technical Overview Scott Pickett – WW Informix Technical Sales For questions about this presentation contact: spickett@us.ibm.com

Agenda • Introduction – Informix 11.70.xC2 • New features in the database: • Informix Ultimate Warehouse Edition (IUWE) • Case-Insensitive Databases • Improved Basic Text Search (BTS) queries • Table and column aliases in DML statements • Private Instance/Server (non-root install) • RPM image deployment for Linux • Administration enhancements • Enhancements to the OpenAdmin Tool (OAT) • Network performance and scalability enhancements • IBM Informix Genero (new offering)

Informix 11.70.xC2 • Informix database server 11.70.xC2. • Released (eGA) in March 29th, 2011. • Comes with new offerings: • Informix Ultimate Warehouse Edition • A new Informix edition with 3rd database technology for warehousing. • IBM Informix Genero • A new IBM software offering for modernizing 4GL RDS applications. • Informix products released along with the database server: • Informix database server 11.70.xC2. • Client Software Development Kit (SDK) / Connect 3.70.xC2. • OpenAdmin Tool (OAT) 2.72. • Informix Virtual Appliance 11.70.xC2.

Trends: Third Generation Database Technology Carl Olofson– Feb. 2010 1st Generation 2nd Generation 3rd Generation RDBMS for Open Systems, dependent on disk layout, limitations in scalability & disk I/O Vendor proprietary databases (IMS, IDMS, Datacom) Most data warehouses will be stored in a columnar fashion Most OLTP databases will either be augmented by an in-memory database (IMDB) or reside entirely in memory Most large-scale database servers will achieve horizontal scalability through clustering

Trends: State of Data Warehousing in 2011 Jan 2011 Key Findings Analysis A Glimpse into the Future “Optimization techniques such as summaries, aggregates and indexes are simply the result of performance restrictions inherent to normalized data and the way the RDBMS manages rows and columns.” “In-memory DBMS solutions provide a technology which enables OLTP/OLAP combined solutions.” “Cost is driving interest in alternative architectures butperformance optimization is driving multi-tiered data architectures and a variety of deployment options - notably a strong interest in in-memory data mart deployments.”

Trends: Key Drivers of Technology Changes for Warehouses The Data Warehouse Institute - 2010

Informix Warehousing Moving Forward • Goal is to provide a comprehensive warehousing platform that is highly competitive in the marketplace • Incorporating the best features of XPS and Red Brick into Informix for OLTP/Warehousing and Mixed-Workload. • Using the latest Informix technology in: • Continuous Availability and Flexible Grid. • Data Warehouse Accelerator using latest industry technology. • Integration of IBM’s BI software stack.

Informix Warehouse Roadmap Warehouse Accelerator Star Join Optimization Multi-index Scan New Fragmentation Fragment Level Stats Storage Provisioning External Tables • Cognos integration • - Native Content Store on IDS • SQL Merge Informix Warehouse with Storage Optimization/Compression • Informix Warehouse Feature • SQW • Data Modeling • ELT/ETL

Light Scans Merge Hierarchical Queries Multi-Index Scan Skip Scan Bitmap Technology Star and Snowflake join optimization Implicit PDQ Access performance Deep Compression HPL DB utilities ON utilities DataStage External Tables Online attach/detach Interval and List Fragmentation Online attach/detach Fragment level stats Storage provisioning Table defragmenter Data & Storage Management Query Processing Data Loading 11.70 Warehousing – Engine Features Query Tools LOB apps BI Apps Databases BPS Apps Other transactional data sources Analytics Query processing I/O & data loading DBMS & Storage mgmt Source: Forrester

Now Introducing… Informix Ultimate Warehouse Edition What it is What it does • New Informix edition for warehouse • Packaged as: • Informix Ultimate Edition • Compression Feature (Informix) • Informix Warehouse Accelerator (IWA)* • IBM Smart Analytics Optimizer Studio • Compresses and processes data mart entirely in memory • Uses a deep columnar approach for optimizing perfornance • Runs as an add-on to Informix Benefits • Order of magnitude or more performance gains • Transparent to BI apps – queries continue to run against Informix • No maintenance • Manage warehouses on Informix *Informix Warehouse Accelerator requires a Linux Intel system as it is relies on optimizations in that environment

Informix Warehouse Accelerator3rd Generation Database Technology is Here How is it different? • Performance: Unprecedented response times to enable 'train of thought' analysis frequently blocked by poor query performance. • Integration: Connects to IDS through deep integration providing transparency to all applications. • Self-managed workloads: queries are executed in the most efficient way • Transparency: applications connected to IDS, are entirely unaware of IWA • Simplified administration: appliance-like hands-free operations, eliminating many database tuning tasks What is it? The Informix Warehouse Accelerator (IWA) is a workload optimized, appliance-like, add-on, that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries, with unprecedented response times. It is acquired through Informix Ultimate Warehouse Edition bundle Breakthrough Technology Enabling New Opportunities

7 7 1 1 6 6 2 2 5 5 3 3 4 4 Informix Warehouse Accelerator – Breakthrough Technology for OLAP Performance Extreme Compression 3 to 1 compression ratio Row & Columnar Database Row format within IDS for transactional workloads and columnar data access via accelerator for OLAP queries. Multi-core and Vector Optimized Algorithms Avoiding locking or synchronization In Memory Database 3rd generation database technology avoids I/O. Compression allows huge databases to be completely memory resident Predicate evaluation on compressed data Often scans w/o decompression during evaluation Frequency Partitioning Enabler for the effective parallel access of the compressed data for scanning. Horizontal and Vertical Partition Elimination. Massive Parallelism All cores are used for each query Comes with Smart Analytics Studio, a GUI tool, for configuring data mart and monitoring IWA

Characterized by: “Star” or “snowflake” schema: Region Brand City Store Product SALES Month Category Period Quarter Target Market: Business Intelligence (BI) Dimensions Fact Table Complex, ad hoc queries that typically • Look for trends, exceptions to make actionable business decisions • Touch large subset of the database (unlike OLTP) • Involve aggregation functions (e.g., COUNT, SUM, AVG,…) • The “Sweet Spot” for the IWA!

What IWA is Designed For • Selective, fast scans over large (fact) tables. • Joins with smaller Dimension tables. • OLAP-style queries over large fact tables in relational star schema with grouping and aggregations. SELECT PRODUCT_DEPARTMENT, REGION, SUM(REVENUE) FROM FACT_SALES F INNER JOIN DIM_PRODUCT P ON F.FKP = P.PK INNER JOIN DIM_REGION R ON F.FKR = R.PK LEFT OUTER JOIN DIM_TIME T ON F.FKT = T.PK WHERE T.YEAR = 2009 AND R.GEOID = 17 AND P.TYPEID = 3 GROUP BY PRODUCT_DEPARTMENT, REGION

New orders of performance, minimal admin/tuning Near zero administration!! Storage allocation/data partitioning No maintenance!! 10s to 100s of times faster Index maintenance Statistics maintenance NO Predictable response times Cube maintenance or summary tables Application changes

What our customers are saying…. "Before using Informix Warehouse Accelerator, complex inventory and sales analysis queries on the enterprise warehouse with more than a billion rows took anywhere from a few minutes to 45 minutes to run. When we ran those same queries using Informix Warehouse Accelerator, they finished in 2 to 4 seconds! That means they ran from 60 to 1400 times as quickly, with an average acceleration factor of more than 450 — all without any index or cube building, query tuning or application changes!” – Large Shoe Retailer "Informix Warehouse Accelerator is a game changer in the database market! It offers highly impressive performance with queries running 30 times faster than previously. The columnar technology saves a lot of processing time; it reduced our workload time from 9.5 hours to 15 minutes, all without any database tuning or need to manage the physical storage.“ – Lester Knutsen, President, Advanced DataTools Corporation

Informix Ultimate Warehouse – Performance, Simplicity, Transparency BI App Configure, offload data mart (Using ISAO) Bulk Loader HPUX-64, AIX-64, SOL-64, Linux-64 Linux-64, Intel Redirect queries Query Processor Compressed DB partition Informix env Informix Warehouse Accelerator Query Results Warehouse DataMart

IBM Smart Analytics Studio (ISAO) Easy to use graphical tool for data mart configuration

Defining, What Data to Accelerate • A MART is a logical collection of tables related to each other • All or some tables (or several columns of interest) of a single star schema would belong to the same MART. • The DBA uses ISAO to define the tables/columns in the MART. • Informix creates definitions for these MARTs in the own catalog • These MART definitions are known as Accelerated Query Tables (AQT). • The related data is read from the IDS tables and loaded into IWA. • IWA transforms the data into a highly compressed, scan optimized format which is kept locally (in memory) on the Accelerator. ISAO IDS + IWA Worker Processes Coordinator Process Define

The Sweet Spot Informix Ultimate Warehouse Informix, XPS, Red Brick < 5 TB data mart Mixed Workloads High Concurrency Star schema Mixed Workloads

Customer Warehouse Scenarios and Performance Results Federal Agency - Germany Federal Agency - USA • 537 Queries executed over a 30 GB data mart • 432 Queries accelerated via IWA • Performance with IDS+IWA was 90 times faster than XPS • Sample data set from 2 TB warehouse • 8 representative queries ran 127 times faster Large Shoe Retailer Large Global Retailer • BI Queries on 150 GB data set • Fact table – 1 billion rows • Queries against Informix – 22 minutes • Queries against Informix + IWA : 4 secs, 330 times faster • 10 GB data mart sample • Fact table – 25 million rows • Queries against Informix – over 20 mins • Queries against Informix + IWA: 4 secs, 300 times faster

Case Study #1: Major U.S. Shoe Retailer • Top 7 time-consuming queries in Retail BI and Warehouse: (Against 1 Billion rows Fact Table) Our Retail users will be really happy to see such a huge improvement in the queries processing timings. This IWA extension to IDS will really bring value to the Retail BI environment.

Case Study #2: U.S. Government Agency

IWA Referenced Hardware Configuration Options:

IWA Software Components • Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11). • Informix (IDS) 11.70 + IWA, including IDS Stored Procedures. • ISAO Studio Plug-in – GUI for Mart definition. • OnIWA – On Utilities for Monitoring IWA.

Summary: IWA Meeting the Customer Needs & Data Warehouse Trends Data Warehouse Appliances: • DW appliances are not a new concept. Most vendors have developed an appliance offering or promote certified configurations. Main reason for consideration is simplicity. The Resurgence of Data Marts: • Data marts can be used to optimize DW by offloading part of the workload, returning greater performance to the warehousing environment. Column-Store DBMSs • CIOs should be aware that their current DBMS vendor may offer a column-store solution. Don’t just buy a column-store-only DBMS because a column store was recommended by your team. In-Memory DBMSs • IMDBMS technology also introduces a higher probability that analytics and transactional systems can share the same database.

Case Insensitive Databases – Background (1) • Until now… A straightforward string-search query like this: select company, first_name, last_name, address from customer where address like ‘% 10th Ave%’ ; • Involving structured string data: • Column “address” of type…char, varchar, lvarchar, nchar, nvarchar • Which might be intended to be case-insensitive: • Example: Asking for customers located on the 10th Avenue. • Always produced case-sensitive results • Results will include: • 10th Ave, 10th Ave., 10th Avenue, 10th AveNUE, etc. • Results will NOT include: • 10TH AVE, 10TH AVENUE,10th Avenue, 10th ave, 10Th Ave., etc.

Case Insensitive Databases – Background (2) If we wanted a straightforward case-insensitive search… • Best alternatives were: • UPPER/LOWER function: • UPPER on keyword and character column + functional index on the column. • Basic Text Search (BTS) datablade/extension: • bts index on the character column + bts_contains function.

Case Insensitive Databases – Background (3) • Inconvenient with these approaches • Required application code changes: • Case-insensitive not transparent to developers and DBAs. • Problematic in database-independent applications or applications ported from other DBMS that have case-insensitive databases (like SQL Server). • Required additional index on the character column: • Functional index or BTS index. • Maintenance and performance overhead: • Especially if having multiple Informix databases or much application code doing case-insensitive search.

Case Insensitive Databases – What is new • Users can create databases with a Case Insensitive property. • Effect of this property: • Typical string searches and sorts onNCHAR/ NVARCHAR columns in that case-insensitive database, will produce case insensitive results. • CHAR/ VARCHAR/LVARCHAR in that same case-insensitive database, will continue to behave case sensitive. • How to determine whether a database is case-insensitive? • New is_case_insens field in sysmaster:sysdatabases catalog view: • 1 if the database is case-insensitive. • 0 if the database is case-sensitive (as traditionally, default property). • Example: select name, is_case_insens from sysmaster:sysdatabases; name is_case_insens csdb 0 <- case sensitive (default) cidb 1 <- case insensitive …

Case Insensitive Databases – Create/Import • Create a new case-insensitive database • Use NLSCASE INSENSITIVE at the end of create database command. Example (case-insensitive): create database cidb in mydbs1 with log nlscase insensitive; • If the NLSCASE option is omitted, the database is created as NLSCASE SENSITIVE (case-sensitive, as traditionally) by default: • Example (case-sensitive): create database csdb in dbs1 nlscase sensitive ; create database csdb in mydbspace with buffered log ; • Import a database as case-insensitive • Use –ci option of dbimport • If –ci is omitted, database is created as case-sensitive (default) • Example (case insensitive): dbimport cidb -d mydbspace –ci • Example (case sensitive): dbimport csdb -d mydbspace

Case Insensitive Databases – Behavior (1) • Case insensitive behavior applies only to all NCHAR and NVARCHAR data types in case insensitive databases: • All other string types continue to behave in a case-sensitive way. • Allows to implement case-insensitive in Informix with no performance or application code overhead. • All case versions of a string are treated as duplicate values: • Ex: IBM = ibm = Ibm = IBm = iBm = iBM = ibM = IbM • Case insensitivity affects: • Results of string comparison (e.g., =, in, like, matches) in query predicate filters and join evaluations. • Index creation, especially unique indexes, as now all case variants are considered the same value. • Aggregation, DISTINCT/UNIQUE, ORDER BY and GROUP BY clauses. • Fragment creation and elimination aspects in a partitioned table/index. • Column data distribution information collected by Update Statistics.

Case Insensitive Databases – Behavior (2) • All the constraints are enforced based on case insensitivity. • Data type casting of columns or literals affect query results: • A cast can force case-insensitive (::NCHAR/VARCHAR) or case-sensitive (::CHAR/VARCHAR/LVARCHAR) behavior. • Collation order and rules on the data set are applied on top of the result set from the case insensitive search. • Case insensitive database behaves the same at any Informix (MACH) cluster node.

Case Insensitive Databases – Behavior (3) • Distributed query must use DBs with same case-sensitivity. • Enterprise Replication and Flexible Grid allow to mix participant DBs that use different types of case- sensitivity: • DBA needs to care for consistent data dissemination/consolidation across the nodes of the grid/ER. • Case-insensitive property applies only to new databases: • dbexport/dbimport could be used to migrate existing case-sensitive databases into case-insensitive databases.

Case Insensitive Databases – Examples (1) • Creating a database as case-insensitive (Unicode example): create database caseins with log nlscase insensitive; • Creating a table mytab with two string columns: • cc: char (case sensitive) • ci: nchar (case-insensitive) create table mytab (cc char(5), ci nchar(5)); • Inserted and retrieved all data. Here are all values: select * from mytab; cc ci IBM IBM ibm ibm Ibm Ibm IBm IBm iBm iBm iBM iBM ibM ibM IbM IbM 8 row(s) retrieved.

Case Insensitive Databases – Examples (2) • Case sensitive searches using = on column cc: select * from mytab where cc = ‘ibm’; cc ci ibm ibm 1 row(s) retrieved. select * from mytab where cc = ‘IBM’; cc ci IBM IBM 1 row(s) retrieved. • Case insensitive searches using = on column ci: select * from mytab where ci = ‘ibm’; cc ci IBM IBM ibm ibm Ibm Ibm IBm IBm iBm iBm iBM iBM ibM ibM IbM IbM 8 row(s) retrieved.

Case Insensitive Databases – Examples (3) • Type casting can change the behavior of a search to make it behave as either case sensitive or case-insensitive. • Forcing a case-sensitive search on a case-insensitive column • Without the casting to char(5), results would be case-insensitive: select * from mytab where ci::char(5) = ‘IBM’; cc ci IBM IBM 1 row(s) retrieved.

Case Insensitive Databases – Examples (4) • Forcing a case-insensitive search on a case-sensitive column • Without the casting to nchar(5), results would be case-sensitive: select * from mytab where cc = ‘IBM’::nchar(5); cc ci IBM IBM ibm ibm Ibm Ibm IBm IBm iBm iBm iBM iBM ibM ibM IbM IbM 8 row(s) retrieved.

Case Insensitive Databases – Examples (5) • DISTINCT/UNIQUE behavior on case-sensitive: select distinct cc from mytab; cc IBM IBm … ibM ibm 8 row(s) retrieved. • DISTINCT/UNIQUE behavior on case-insensitive: select distinct ci from mytab; ci Ibm 1 row(s) retrieved.

Case Insensitive Databases – Examples (6) • Creating unique index on the case-sensitive column: create unique index unqidx_cc on mytab(cc); Index created. • Trying to create unique index on the case-insensitive column: create unique index unqidx_ci on mytab(ci); 371: Cannot create unique index on column with duplicate data. 100: ISAM error:duplicate value for a record with unique key. • Creating duplicate index on the case-insensitive column: create index dupidx_ci on mytab(ci); Index created.

Improved Basic Text Search (BTS) queries • Informix 11.70.xC2 addresses all these requests to expand BTS functionality: • BTS support for search of Chinese, Japanese, or Korean text. • Ability to configure BTS searches to find homonyms, synonyms, word stems, or other specific requirements. • Ability to customize BTS stopword lists per column/field. • Ability to query individual columns in a bts composite index. • Ability to increase the maximum number of query results. • Several of these enhancements come thanks to a new more open platform for BTS searches available in this latest release.

BTS Analyzer (1) • An analyzer is a text search implementation (a search engine) • determines how the text is indexed and searched. • Until now, Informix BTS supported only an implementation of the CLucene open source analyzer for indexing and searching: • This is known as the standard analyzer in BTS. • In 11.70.xC2, Informix changed BTS to make it more flexible • BTS now supports multiple analyzers other than the standard. • BTS now supports specifying an analyzer (search engine) per column. • This means: • BTS can meet users’ custom text search challenges/needs. • BTS can be used to implement different search engines per-column.

BTS Analyzer (2) • There are several analyzers available (standard + others) • Some are combination of more basic ones: • standard • cjk[.ws] • whitespace • simple • stopword • keyword • snowball[.lang] • soundex • esoundex • udr • You can also create your own analyzer.

BTS Analyzer (3) • You can indicate the analyzer of your preference to process the input text, at the column-level, when creating a bts index. • Use the analyzer parameter of the “USING bts” clause for this. • Syntax: • analyzer=an_analyzer analyzer=”(an_analyzer[,field1:an_ • analyzer1[,field2:an_analyzer2[,...]]])” • analyzer=”file:fn” • analyzer=”table:tab.col”

Chinese, Japanese and Korean (CJK support) • The CJK analyzer in BTS processes Chinese, Japanese, and Korean text, allowing users to index and search string data and documents in repositories using these languages. • In order to use the CJK analyzer in BTS: • specify analyzer=”cjk[.ws]” when creating the bts index on a column.

Support for custom homonyms and word stems • Homonyms: • The Soundex analyzer (analyzer=”soundex”) indexes words by sound. • Allows to index and search words based on a phonetic algorithm that finds similar words as pronounced in English. • Word stems: • Words are stemmed. • Example: fishing, fished, fish, fisher, all have the root word: “fish”. • Most full text users want the option of making the search engine automatically look for all stems of the root words. • The snowball analyzer (analyzer=”snowball[.language]”) indexes word stems. • The Snowball stemmer analyzer is language and codeset specific: • If a language is not specified in the analyzer parameter, Informix will use the default language given by DB_LOCALE: DB_LOCALE=language_territory.code_set@modifier

New Features - Informix 11.70.xC2 Technical Overview