Relational Efficiencies: Part I

Relational Efficiencies: Part I Renee Teatro Information Builders

Relational EfficienciesAgenda • Optimization Overview • JOINs • Sorting • Aggregation • Expressions • Direct SQL Passthru

Relational EfficienciesLayers of Processing User executes a FOCUS request... • FOCUS calls the appropriate module (TABLE, GRAPH) • Reads and parses the MFD • Parses the request • Calls the Interface with information onthe MFD and the request available inwork areas • Reads and parses the AFD • Checks the module specifics for optimization • Analyzes and optimizes the request • Generates SELECT statement(s) • Passes SELECT statementsto the appropriate ‘physical’ module • Prepares, allocates, and opens cursor(s) • Calls the RDBMS to FETCH rows of data • Analyzes and optimizes the SQL translation of the FOCUS application request • Chooses the appropriate access pathand retrieval method • Retrieves data • Creates the answer set FOCUS / WebFOCUS Report is displayed • FOCUS reads a row from the answer set and processes the remaining actions on that row (IF/WHERE, DEFINEs…) • Puts the valid row into the Internal Matrix • Reads the next row and repeats the process until the end of the answer set • Processes the Internal Matrix and displays the report • The Interface receives a row and/or an SQL status code • Converts non-standard data into FOCUS format, making it available to FOCUS • Asks RDBMS for the next row (FETCH), until the end of the answer set is reached (SQL status code +100) • Sends back a row and/or an SQL status code to the Interface TABLE GRAPH TABLEF FOCUS FOCUS Reporting Modules DEFINE MATCH FILE FRL JOIN ANALYSE Data Adapter SQL Generation and Execution (‘logical layer’ – GNTINT ) Data Adapter SQL modules (‘physical layer’) Teradata DBTFOC Oracle ORAFOC DB2 SQL/DS RRSET Oracle Teradata DB2 SQL/DS DATA

Relational EfficienciesRDBMS Optimization Interface optimization is the degree to which a TABLE request is translated to SQL In other words, the process in which the interface translates projection, selection, JOIN, sort, and aggregation operations of a report request into its SQL equivalent and passes it to the RDBMS for processing • TABLE, MODIFY, MAINTAIN  SQL • Direct SQL Passthru • Passing JOINs

Relational EfficienciesThe Optimization Command * DB2/SQLDSSQLDBCSQLORA SQLMSS ONOFFSQLFOCUS TSOCMS SQL OPTIMIZATIONSQLJOIN SET OUTER Components Description Target Database Engine DB2 / SQLDS for DB2 SQLDBC for Teradata SQLORA for Oracle SQLMSS for SQL Server

Relational EfficienciesOptimization Settings In the SQL engine SET OPTIMIZATION ON is the default, so the interface attempts to fully optimize all requests • RDBMS is preferred engine for processing • More RDBMS processing is usually beneficial Types of optimization • ON – default • OFF – Lets FOCUS handle all JOINs, sorts, and aggregations • SQL – pass SQL even if multiplicative effect found • FOCUS – pass SQL only if results are identical to FOCUS processing

Relational EfficienciesData Adapter TRACE Facility • SET TRACEON=component//destination • Component • SQLDI - FSTRACE - All Interface-RDBMS activity • SQLAGGR - FSTRACE3 - Optimization messages • STMTRACE - FSTRACE4 - SQL only • SQLCALL - commands and data exchange between the physical and the logical layers of the data adapter • Destination • FSTRACE - allocation for the ddname of FSTRACE • CLIENT - displays client session to the screen

Relational EfficienciesXRETRIEVAL Option SET XRETRIEVAL=[ON | OFF] • ON – the interface sends the request to the RDBMS and it processes the request • OFF – the interface attempts to optimize the request, but no RDBMS processing is done

Relational EfficienciesFully Optimized Query SET TRACEOFF=ALL SET TRACEUSER=CLIENT SET TRACEON=STMTRACE//FSTRACE TABLE FILE EMPLOYEE COUNT EMP_ID BY DEPARTMENT WHERE CURR_SAL GT 10000 END STMTRACE: SELECT DEPARTMENT, COUNT(*) FROM EMPLOYEE WHERE (CURR_SAL > 10000) GROUP BY DEPARTMENT ORDER BY DEPARTMENT;

Relational Efficiencies Non-Optimized Query DEFINE FILE EMPDB2 CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH'; END TABLE FILE EMPDB2 SUM CSAL CATEGORY BY EID END SQLAGGR: (FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON: (FOC2597) USE OF DEFINE FIELD THAT CANNOT BE AGGREGATED: CATEGORY STMTRACE: SELECT T1.EID,T1.CSAL FROM "PMSSAE"."EMPINFO" T1 ORDER BY T1.EID FOR FETCH ONLY;

Relational EfficienciesNow Optimized Query SET TRACEOFF=ALL SET TRACEUSER=CLIENT SET TRACEON=SQLAGGR//CLIENT SET TRACEON=STMTRACE//CLIENT TABLE FILE EMPDB2 SUM CSAL COMPUTE CATEGORY/A4=IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH'; BY EID END SQLAGGR: AGGREGATION DONE ... STMTRACE: SELECT T1.EID, SUM(T1.CSAL) FROM "PMSSAE"."EMPINFO" T1 GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;

Projection and Selection

Relational EfficienciesProjection Projection is the retrieval of specific columns only • Projection is always optimized • Interface retrieves columns referenced in • Print/sum/count commands • Objects of JOINs and DEFINE statements • PRINT * and SEG.fieldname • Will return all columns in master file only • SELECT * never produced TABLE FILE EMPLOYEE PRINT * END SELECT EID, LN, FN FROM EMPLOYEE;

Relational EfficienciesProjection A master file can be considered a dynamic RDBMS view. A master can contain: • One or more columns of a relational table • Multiple relational tables – called an embedded MFD • Real relational views Main advantages of a master file • SQL JOIN syntax hidden from user • View not stored in RDBMS catalog • Activation of only necessary tables (segments) • Note: Not the case with dynamic JOIN

Relational EfficienciesSelection: Translatable Screening Type of Expressions Expression Components Examples Arithmetic-Valued Expressions(Expressions that return a single number) • Real fields of datatype I, P, D, or F • Numeric constants • Arithmetic operators (+, -, *, / ) • Aggregation operators (SUM., CNT., AVE., MIN., MAX.) WHERE TOTAL (AVE.CSAL * 0.10) +AVE.CSAL GT 55000; Character String-Valued Expressions(Expressions that return a character string) • Real fields of datatype A • String constants • Concatenation operator (I) • EDIT of alphanumeric fields WHERE EDIT(FN, ‘9.$’) |LN EQ ‘J.WANG’ ; • Real fields with any FOCUSdatatype • Constants of consistent datatype • Relational operator(EQ, NE, GT, LE ..) • Logical operators (AND, OR, NOT) • Valued expression operands Logical Expressions(Expressions that return a single value, True or False) WHERE (CDIV EQ ‘CORP’ OR ‘NE’) AND ((CSAL* 0.10) +CSAL GT 55000); Screening conditions on DEFINEd fields, which calculate the above type of expressions, arepassed to the RDBMS. NOTE

Relational Efficiencies Non-Translatable Screening Conditions Example Expressions using DEFINE... FNL/I3 = ARGLEN(15,LN,FNL);TABLE... IF FNL LE 6 User-written subroutines DEFINE... DIVISION/A11=IF CDIV EQ ‘CORP’ THEN ‘CORPORATE’ ELSE IF CDIV EQ ‘NE’ THEN ‘NORTH-EAST’ ELSE ‘NA’;TABLE... IF DIVISION EQ ‘CORPORATE’ OR ‘NORTH-EAST” IF–THEN–ELSE expressions*** optimized Self-referential expressions DEFINE... CPT/I2=CPT+1;TABLE... IF CPT NE 0 WHERE EDIT(ID) GT 20 EDIT for field format conversions DEFINE... NAME/A27=FN||(‘ ‘ | LN);TABLE... IF NAME EQ ‘DANIEL VALINO’ Strong concatenation (II) DEFINE... DEVISION/A11=DECODE CDIV (‘CORP’ ‘CORPORATE’ ‘NE’ ‘NORTH-EAST’ ELSE ‘NA’);TABLE... IF DIVISION EQ ‘CORPORATE’ OR ‘NORTH-EAST’ DECODE function Non-SQL relational operators(INCLUDES, EXCLUDES) IF LN INCLUDES ‘VALINO’ FOCUS subroutines(ABS, INT, MAX, MIN, LOG, SQRT) WHERE SQRT(CSAL) GT 260 Expressions using fields withACTUAL=DATE DEFINE... HDAT2/YYMD=HDAT+365;TABLE... IF HDAT2 GT ‘1990/03/01’

JOIN Processing

Relational EfficienciesJOIN Optimization • Interface attempts to generate ONE SELECT statement to JOIN all tables • Applies to dynamic or embedded JOINs • One OPEN cursor operation • JOIN optimized more readily by RDBMS • An optimized JOIN enables sorts and aggregations to be passed • Limits interface <==> RDBMS communications

Relational EfficienciesJOIN Optimization • If JOIN is not passed to RDBMS • Termed: FOCUS-managed JOIN • One SELECT statement for EACH table • FOCUS executes a nested loop JOIN • Parent table (HOST) is the outer table • One inner table OPEN for each row returned from the outer table • Outer (host) table – choose the one with fewest rows returned • Sorts and aggregations are not passed • SQLAGGR/FSTRACE3 displays reason

Relational EfficienciesOptimized JOIN vs. Non-Optimized JOIN JOIN F1SSN IN TABLE1 TO F2SSN IN TABLE2 TABLE FILE TABLE1 PRINT F2SSN END Optimized JOIN: SELECT T1.F1SSN,T2.F2SSN FROM "PMSNJC".TABLE1 T1, PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = T1.F1SSN) FOR FETCH ONLY; Non-Optimized JOIN: (FOC2510) FOCUS-MANAGED JOIN SELECTED FOR FOLLOWING REASON(S): SELECT T1.F1SSN FROM "PMSNJC".TABLE1 T1 FOR FETCH ONLY; SELECT T2.F2SSN FROM PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = ?) FOR FETCH ONLY;

Relational EfficienciesSpecial JOINs In earlier releases, these types of JOINs disabled optimization: • Multiplicative effect encountered for aggregated requests • Termed: Interface-managed native JOIN • Check results, FOCUS managed may be more efficient (SET OPTIMIZATION=OFF) • Outer JOIN (SET ALL=ON) • Missing cross-referenced rows are processed • RDBMS specific syntax in SQL SELECT statement • SQL sqlengine SET SQLJOIN OUTER OFF|ON • SET ALL=PASS not supported • WHERE field EQ ‘$*’ OR field IS-MISSING • Create HOLD files/JOIN/SET ALL=PASS • Heterogeneous JOIN • Differing file types (e.g., flat file, IMS, etc.)

Relational EfficienciesJOIN Considerations When a JOIN is not passed to RDBMS, make sure: • The KEYS= parameter is defined correctly • The JOIN command (unique or non-unique) corresponds to the AFD KEYS= parameter Some other considerations: • (Over) normalized vs. non-normalized data • Ensure referenced tables on same retrieval path • Consider use of indices • If Interface optimization is disabled, consider choice of parent table, use of HOLD files • JOIN on same data type and length

Sort Processing

Relational EfficienciesSort Optimization FOCUS BY/ACROSS translated to SQL ORDER BY Translating sort phrases (BY/ACROSS) to SQL is important • Relational sort is usually more efficient • RDBMS uses indices • Sort enables RDBMS to perform aggregation • FOCUS retrieves the answer set in sorted order • Reduced I/O since answer set is aggregated

Relational Efficiencies Sort Optimization FOCUS sort phrases are NOT translated to SQL and Optimization is disabled when: • Optimization was set OFF by user • JOINs were not passed to RDBMS (and consequently Optimization was disabled by the interface) • A FOCUS sort phrase uses an FRL command: BY field ROWS value1 OVER value2... FOR field ROWS value1 OVER value2… FOCUS sort phrases are not fully translated to SQL and aggregation and optimization is automatically disabled when: • FOCUS BY/ACROSS…IN-GROUPS-OF is requested

Relational EfficienciesSort Optimization To get FOCUS Sort phrases translated to SQL • Sort on real fields & use COMPUTEs instead of DEFINEs • Sorts on most DEFINEd fields are now optimized • Use SQLAGGR/STMTRACE to evaluate if DEFINE fields are being translated. If not, reformulate if possible • With FST. and LST. ensure access file KEYS and KEYORDER parameters are correct Considerations • Consider indexes on sort objects • SET OPTIMIZATION OFF/TABLEF/External Sort • Consider using TABLEF if sort is passed

Relational EfficienciesUsing TABLEF Use TABLEF when all FOCUS sort phrases are translated to SQL • Faster than TABLE • Does not generate an internal matrix (FOCSORT) • Eliminates FOCUS sorting You cannot use TABLEF when FOCUS has to process some of the sorting with • ACROSS • Direct operators requiring the FOCUS internal matrix (TOT., PCT., or RPCT.) • COMPUTE expressions using direct operators • Multi-verb requests • RETYPE Note: Locks are held with TABLEF until report is complete (commit issued)

Aggregation Processing

Relational EfficienciesEfficient Aggregation FOCUS SQL SELECT SUM(...) GROUP BY column ORDER BY column SUM ..., WRITE ... BY field SUM., CNT., MIN., MAX., AVE. SUM(...), COUNT(*), MIN(...), MAX(...), AVG(...) Aggregation translation is important • RDBMS aggregation is more efficient: indices • An aggregated answer set reduces FOCUS-to-RDBMS communication • A smaller answer set reduces FOCUS local processing

Relational EfficienciesTranslatable Aggregation Type of expressions Expression components Examples Arithmetic Valued(Expressions that return a single number) DEFINE FILE ORAEMP NEW_SAL/D12.2=(CSAL * 0.10) + CSAL ; END • Real fields of datatype I, P, D, or F • Numeric constants • Arithmetic operators (+, -, *, / ) • Real fields of datatype A • String constants • Concatenation operator (I) • EDIT of alphanumeric fields Character String Valued(Expressions that return a character string) DEFINE FILE ORAEMP NAME/A18=EDIT(FN,‘9.$’)|LN; END • Verbs: SUM, COUNT, WRITE • Direct operators: MIN., MAX., AVE. • Aggregating DEFINEd fields: • Constant DEFINEd fields translated with CNT. The following defined expressions can be translated

Relational EfficienciesNon-Translatable Aggregation If the verbs PRINT or LIST are used, no aggregation is requested and FSTRACE3 returns the following message: (FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON: (FOC2594) AGGREGATION IS NOT APPLICABLE TO THE VERB USED Aggregation is not translated to SQL and optimization is automatically disabled when: • Optimization was set off by user • JOINs were not passed to RDBMS (and consequently optimization was disabled by the interface) • FOCUS sort phrase is not translated • Some screening conditions not passed to RDBMS • Some non-SQL operators are used • Multi-verb requests • COUNT with MISSING=ON NOTE

Relational EfficienciesAggregation Considerations • Possible index-only processing • Possibly aggregate in RDBMS index • Explicit or implicit (e.g., in heading/footing) FST. and LST. can be optimized using MIN and MAX • Aggregate on real fields • Use COMPUTE in place of DEFINE • Create aggregated extract files (HOLD files) in cases where aggregation is not optimized

Virtual Field Processing

Relational EfficienciesVirtual Field (DEFINE) Optimization • DEFINE fields can be optimized as part of aggregation or record selection • Aggregation or record selection can optimize: • Arithmetic-valued expressions • Character string-valued expressions • Logical expressions (selection only) • Aggregation cannot be optimized for logical expressions • Single segment DEFINEs passed when JOIN is not • IF-THEN-ELSE DEFINEs capable of being passed

Relational EfficienciesIF-THEN-ELSE DEFINE Example – Optimized DEFINE FILE EMPINFO SAL_FLAG = IF (CURRENT_SALARY LT 10000) AND (DEPARTMENT_CD EQ 'MIS') THEN 1 ELSE 0; END TABLE FILE EMPINFO PRINT EMP_ID LAST_NAME FIRST_NAME IF SAL_FLAG EQ 1 END STMTRACE: SELECT T1.EID,T1.LN,T1.FN, T1.DEPARTMENT_CD,T1.CURRENT_SALARY FROM "USER1"."EMPINFO" T1 WHERE ((((T1.CURRENT_SALARY < 10000) AND (T1.DEPARTMENT_CD = 'MIS')))) FOR FETCH ONLY;

Relational EfficienciesAggregation DEFINE Example – Optimized DEFINE FILE EMPDB2 CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH'; CATEGORY1/I4 = IF CSAL LT 10000 THEN 0 ELSE 1 ; CATEGORY2/D10 = CSAL * 1.3; END TABLE FILE EMPDB2 SUM CSAL CATEGORY2 BY EID END STMTRACE: SELECT T1.EID, SUM(T1.CSAL), SUM((T1.CSAL * 1.3)) FROM "PMSSAE"."EMPINFO" T1 GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;

Relational EfficienciesSort Expression Example – Optimized • Aggregation by Expression • Allows named expression to be used in ORDER BY clause • DEFINE FILE DB2FILE TABLE FILE DB2FILE • TAX = 0.08 * PRICE SUM PRICE TAX • END BY TAXNOPRINT AGGREGATION DONE ... SELECT SK001, SUM(VB001), SUM(VB002) FROM (SELECT (.08 * T1.PRICE) AS SK001,T1.PRICE AS VB001,(.08 * T1.PRICE) AS VB002 FROM USER.DB2FILE T1 ) X GROUP BY SK001 ORDER BY SK001 FOR FETCH ONLY; In the past: (FOC2597) USE OF DEFINED FIELD THAT CANNOT BE AGGREGATED :TAX

Relational EfficienciesDirect SQL Passthru Best of both worlds • If the most efficient SQL is not generated or • Optimized SQL code already exists • SQL sqlengine {any valid SQL statement} END SQL DB2 SELECT C.CLIENT_ID,J.CLIENT_ID, C.CASE_NO,J.REST FROM CLIENT C, CLIENTJ J WHERE C.CLIENT_ID=J.CLIENT_ID; TABLE FILE SQLOUT PRINT * ON TABLE HOLD END SET SQLENGINE=SQLORA SQL PREPARE result FOR SELECT * FROM DQAORA01; TABLE FILE result PRINT F1SSN ON TABLE HOLD AS HOLD1 END TABLE FILE HOLD1 PRINT F1SSN END

Thank You !

Relational Efficiencies: Part I

Relational Efficiencies: Part I

Presentation Transcript

I. relational theories (Platonic theories)

Relational Algebra : #I

Relational Efficiencies

Academic Efficiencies:

Efficiencies Report

Non-Relational Efficiencies

Chapter 4: Logical Database Design and the Relational Model ( Part I )

PRODUCTION EFFICIENCIES

Relational Algebra, Principles and Part I

Part I I I

Part IV: Object-Relational Databases

Part I

Relational Algebra – Part 2

ENERGY EFFICIENCIES

Relational Database Design I

Part I

PART I - I

Operational Efficiencies

Efficiencies

Implementation of Relational Operations (Part 2)

Relational Algebra, Principles and Part I