1 / 33

Varsity SQL for the Busy DBA

Varsity SQL for the Busy DBA. Fred Sobotka FRS Consulting, Inc. fred@frsconsulting.com. Stick around for…. How to spot the SQL shortcut for a seemingly tedious task Common table expressions, the foundation for versatile SQL that is structured, testable, and easily debugged

mills
Télécharger la présentation

Varsity SQL for the Busy DBA

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Varsity SQL for the Busy DBA Fred Sobotka FRS Consulting, Inc. fred@frsconsulting.com

  2. Stick around for… • How to spot the SQL shortcut for a seemingly tedious task • Common table expressions, the foundation for versatile SQL that is structured, testable, and easily debugged • Replacing unreliable, hand-coded XML generators with ANSI-standard SQL/XML • Identifying the most useful parts of the system catalog • SQL functions and patterns that provide the most bang for the buck

  3. What does subsistence SQL smell like? • Application joins, high ratio of statements per transaction • Always with the temp tables (or perma-temp tables, even) • Hand-rolled programs that exist solely to generate XML • App anti-patterns that cleverly avoid SQL’s strength in sets • Poor filtering of rows, fetching a row only to discard it

  4. Which is worse - being asked these questions by developers: • How can I get this report to run faster? • What’s a more flexible alternative to EXPORT? • Can DB2 escape special characters for HTML & XML? • Is it possible to audit changes to a few specific tables? • Why aren’t I getting the same access plans in production? …or not being asked at all?

  5. Why might you as the DBA care about scruffy SQL? • Are long-running, inefficiently written report jobs crowding your batch window and cutting into your maint. time? • Have cumbersome ETL tools become a crutch that programmers rely on instead of coding smarter SQL? • Do developers swear up and down that they need you to build a replicated ODS just to handle some basic reports? • Do the problematic programs use cursors to iterate through data that is changing underneath it? • Is your manager hearing complaints that database performance is “slow” or “unreliable” when you know the primary cause is poor application design?

  6. One Possible Solution: Write Better SQL Queries • Strive to minimize the number of SQL roundtrips from app • Make user-defined temp tables the exception, not the rule • Keep joins inside the database, where they belong • Exploit DB2’s mature, built-in XML generator • Use EXPLAIN to identify which parts drive up the cost

  7. Common Table Expressions:Not just for recursive SQL you’ll rarely use • At the very least, CTEs can add meaning and structure • Often easier to read and follow than a single monster join WITH mdctables AS ( SELECT DISTINCT tabschema, tabname FROM syscat.indexesWHERE indextype = 'DIM' )SELECT m.tabschema, m.tabname, t.npages, t.fpagesFROM mdctables m INNER JOIN syscat.tables t ON t.tabschema = m.tabschema AND t.tabname = m.tabname ORDER BY t.fpages DESC; CTE Final SELECT

  8. More about CTEs • Introduced in SQL:1999 (a.k.a. SQL3) • Multiple CTEs can be declared & used in a single stmt • CTEs can reference each other, or themselves (recursion) • Some CTE support offered in Oracle, MS SQL, & all DB2 • CTEs don’t necessarily result in a worse access plan (EXPLAIN reveals that the statement is heavily re-written)

  9. Why not just create views instead? • If it’s a vendor app, you may not be allowed to add them • Views create hard dependencies that could complicate schema evolution or even block a DB2 upgrade • The query may be a one-off or an infrequent report/extract • Will the benefits be worth the team code review that new database objects may (should) require?

  10. Ever make stir-fry? • The real work is in the prep, not the little bit of stirring • Do you ever sample ingredients before throwing them in? • Each CTE is a like a prep bowl with an ingredient of data • Sample (SELECT), sniff (EXPLAIN) until you’re confident • Then uncomment the final SELECT at the bottom & cook

  11. Reasons to add more CTEs to your SQL diet • Testing/debugging a big stmt without disruptive changes • Grouping/aggregation can be combined w/ detailed rows • Targeted, precise row numbering & ranking capabilities • Separate the business logic and filtering from final output • Allows you to query the logical data model you really want, not the physical model you’re stuck with

  12. Test any CTE at just about any time Are we grabbing the orders we really want? There’s no need to wait until the query is completely coded to find out. WITH filteredorders AS (SELECT orderID, custID, saleDT, itemID, salesregion FROM sales.ordersWHERE referralsource IN ('WWW','EMAIL','FACEBOOK')) … --Orig. final SELECT is temporarily commented out SELECT COUNT(*), MIN(fo.saleDT), MAX(fo.saleDT), COUNT(DISTINCT fo.itemID), COUNT(DISTINCT fo.salesregion) FROM filteredorders fo;

  13. Easy switching between literals and parameters WITH parms( custID, saleDT ) AS ( SELECT-- While testing interactively, comment-- out parameter markers-- CAST( ? AS INTEGER ), CAST( ? AS DATE ) 999999, '2010-05-01' FROM sysibm.sysdummy1, The rest of your query still references parms.custID and parms.saleDT, and is not impacted when you switch back and forth between hardcoded literals and parameter markers.

  14. Make ROW_NUMBER() & RANK() work your way WITH ordertimeline AS ( SELECT custID, saleDT, ROWNUMBER() OVER ( PARTITION BT custID ORDER BY saleDT ASC ) orderseq FROM sales.ordersWHERE saleDT >= '2010-01-01 00:00:00' ) SELECT o1.custID, DATE( o1.saleDT ) saleDT1, DATE( o3.saleDT ) saleDT3, TIMESTAMPDIFF( 16, CHAR( o3.saleDT – o1.saleDT )) elapsedDaysFROM ordertimeline o1 INNER JOIN ordertimeline o3ON o3.custID = o1.custIDWHERE o1.orderseq = 1 AND o3.orderseq = 3ORDER BY elapsedDays ASC; Which customers have already racked up three or more orders this year? How many days elapsed between each customer’s 1st and 3rd order?

  15. Generating one-liners from SQL • Use CONCAT or || to intersperse text literals in output • Numerics will need to be cast as CHAR() to mix with text • Always RTRIM() trailing spaces off of schema names • Escape any apostrophe (') with a double apostrophe ('') • That’s two apostrophe characters, not a quote SELECT 'GRANT SELECT ON '||RTRIM( tabschema )||'.'||tabname||' TO ROLE READONLY;'FROM syscat.tables WHERE create_time > CURRENT TIMESTAMP – 2 DAYS; and another: SELECT 'ALTER TABLESPACE '||tbspace||' REBALANCE;' FROM syscat.tablespaces;

  16. Selecting and sorting are two different things • Your ORDER BY doesn’t need to reference SELECT cols • ORDER BY can be very sophisticated if necessary • “Re-order the alphabet” with a CASE expression in a CTE • When something absolutely, positively needs to follow an exact ordering sequence, DB2 SQL has got you covered

  17. Generating more complex code from SQL data • Build a query or CTE for each distinct section of the code • Initial preamble/heading, then repeating list of items, then a closer • Include a numeric literal in each result set for easy sorting • Define a second sorting column just for items in a list • Usually produced by a ROWNUMBER() expression in that query • Typical column layout for each query/CTE is: • Key/name, code fragment, major sort seq, minor sort seq • Glue them all together via UNION ALL & order the result by the PK/name, major sort seq, and the minor sort seq • But select just the column that contains the code fragments

  18. How can SQL generate sophisticated, valid code? obj_name code_fragment outer_sort inner_sort prod.sales CREATE VIEW… 0 0 UNION ALL prod.sales some_colname 1 0 prod.sales , other_colname 1 1 prod.sales , yet_another_col 1 2 UNION ALL prod.sales FROM x.tbl….. ; 2 0 Just be sure to ORDER BY obj_name, outer_sort ASC, inner_sort ASCbut don’t include them in your SELECT list!

  19. Generating complete CREATE VIEW statements WITH codelines AS (SELECT tabschema, tabname, 'CREATE VIEW NEWSCHEMA.' ||tabname||' AS SELECT ' AS code, 0 AS outersort, 0 AS innersort FROM syscat.tables -- THE HEADER UNION ALL SELECT tabschema, tabname, colname AS code, 1 AS outersort, ROWNUMBER() OVER (PARTITION BY tabschema, tabname ORDER BY colno) AS innersortFROM syscat.columns WHERE remarks <> ' ' --THE COLUMNS UNION ALL SELECT tabschema, tabname, 'FROM '||RTRIM(tabschema)||'.'||tabname||'; ' AS code, 2 AS outersort, 0 AS innersort FROM syscat.tables -- THE FOOTER) -- NOW FOR THE FINAL SELECT SELECT CASE WHEN ( innersort > 1 ) THEN ', ' || codeELSE code END AS code FROM codelinesWHERE tabschema in (…) AND tabname IN (…)ORDER BY tabschema, tabname, outersort, innersort;

  20. How to handle commas when generating lists • You have 2 choices: comma at end of line or at beginning • It’s a lot easier to find the first item than the last item • Don’t trust sequence numbers you didn’t generate • Their sequence number 1 could be duplicated, or missing entirely • So, generate your own sequence number for your list, and make sure you put a comma before items #2 and up … SELECT CASE WHEN ( innersort > 1 ) THEN ', ' || codeELSE code END AS code FROM …

  21. So what kind of code should we generate? • RUNSTATS/REORGCHK/REORG/REBIND • Recreate inoperative views • Grants and Revokes • Export & Import/Load scripts with business-specific twists • Extract stored procedures to separate, named files • DDL for change capture tables • Mass ATTACH/DETACH scripts for partitioned tables • One-at-a-time program calls for each primary key of user data in need of some sort of ETL or other processing

  22. Picking the least worst data for a report WITH allchoices ( itemid, dataitem, weightscore )AS ( SELECT itemkey, goodattr, 1 FROM best.optionsUNION ALLSELECT itemid, altval, 2 FROM notasgood.infoUNION ALLSELECT item_id, someguess, 3 FROM lastresort.hunch)SELECT a.itemid,b,c,( SELECT x.dataitem FROM allchoices x WHERE x.itemid = a.itemid ORDER BY weightscoreFETCH FIRST ROW ONLY ) AS itemstatus,… Think of how often you’ve seen people write entire programs full of subsistence SQL just to handle this.

  23. SQL/XML publishing functions • XML may not be in your database, but lots of people probably want XML out of your database • XML/SQL is a mature solution, introduced in SQL:2003 • Creates sophisticated, well-formed XML, including namespaces, from your tabular data, entirely from SQL • Does not require pureXML structures, XPath, or XSL • DB2 optimizer continues to improve at predicate pushdown for SQL/XML, yielding better access plans

  24. XMLSERIALIZE and XMLELEMENT • XMLSERIALIZE specifies the output data type returned • Next is typically the outermost (top-level) XMLELEMENT • Everything else is embedded within that element SELECT XMLSERIALIZE( XMLELEMENT(NAME "table", XMLELEMENT(NAME "schemaname", tabschema ), XMLELEMENT(NAME "tablename", tabname ) )AS VARCHAR( 255 ) )FROM syscat.tables; This returns the same number of rows as a traditional SELECT that doesn’t use SQL/XML.

  25. Suppressing empty XML tags with XMLFOREST • XMLELEMENT will always generate an XML tag, even when the payload is NULL <excuses></excuses> • If that isn’t what you want, use XMLFOREST, which omits the tag whenever the payload happens to be NULL • XMLFOREST is also handy for generating a series of sibling elements for a list of values SELECT XMLSERIALIZE( XMLELEMENT( NAME "table", XMLFOREST( tabschema AS "schemaname", tabname AS “tablename”, remarks AS "comment" ) ) AS VARCHAR( 400 ) )FROM syscat.tables;

  26. Collapsing repeating items into a child element • XMLAGG and XMLGROUP are used to squash potentially multiple rows into sibling elements • XMLGROUP is the more sophisticated of the two • Offers more features with less typing • You can do one XMLAGG in an expression, but not more • Cartesian products will occur if you don’t separate them • One XMLAGG containing five rows and another XMLAGG for three rows in the same SELECT will produce 15 elements inside each XMLAGG, which is probably not what you want.

  27. Example of unwanted Cartesian product WITH purchases( p ) AS ( VALUES 0 ), coupons( p, c ) AS ( VALUES( 0, 1 ), ( 0, 2 ),( 0, 3 ) ), purchaseditems( p, s ) AS ( VALUES( 0, 'A' ), ( 0, 'B' ), ( 0,'C' ), ( 0,'D' ), ( 0,'E' ) ) SELECT XMLSERIALIZE( XMLELEMENT( NAME "Purchase", XMLAGG( XMLELEMENT( NAME "CouponUsed", c ) ) , XMLAGG( XMLELEMENT( NAME "ItemBought", s ) ) ) AS VARCHAR( 2048 ) ) FROM purchases p LEFT OUTER JOIN coupons c ON p.p = c.p LEFT OUTER JOIN purchaseditems s ON p.p = s.p GROUP BY p.p; Although the GROUP BY collapses the final output to one row, the number of items produced by each XMLAGG is incorrect.

  28. Correct results when grouping is pushed to CTE WITH purchases( p ) AS ( VALUES 0 ), coupons( p, c ) AS ( VALUES( 0, 1 ), ( 0, 2 ),( 0, 3 ) ), purchaseditems( p, i ) AS ( VALUES( 0, 'A' ), ( 0, 'B' ), ( 0,'C' ), ( 0,'D'), ( 0,'E' ) ), cx( p, cxml ) AS (SELECT p, XMLAGG( XMLELEMENT( NAME "CouponUsed", c ) ) FROM coupons GROUP BY p ), ix(p, ixml ) AS (SELECT p, XMLAGG( XMLELEMENT( NAME "ItemBought", i ) ) FROM purchaseditems GROUP BY p ) SELECT XMLSERIALIZE( XMLELEMENT( NAME "Purchase", cxml , ixml ) AS VARCHAR( 2048 ) ) FROM purchases p LEFT OUTER JOIN cx ON cx.p = p.p LEFT OUTER JOIN ix ON ix.p = p.p ; The CTEs will return either one row or nothing, which eliminates the risk of a cross-product.

  29. Building the Perfect (XML) Beast • Build and test CTEs for major components/elements • The CTEs should handle any filtering and formatting • A second layer of CTEs collapses repeating children • It’s easier if the CTEs referenced by the final SELECT all return at most one row each • The final SELECT lays out the XML tags for all of the CTEs, which are connected via LEFT JOIN to the main table

  30. Varsity SQL isn’t necessarily complex SQL • Describe the ideal collection of datasets you’d like to see • Make the query do as much heavy lifting as possible • Assume the client app can’t or won’t refine the results • Favor a single moderately expensive query over hundreds or thousands of primitive SQL statements sent by an app • Think in stages: finding, filtering, grouping, and presenting

  31. Noteworthy SQL resources • DB2 SQL Cookbook by Graeme Birchall (free e-book) • Anything written by Joe Celko • DB2 XML Cookbook by Matthias Nicola and Pav Kumar-Chatterjee • Questions and answers posted to StackOverflow.com • DB2-L and IDUG Solutions Journal

  32. Just a few presenters to read and watch • DB2 z/OS optimizer: Terry Purcell • DB2 z/OS SQL: Sheryl Larsen • DB2 LUW optimizer: Calisto Zuzarte • DB2 LUW SQL: Serge Rielau

  33. Fred SobotkaIBM Gold Consultant – IBM Data ChampionFRS Consulting, Inc.fred@frsconsulting.com503-484-5032Skype/Twitter/AIM/Yahoo!: db2fred

More Related