1 / 60

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux. Thomas Niewel Oracle Deutschland GmbH Thomas.Niewel@Oracle.com. Agenda. Tuning Why ? Reasons for bad Response Time Statspack Diagnosing reasons for bad response Times SQL Tuning TKPROF Explain Plan WLM.

rodd
Télécharger la présentation

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Monitoring and Tuning Oracle for z/OS andOracle for z/Linux

  2. Thomas NiewelOracle Deutschland GmbHThomas.Niewel@Oracle.com

  3. Agenda • Tuning Why ? • Reasons for bad Response Time • Statspack • Diagnosing reasons for bad response Times • SQL Tuning • TKPROF • Explain Plan • WLM

  4. Why do we need to tune ? • Users report „bad“ response times because of • CPU Time + Wait Time • Poor performing queries • SQL-Tuning • „bad“ Database parameters • Bottlenecks in „System“(Operating System, WLM, IO/Subsystem etc.)

  5. What can be the reasons for “bad” Response Time • High CPU Usage • High I/O Usage • Memory Usage • Network problems • „idle“ System • Operating System (WLM, VM)

  6. Diagnose from the Oracle point of view StatspackA short overview

  7. Statspack – a short overview • spcreate.sql - installs Statspack (run only once) • statspack.snap - data capture (procedure) • spreport.sql - reporting • spdoc.txt - user documentation • sppurge.sql - delete Statspack data • spdrop.sql - drop Statspack

  8. Capturing data • Prerequisite: timed_statistics=true • Use stored procedure statspack.snap SQL> execute statspack.snap;

  9. Capturing data • Get a baseline for future comparisons • Capture snapshots • across peak load • across batch window • The time between snapshots should be <= 30 minutes • Capture can be automated • Use OS utility e.g. cron • Use dbms_job • spauto.sql shipped as example

  10. Reporting with Statspack • All data is held in an Oracle database • Report between two or more snapshots • cannot report across instance startup • Spreport.sql creates a report

  11. Reporting with Statspack SQL> @spreport DB Id DB Name Instance# Instance ----------- ---------- ---------- ---------- 1361567071 DB21 1 MAIL Completed Snapshots Instance DB Name SnapId Snap Started Snap Level ---------- ---------- ------ ---------------------- ---------- DB21 DB21 1 17 Aug 2003 10:00:16 5 2 17 Aug 2003 10:30:28 5 Enter beginning Snap Id: 1 Enter ending Snap Id: 2 Enter name of output file [sp_1_2] : <enter name or return>

  12. Analyzing a Statspack report • Top down analysis • Summary page • Enviroment • Load profile • Instance efficiency • Shared pool usage • Top 5 Timed Events • Top SQL

  13. Environment section STATSPACK report for DB Name DB Id Instance Inst Num Release Cluster Host ------------ ----------- ------------ -------- ----------- ------- ------------ RECONPRD 1403107896 RECONPRD 1 9.2.0.2.0 NO lin390t1 Snap Id Snap Time Sessions Curs/Sess Comment ------- ------------------ -------- --------- ------------------- Begin Snap: 2 03-Mar-03 11:28:01 10 5.1 End Snap: 31 04-Mar-03 11:58:04 17 5.5 Elapsed: 30.05 (mins) Cache Sizes (end) ~~~~~~~~~~~~~~~~~ Buffer Cache: 256M Std Block Size: 16K Shared Pool Size: 48M Log Buffer: 128K

  14. Load profile • Contains a number of common ratios • Allows characterisation of the application • Can point to problems • high hard parse rate • high IO rate • high login rate

  15. Load profile • Useful if you have a comparable baseline • What has changed? • txn/sec change implies changed workload • redo size/txn implies changed transaction mix • physical reads/txn implies changed SQL or plan

  16. Load profile Load Profile ~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 19,057.68 20,937.67 Logical reads: 2,408.15 2,645.70 Block changes: 98.64 108.37 Physical reads: 990.47 1,088.18 Physical writes: 6.92 7.61 User calls: 76.40 83.93 Parses: 7.08 7.78 Hard parses: 0.02 0.02 Sorts: 29.22 32.10 Logons: 24.73 27.17 Executes: 63.79 70.08 Transactions: 0.91 % Blocks changed per Read: 4.10 Recursive Call %: 72.76 Rollback per transaction %: 36.52 Rows per Sort: 153.46

  17. Instance Efficiency • Gives an overview of how the instance is performing • Can also be used with a comparable baseline • Shared pool Statistics allow quick identification of cursor sharing problems

  18. Instance Efficiency Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 99.99 Redo NoWait %: 99.97 Buffer Hit %: 59.00 In-memory Sort %: 99.99 Library Hit %: 99.94 Soft Parse %: 99.69 Execute to Parse %: 88.89 Latch Hit %: 99.98 Parse CPU to Parse Elapsd %: 56.55 % Non-Parse CPU: 99.93 Shared Pool Statistics Begin End ------ ------ Memory Usage %: 38.86 66.81 % SQL with executions>1: 43.41 87.22 % Memory for SQL w/exec>1: 39.28 80.21

  19. Top 5 Timed Events • CPU time – real work • Shows where Oracle sessions are waiting • Compare Wait Time to elapsed time • % Total Wait Time shows potential benefits • Use as basis for directed drilldown % Total Event Waits Time (s) Ela Time ------------------------------- ------------ ----------- -------- CPU time 78,588 50.24 enqueue 1,560,523 59,961 38.33 db file sequential read 1,635,253 6,324 4.04 db file scattered read 14,620,725 5,907 3.78 control file parallel write 32,816 1,396 .89

  20. Top 5 Timed Events • Sample drilldowns • CPU Time „on CPU“ • enqueue e.g TX Enqueue • db file sequential read Index Access • db file scattered read Scan Operationscontrol file parallel write

  21. Top SQL • Helps to find problem statements • SQL ordered by Gets • SQL ordered by Reads • SQL ordered by Executions • SQL ordered by Parse Calls

  22. Top SQL CPU Elapsd Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 79,562,398 8,114 9,805.6 34.6 27182.71 28127.71 1525844323 Module: SQL*Plus SELECT MAX(STMT_BKG_DATE_CLOSE) FROM GAH_T_STATEMENTS WHERE S TMT_ACCT_ID = :b1 AND ((:b2 = :b3 AND STMT_CARRIER != :b4 AND STMT_MSG_TYPE != :b5 AND (:b6 IS NULL OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NU LL AND :b8 = STMT_MSG_TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER

  23. I/O Statistics • Help to find I/O Problems • Tablespace IO Stats • File IO Stats

  24. I/O Statistics Tablespace ------------------------------ Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) -------------- ------- ------ ------- ------------ -------- ---------- ------ GAH_TS00_DT_MEDIUM 15,242,896 160 0.4 6.1 41,066 0 22,468 18.4 GAH_TS00_IX_ITEM 210,346 2 11.2 1.0 130,299 1 9 15.6 GAH_TS00_IX_MEDIUM 207,433 2 6.9 1.0 86,699 1 39 43.8 RECONPRD_TS00_TEMP 185,865 2 1.7 1.6 101,560 1 0 0.0 GAH_TS00_IX_ITEM_REF 155,027 2 8.4 1.0 34,867 0 1 0.0

  25. Diagnosing high CPU usage • High CPU Usage • High I/O utilization • Memory Usage • Network problems • „idle“ System • Operating System (WLM, VM)

  26. Diagnosing high CPU usage-Operating System- • Linux/390 • sar -u 3 3333 • iostat -x 3 • vmstat 3 • top • Etc. • Z/OS • SDSF • RMF • Omegamon • etc.

  27. Diagnosing high CPU usage • What can be the reason for „high CPU“ Usage ? • Shared_Pool / SQL-Cache • db_file_multiblock_read_count • Buffer_Cache/ Buffer_Pool • How can Statements with a great # of buffergets be seperated ? • Statspack • SQL Script

  28. Diagnosing high CPU usage CPU Elapsd Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 4,494,662 155 28,997.8 2.0 1049.63 2414.11 3961361411 SELECT * FROM GAH_T_STATEMENTS WHERE STMT_ACCT_ID = :b1 AND ((:b2 = :b3 AND STMT_CARRIER != :b4 AND STMT_MSG_TYPE != :b5 AND (:b6 IS NULL OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NULL AND :b8 = STMT_MSG_ TYPE ))) OR (:b2 = :b13 AND STMT_CARRIER = :b14 AND STMT_MSG_T Module: SQL*Plus

  29. Diagnosing high CPU usage • spool cpu_users.lstselect buffer_gets,disk_reads,executions,ratio_to_report(buffer_gets) over () * 100 buffer_ratio,ratio_to_report(disk_reads) over () * 100 disk_ratio,sql_text from v$sqlareaorder by buffer_ratio desc;spool off

  30. Diagnosing high CPU usage • BUFFER_GETS DISK_READS EXECUTIONS BUFFER_RATIO DISK_RATIO • ----------- ---------- ---------- ------------ ---------- • SQL_TEXT • ---------------------------------------------------------------------------------------- • 19564429 154 46908 65.9945773 5.40350877 • select t.schema, t.name, t.flags, q.name from system.aq$_queue_tables t, ys.aq$_queue_table_affinities aft, system.aq$_queues q where aft.table_objno = t.objno and aft.owner_instance = :1 and q.table_objno = t.objno and q.usage = 0 and bitand(t.flags, 4+16+32+64+128+256) = 0 for update of t.name, aft.table_objno skip locked

  31. SQL Tuning • Check Object Statsitics • Use DBMS_STATS • Analyze Execution Plan • Explain Query / V$SQL_PLAN • Optimize Query • Optimize Indexes • Index Only Access, Function Based Indexes

  32. Diagnose • High CPU Usage • High I/O utilization • Memory Usage • Network problems • „idle“ System • Operating System (WLM, VM)

  33. High I/O utilization • Linux/390 • sar -d 3 33333 • iostat -x 3 • vmstat 3 • Z/OS • RMF • Omegamon etc

  34. High I/O utilization • Disk I/O • Disk access is slower than memory access (Factor 5000 to 100000) • One physical disk is able to perform 100-150 I/O´s per Second • Disk Reponse Times (Read operations) • 2ms (Read from disk cache) • 10ms – 15ms (Physical Reads)

  35. High I/O utilization • Reasons for High I/O utilization • Database Cache too small (DB_CACHE_SIZE) • Sortarea too small (sort_area_size) • Hasharea too small (hash_area_size) • Too many Checkpoints • Ineffective Execution Plans (e.g. Full-Table-Scans which are not necessary)

  36. High I/O utilization • Increase Cache Size • Reduces physical I/O Operations • Z/OS • Limited by 31 Bit Arcitecture • Multiple Adress Spaces help to improve the Memory management

  37. AS1 AS2 AS3 ASn Single Shared SGA Across Address Spaces High I/O utilization • An Oracle server instance has a single SGA regardless of the number of address spaces or regions configured. • The user context is distributed across all AS

  38. High I/O utilization • Linux/390 • The default maximum SGA size on Linux/390 is 750 MB without changing the base adress • the maximum SGA size to 1 GB by changing the SGA base address

  39. High I/O utilization Top 5 Timed Events % Total Event Waits Time (s) Wt Time -------------------------------------------- ------------ ------------ ------- db file sequential read 89,086,819 11,009 93.13 db file scattered read 9,875,076 776 6.56 file open 505,227 23 .19 log file sync 440,409 8 .07 latch free 11,042,510 3 .03

  40. High I/O utilization Tablespace I/O Stats: Tablespace Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) -------------- ------- ------ ------- ------------ -------- ---------- ------ RECEIVABLE_T_01 18,398,460 213 12.0 1.6 59,325 1 4,892,686 0.0 SO_T_03 6,827,475 79 13.2 1.6 27,462 0 4,506 0.0 SO_I_01 5,356,393 62 9.0 1.3 18,388 0 35,935 0.0 PO_I_01 4,641,732 140021.7 1.8 72,563 1 217,799 0.0

  41. High I/O utilization RMF Report (Monitor 1; RMF Postprocessor) D I R E C T A C C E S S D E V I C E A C T I V I T Y DEVICE AVG AVG AVG AVG AVG AVG AVG AVG % % % AVG % % STORAGE DEV DEVICE VOLUME LCU ACTIVITY RESP IOSQ DPB CUB DB PEND DISC CONN DEV DEV DEV NUMBER ANY MT GROUP NUM TYPE SERIAL RATE TIME TIME DLY DLY DLY TIME TIME TIME CONN UTIL RESV ALLOC ALLOC PEND DBORACLE 7651 33903 LEOR00 008F 0.817 4 0 0.0 0.0 0.0 0.2 2.6 0.8 0.06 0.28 0.0 1.0 100.0 0.0 DBORACLE 7652 33903 LEOR01 008F 0.878 9 0 0.0 0.0 0.0 0.2 0.3 8.7 0.76 0.79 0.0 3.0 100.0 0.0 DBORACLE 7653 33903 LEOR02 008F 0.502 2 0 0.0 0.0 0.0 0.2 0.0 1.5 0.08 0.08 0.0 6.0 100.0 0.0 DBORACLE 7654 33903 LEOR03 008F 108.968 56 52 0.0 0.0 0.0 0.2 2.4 0.8 0.08 0.32 0.0 1.0 100.0 0.0 DBORACLE 7655 33903 LEOR04 008F 0.828 3 0 0.0 0.0 0.0 0.2 2.3 0.8 0.06 0.25 0.0 1.0 100.0 0.0 DBORACLE 7656 33903 LEOR05 008F 98.779 50 48 0.0 0.0 0.0 0.2 1.7 0.8 0.13 0.42 0.0 1.0 100.0 0.0 DBORACLE 7657 33903 LEOR06 008F 2.768 2 0 0.0 0.0 0.0 0.3 1.3 0.7 0.20 0.56 0.0 1.0 100.0 0.0 DBORACLE 7658 33903 LEOR07 008F 0.943 3 0 0.0 0.0 0.0 0.2 2.3 0.7 0.07 0.28 0.0 1.0 100.0 0.0 DBORACLE 7659 33903 LEOR08 008F 1.003 4 0 0.0 0.0 0.0 0.2 3.5 0.8 0.08 0.43 0.0 1.0 100.0 0.0 DBORACLE 765A 33903 LEOR09 008F 0.945 3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.07 0.28 0.0 1.0 100.0 0.0 DBORACLE 765B 33903 LEOR0A 008F 0.217 3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.02 0.06 0.0 1.0 100.0 0.0 DBORACLE 765C 33903 LEOR0B 008F 0.833 4 0 0.0 0.0 0.0 0.2 2.5 0.8 0.06 0.28 0.0 2.0 100.0 0.0 DBORACLE 765D 33903 LEOR0C 008F 0.963 4 0 0.0 0.0 0.0 0.2 2.7 0.9 0.09 0.35 0.0 1.0 100.0 0.0 DBORACLE 765E 33903 LEOR0D 008F 0.013 3 0 0.0 0.0 0.0 0.2 2.6 0.5 0.00 0.00 0.0 1.0 100.0 0.0 DBORACLE 765F 33903 LEOR0E 008F 0.935 4 0 0.0 0.0 0.0 0.2 3.0 0.8 0.07 0.35 0.0 1.0 100.0 0.0

  42. High I/O utilization • RMF Report – Explanations • IOSQ TIME = UCB Queueing time • Avg Pend Time = ms, all Path´s to logical volume are busy • AVG Resp Time = Connect Time + Dicsonnect Time + Pending Time + IOSQ

  43. SQL Tuning • Check Object Statsitics • Use DBMS_STATS • Analyze Execution Plan • Explain Query • Optimize Query • Optimize Indexes • Index Only Access, Function Based Indexes

  44. Diagnose • High CPU Usage • High I/O utilization • Memory Usage • Network problems • „idle“ System • Operating System (WLM, VM)

  45. Memory Problems • How to determine Paging/Swapping • Linux/390 • VMSTAT • Z/OS • RMF • OMEGAMON • Reasons for Paging/Swapping • Too many processes/users • Database Parameters which are too generously • DB_CACHE_SIZE • HASH_SIZE • SQL_CACHE

  46. Diagnosing high CPU usage-Operating System- • High CPU Usage • High I/O utilization • Memory Usage • Network problems • „idle“ System • Operating System (WLM, VM)

  47. Diagnosing Network problems • Latency • LAN: < 1ms • WAN: < 10ms - 500ms • ISDN: < 50ms • VPN: 100-500 ms • Badwidth • 11-18 Mbit (Copper) • 100 Mbit (Copper, fibre) • 1 Gbit (fibre) • Great number of small packets • tcp_nodelay • SDU, TDU-Parameters (not available on z/os)

  48. Diagnosing high CPU usage-Operating System- • High CPU Usage • High I/O utilization • Memory Usage • Network problems • „idle“ System • Operating System (WLM, VM)

  49. Idle System • One CPU is 100% used – All other CPU´s are idle • Reason • dedicated Server • Only one process is running • Solution • Parallel Query • Not useful for OLTP Aplications • Split work - run more Processes

  50. Idle System • Latch Contentions • Use Statspack to diagnose • Enqueue Waits • Use Statspack to diagnose • Often Block Contentions because of too small initrans, Freelist, Freelist goup settings • Parsing because the use of Literals • Use Statspack to diagnose • Use CURSOR SHARING • Use Bind Variables

More Related