
S311441 Practical Performance Management for Oracle Real Application Clusters

Michael Zoll, Consulting Member of Technical Staff, and Barb Lundhild, Product Manager, Oracle Real Application Clusters.


Presentation Transcript


  1. S311441 Practical Performance Management for Oracle Real Application Clusters Michael Zoll, Consulting Member of Technical Staff Barb Lundhild, Product Manager, Oracle Real Application Clusters

  2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

  3. Agenda • Oracle RAC Infrastructure and Technical Fundamentals • Application and Database Design • Common Problems and Symptoms • Diagnostics and Problem Determination • Appendix

  4. Objective • Convey a few simple and fundamental concepts of Oracle RAC performance • Summarize application-level performance and scalability information • Provide some simple sizing hints • Give an example-based overview of common problems and solutions • Builds on a similar presentation from OOW 2008: http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf

  5. Oracle RAC Infrastructure: Technical Fundamentals, Sizing and Configuration

  6. Oracle RAC Architecture ( diagram ): Clients connect over the public network through per-node VIPs ( VIP1 … VIPn ) and SCAN listeners to database services. Each node runs a local listener, a database instance, ASM and Oracle Clusterware on its operating system. All nodes attach to shared storage managed by ASM, which holds the database and control files, the redo and archive logs of all instances, and the OCR and voting disks.

  7. Global Cache and Global Enqueue Service: Processes and Functions ( diagram ): Each instance SGA contains the buffer cache, log buffer, library cache, dictionary cache and the Global Resource Directory. The Global Cache Service ( LMSx ) and the Global Enqueue Service ( LMON, LMD0 ) maintain the directory and communicate over the private high-speed cluster network; LMS runs in real-time priority. Foreground Oracle processes work with DBW0 and LGWR as in a single instance.

  8. Global Cache Access ( diagram ): A shadow process sends a block request to the LMS of the holding instance, which serves the buffer from its cache and sends it back to the requesting shadow process. Legend: immediate direct send in > 96 % of cases; in the remaining < 4 % the block contains uncommitted changes, so LMS first posts LGWR to flush the redo ( log write and send ) and sends the block only after the log write completes.

  9. Basic Performance Facts • Global Cache access takes 100 - 500 usecs ( roundtrip ) • Data is served immediately from remote instances via the private, high-speed interconnect • Redo may have to be written to the log file before the send if the data was changed and has not been committed yet • Performance varies with network infrastructure and network protocol • A remote block access involves at most 3 messages • Holds for clusters with more than 2 nodes, independent of total cluster size • CPU cost per OLTP transaction • Depends on locality of access, i.e. messages per transaction
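A quick way to verify that observed global cache round trips stay in this range is to derive averages from the interconnect statistics. The query below is a minimal sketch; it assumes the 11g statistic names shown in the WHERE clause and that the receive-time statistics are reported in centiseconds.

  -- Average global cache receive times per instance, in milliseconds
  SELECT inst_id,
         10 * MAX(DECODE(name, 'gc cr block receive time', value)) /
              NULLIF(MAX(DECODE(name, 'gc cr blocks received', value)), 0)      AS avg_cr_ms,
         10 * MAX(DECODE(name, 'gc current block receive time', value)) /
              NULLIF(MAX(DECODE(name, 'gc current blocks received', value)), 0) AS avg_cur_ms
  FROM   gv$sysstat
  WHERE  name IN ('gc cr block receive time', 'gc cr blocks received',
                  'gc current block receive time', 'gc current blocks received')
  GROUP  BY inst_id;

Averages well above the 100 - 500 usec baseline usually point at capacity, configuration or scheduling problems rather than at the interconnect hardware itself.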

  10. Basic Performance Facts: Latency ( UDP/GbE and RDS/IB ) • Latency and CPU cost vary with the network protocol and infrastructure • Actual interconnect latency is generally not the problem unless you have exceeded capacity or you are experiencing errors

  11. Private Interconnect • Network between the nodes of an Oracle RAC cluster MUST be private/dedicated to traffic between Oracle RAC nodes • Large ( Jumbo ) Frames for GbE recommended • Avoids fragmentation and reassembly ( 8K / 1500 MTU = 6 fragments ) • Interconnect bandwidth should be tested with non-Oracle utilities ( e.g. iPerf ) • No packet loss at 75% - 80% of bandwidth utilization
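As a rough sketch of such a test ( the interface name, peer host name and target rate below are assumptions to be adapted ), the MTU can be verified on both nodes and the private link loaded with a non-Oracle utility such as iperf:

  # Verify jumbo frames on the private interface (interface name assumed to be eth1)
  ifconfig eth1 | grep -i mtu            # expect MTU 9000 if jumbo frames are enabled

  # On one node: start an iperf UDP server
  iperf -s -u

  # On the other node: push ~800 Mbit/s over the private link for 60 seconds
  iperf -c rac2-priv -u -b 800M -t 60

Packet loss reported at 75 - 80 % of the nominal bandwidth indicates a configuration or capacity problem.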

  12. Interconnect Bandwidth • Generally, 1Gb/sec is sufficient for performance and scalability in OLTP • DSS/DW systems should categorically be designed with > 1Gb/sec capacity • Prediction of interconnect traffic is difficult • Depends on the transaction instruction length per message • Empirical rule of thumb: 1Gb/sec per 32 CPU cores • InfiniBand and 10GbE are supported for scale-out

  13. Performance and Scalability of Applications and Database Design with RAC

  14. General Scalability • OLTP and DML-intensive workloads • Scale well if there is little contention and the database/working set size scales ( i.e. add a node when demand grows ) • Read-intensive workloads scale predictably and linearly • Bigger cache when adding more nodes • Faster read access to the global cache than to disk, less disk IO • If cluster size and database size growth are balanced, the system will perform and scale well

  15. Performance and Scaling in Application and Database Design: Response Time Impact • Index contention on INSERTs when the index is right-growing • System-generated “artificial” keys such as consecutive order numbers, or “natural” keys such as dates • UPDATEs or DELETEs of rows in a small working set • Session logging and tracking • First-in first-out queues • State of messages in queues • Bulk INSERTs of large amounts of data • LOBs

  16. DML Contention and Serialization ( diagram ): Modification-intensive operations on a small set of ( cached ) blocks create “busy blocks” in tables T and T’ and in their indexes, e.g. INSERT into index I with key = sequence, or UPDATE T SET … WHERE the affected rows sit in blocks[1..n] and n is a small number.

  17. Performance and Scaling in Application and Database Design: CPU Cost due to Inter-Instance Messaging and Non-Linear Scaling • In-memory databases • Working set spans multiple buffer caches • Frequent modifications and reads of recent modifications • Working set fits into the memory of one instance • Locality of access worsens when nodes are added and users are load balanced • Scales as long as sufficient CPU power is available

  18. Read-intensive workloads ( diagram ): a 64 GB working set on disk is read into two 32 GB buffer caches, with blocks arriving via disk transfer or cache transfer. Eventually all blocks are cached, giving a larger effective read cache; in 11g no interconnect messages are needed for this read-mostly case.

  19. Performance and Scalability • Good linear or near-linear scaling out of the box • IO- and CPU-intensive applications with large working sets and low proximity of access • Self-Service Web Applications ( Shopping Carts etc. ) • CRM • Document storage and retrieval • Business Analytics and Data Warehousing

  20. Performance and Scalability • Partitioning or load direction may optimize performance • High proximity of access, e.g. adding to and removing from message queues • Advanced Queuing and Workflow • Batch and bulk processes • Order processing and Inventory • Payroll processing

  21. Identifying Performance and Scaling Bottlenecks in Database Design The Golden Rules: • #1: For a first approximation, disregard read-mostly objects and focus on the INSERT-, UPDATE- and DELETE-intensive indexes and tablespaces • #2: If DML access to data is random, there is nothing to worry about unless CPU becomes an issue • #3: Standard SQL and schema tuning solves > 80 % of performance problems; there are usually only a few problem SQL statements and tables • #4: Almost everything can be scaled out quickly with load direction and load balancing

  22. Identifying Performance and Scaling Bottlenecks in Database Design • Look for indexes with right-growing characteristics • Keys comprising DATE columns or keys generated by sequence numbers • Find frequent updates of “small” and compact tables • “small”=fits into a single buffer cache • Identify frequently and concurrently modified LOBs

  23. HOW? • Look at segment and SQL statistics in the Automatic Workload Repository • Use Oracle Enterprise Manager access advisories and the Automatic Database Diagnostic Monitor (ADDM) • Instrumentation with MODULE and ACTION helps identify and quantify components of the workload
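For example, the segment-level global cache statistics that feed AWR can be queried directly. The sketch below assumes the 11g view GV$SEGMENT_STATISTICS and the statistic names shown; the MODULE/ACTION tags come from DBMS_APPLICATION_INFO calls like the one that follows ( module and action names are hypothetical ).

  -- Top segments by global cache activity across all instances
  SELECT *
  FROM   ( SELECT owner, object_name, statistic_name, SUM(value) AS total
           FROM   gv$segment_statistics
           WHERE  statistic_name IN ('gc buffer busy',
                                     'gc cr blocks received',
                                     'gc current blocks received')
           GROUP  BY owner, object_name, statistic_name
           ORDER  BY total DESC )
  WHERE  ROWNUM <= 20;

  -- Tag units of work so AWR and ADDM can attribute load to application components
  BEGIN
    DBMS_APPLICATION_INFO.SET_MODULE(module_name => 'BILLING',
                                     action_name => 'POST_INVOICES');
    -- ... application work ...
    DBMS_APPLICATION_INFO.SET_MODULE(module_name => NULL, action_name => NULL);
  END;
  /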

  24. Quick Fixes ( without modifying the application ) • Indexes with right-growing characteristics • Cache sequence numbers per instance • Hash or range partition the table with LOCAL indexes • Frequent updates of “small” and compact tables • Reduce block size ( 2K ) and row density of blocks ( PCTFREE 99 ) • Frequently modified LOBs • Hash partitions ( 128 – 256 ) • FREEPOOLS
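The DDL below sketches what these fixes can look like. The object names, partition counts and storage values are illustrative assumptions, not prescriptions from the talk.

  -- 1) Right-growing index: cache sequence values per instance (hypothetical sequence)
  ALTER SEQUENCE order_seq CACHE 10000 NOORDER;

  -- 2) Spread a hot right-growing index: hash-partition the table, make the index LOCAL
  CREATE TABLE orders (
    order_id NUMBER,
    created  DATE,
    payload  VARCHAR2(200)
  )
  PARTITION BY HASH (order_id) PARTITIONS 128;

  CREATE UNIQUE INDEX orders_pk ON orders (order_id) LOCAL;

  -- 3) Small, frequently updated table: lower the row density per block
  --    (ts_2k is an assumed tablespace created with BLOCKSIZE 2K and db_2k_cache_size set)
  CREATE TABLE session_counters (
    sid_key NUMBER,
    counter NUMBER
  ) PCTFREE 99 TABLESPACE ts_2k;

  -- 4) Frequently modified LOBs: hash partitions plus FREEPOOLS on the LOB segment
  CREATE TABLE documents (
    doc_id NUMBER,
    doc    CLOB
  )
  LOB (doc) STORE AS (FREEPOOLS 4)
  PARTITION BY HASH (doc_id) PARTITIONS 128;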

  25. Quick Fixes • Application modules which may not scale or cannot be quickly reorganized can be directed to particular nodes via cluster-managed services • For administrator-managed databases and older releases, create a service with 1 preferred node and the rest available • For policy-managed databases, use a singleton service • Some large-scale and high-performance applications may be optimized by data partitioning ( range, hash, or composites ) and routing per partitioning key in the application server tier • E.g. hash by CLIENT_ID, REGION etc.
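A sketch of the corresponding srvctl commands; the database, service, instance and server pool names are hypothetical.

  # Administrator-managed database: one preferred instance, the others available
  srvctl add service -d PROD -s BILLING_SVC -r PROD1 -a PROD2,PROD3

  # Policy-managed database (11.2): a singleton service in a server pool
  srvctl add service -d PROD -s BILLING_SVC -g ONLINE_POOL -c SINGLETON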

  26. Leverage Connection Pools ( UCP: Load Balancing and Affinity ) ( diagram ): the application draws connections from a pool; runtime load balancing distributes work across the RAC database according to each instance’s advisory, e.g. a busy instance receives 30 % of the work, a very busy instance 10 %, and an idle instance 60 %.

  27. Performance and Scalability Enhancements in 11.1 and 11.2 • Read Mostly • Automatic policy detects read and disk IO intensive tables • No interconnect messages when policy kicks in -> CPU savings • Direct reads for large ( serial and parallel ) scans • No locks , no buffer cache contention • Good when IO subsystem is fast or IO processing is offloaded to storage caches or servers ( e.g. Exadata ) • Fusion Compression • Reduces message sizes and therefore CPU cost • Dynamic policies to make trade-off between disk IO and global cache transfers

  28. Performance Diagnostics and Checks: Metrics and Method

  29. Normal Behaviour • It is normal to see time consumed in • CPU • db file sequential/scattered read • direct read • gc cr/current block 2-way/3-way ( transfer from remote cache ) • gc cr/current grant 2-way ( correlates with buffered disk IOs ) • Average latencies should be within baseline parameters • Most problems boil down to CPU, IO or network capacity, or to application issues

  30. Normality, Baselines and Significance

  Event                        Waits        Time(s)   Avg(ms)   %Time
  db file sequential read      2,627,295    21,808    8         43.2    ( most significant response time component )
  CPU time                                  9,156               18.2    ( 20.8 )
  gc current block 3-way       3,289,371    4,019     1         8.0     ( avg < 1 ms )
  gc buffer busy acquire       373,777      3,272     9         6.5     ( contention )
  gc current block 2-way       3,982,284    3,192     1         6.3     ( avg < 1 ms )
  gc current block busy        125,595      2,931     4                 ( contention )

  GC waits are influenced by interconnect or remote effects which are not always obvious.

  31. Distributed Cause and Effect ( diagram ). Example: cluster-wide impact of a log file IO problem. The root cause is a disk capacity or disk/controller bottleneck behind the log writes on one node; the symptoms surface as global cache waits on the other nodes.

  32. Global Metrics View
  Local symptom ( WORKLOAD REPOSITORY report for instance OOW8, host oowdb8 ): gc current block busy avg 23 ms.
  Cause ( WORKLOAD REPOSITORY report for instance OOW4, host oowdb4 ): log file parallel write avg 20 ms; avg global cache current block flush time (ms): 21.
  Global Cache Transfer Stats ( remote instance table ): Inst 4, data block, 114,426 blocks received, 95.9 % immediate, 24.1 % busy.

  33. Global Cache Transfer Stats

  Inst No   Block Class   Blocks Received   % Immed   % Busy
  4         data block    114,426           95.9      4.1
  7         data block    162,630           76.6      23.4    ( log file IO )

  Avg global cache current block flush time (ms): 3.7
  Investigate serialization: waits of 9 ms for gc buffer busy and 4 ms for gc current block busy are not OK!
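Outside of AWR, similar per-remote-instance transfer statistics can be sampled from the dynamic views. The sketch below assumes the V$INSTANCE_CACHE_TRANSFER columns available in 10g/11g; unlike the AWR excerpt above, the counters are cumulative since instance startup.

  -- Blocks received from each remote instance, split into immediate / busy / congested
  SELECT instance, class,
         cr_block, cr_busy, cr_congested,
         current_block, current_busy, current_congested
  FROM   v$instance_cache_transfer
  WHERE  class = 'data block'
  ORDER  BY current_block DESC;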

  34. Example: Segment Statistics • Segments by Global Cache Buffer Busy: ES_BILLING TABLE 97.41 % • Segments by Current Blocks Received: ES_BILLING TABLE 85.81 % • ANALYSIS: table ES_BILLING is frequently read and modified on all nodes; the majority of global cache accesses and serialization can be attributed to this table.

  35. Comprehensive Cluster-wide Analysis via Global ADDM Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability

  36. Common Problems and Symptoms

  37. Common Problems and Symptoms • Interconnect or Switch Problems • Slow or bottlenecked disks • High Log file Sync Latency • System load and scheduling

  38. Symptoms of Interconnect Problems ( diagram ): congestion and dropped packets at the capacity limit of the interconnect are the root cause; they surface as high latencies and serialization in the database.

  39. Symptoms of an Interconnect Problem

  Top 5 Timed Events             Waits      Time(s)   Avg wait(ms)   %Total Call Time   Wait Class
  log file sync                  286,038    49,872    174            41.7               Commit
  gc buffer busy                 177,315    29,021    164            24.3               Cluster
  gc cr block busy               110,348    5,703     52             4.8                Cluster
  gc cr block lost               4,272      4,953     1159           4.1                Cluster
  cr request retry               6,316      4,668     739            3.9                Other

  gc cr block lost and cr request retry should never be here: block loss is always a severe performance problem.

  40. Interconnect or IPC Problems ( diagram ): symptoms can be traced down the stack.
  • Application ( Oracle ): gc blocks lost
  • Protocol processing ( IP, UDP ), checked with netstat -s:
    UDP: packet receive errors, socket buffer overflows
    IP: 1201 fragments dropped after timeout, reassembly failures, incoming packets discarded
  • Device drivers ( NIC1 / NIC2 ), checked with ifconfig -a:
    TX errors: 135, dropped, overruns; RX errors: 0, dropped: 27, overruns
  • Switch: ports and queues
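On the database side, the corresponding counters can be watched over time; a small sketch assuming the 11g statistic names 'gc blocks lost' and 'gc blocks corrupt':

  -- Lost or corrupt global cache blocks per instance; steadily increasing values
  -- point at interconnect or IPC problems
  SELECT inst_id, name, value
  FROM   gv$sysstat
  WHERE  name IN ('gc blocks lost', 'gc blocks corrupt')
  ORDER  BY inst_id, name;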

  41. Cluster-wide Impact of a Database File IO Problem ( diagram ): IO-intensive queries on one node saturate the disks or controller; the resulting disk capacity bottleneck is the root cause of the symptoms seen on the other node.

  42. Cluster-Wide Disk I/O Impact

  Node 1 ( symptoms ):
  Top 5 Timed Events             Waits      Time(s)   Avg wait(ms)   %Total Call Time
  log file sync                  286,038    49,872    174            41.7
  gc buffer busy                 177,315    29,021    164            24.3
  gc cr block busy               110,348    5,703     52             4.8

  CAUSE: an expensive query on Node 2 causes the IO bottleneck.
  Node 2 Load Profile ( per second ): Redo size 40,982.21, Logical reads 81,652.41, Physical reads 51,193.37

  1. IO on the disk group containing the redo logs is slow
  2. Block shipping for frequently modified blocks is delayed by the log flush IO
  3. Serialization builds up

  43. Log File Sync Latency: Causes and Symptoms Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery

  44. Causes of High Commit Latency • Symptom of Slow Log Writes • I/O service time spike may last only seconds or minutes • Threshold-based warning message in LGWR trace file • “Warning: log write elapsed time xx ms, size xxKB” • Dumped when write latency >= 500ms • Large log_buffer makes a bad situation worse. • Fixes • Smooth out log file IO on primary system and standby redo apply I/O pattern • Primary and Standby storage subsystem should be configured for peaks • Apply bug fixes in appendix Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
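To see whether commit latency spikes come from the log writes themselves, the wait-time distributions of the two events can be compared. A sketch using GV$EVENT_HISTOGRAM ( available since 10g ), where WAIT_TIME_MILLI is the upper bound of each latency bucket:

  -- Compare the latency distribution of commits vs. the underlying log writes
  SELECT inst_id, event, wait_time_milli, wait_count
  FROM   gv$event_histogram
  WHERE  event IN ('log file sync', 'log file parallel write')
  ORDER  BY inst_id, event, wait_time_milli;

If 'log file parallel write' shows the same outliers as 'log file sync', the log file IO is the cause rather than a symptom.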

  45. Block Server Process Busy or Starved ( diagram ): root causes on the serving node are too few LMS processes, LMS not running at high priority, or memory problems ( swapping ); the symptoms appear on the requesting node.

  46. Block Server Process Busy or Starved

  Top 5 Timed Events             Waits      Time(s)   Avg wait(ms)   %Total Call Time   Wait Class
  gc cr grant congested          26,146     28,761    1100           39.1               Cluster
  gc current block congested     13,237     13,703    1035           18.6               Cluster
  gc cr grant 2-way              340,281    12,810    38             17.4               Cluster
  gc current block 2-way         119,098    4,276     36             5.8                Cluster
  gc buffer busy                 8,109      3,460     427            4.7                Cluster

  On the remote node: avg message sent queue time (ms): 16.1
  “Congested”: LMS could not dequeue messages fast enough

  47. Block Server Processes Busy • Increase # LMS based on • Occurrence of “congested” wait events • Heuristics: 75 – 80 % busy is ok • Avg send q time > 1ms • Caveat: # of CPUs should always be >= # of LMS to avoid starvation • On NUMA architectures and CMT • Bind LMS to NUMA board or cores in processor set • Fence off Hardware interrupts from the processor sets
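The number of LMS processes per instance is controlled by the GCS_SERVER_PROCESSES initialization parameter; a sketch of checking and raising it ( the value 4 is only an example, and the parameter is static, so the change takes effect after an instance restart ):

  -- Current number of LMS (global cache server) processes per instance
  SELECT inst_id, value
  FROM   gv$parameter
  WHERE  name = 'gcs_server_processes';

  -- Raise it cluster-wide; static parameter, requires instance restart
  ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';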

  48. High Latencies in Global Cache ( diagram: root cause )

  49. Transient Problems and Hangs
