Presentation Transcript


  1. Achieving Scalability, Performance and Availability on Linux with Oracle 9iR2-RAC
     Grant McAlister, Senior Database Engineer, Amazon.com
     Paper 32110

  2. Agenda • Why Oracle on Linux and RAC • The Tests • Scaling • Performance • Availability • Choice of Interconnect • Conclusion

  3. Why Linux • Lower Total Cost of Ownership • Near commodity hardware and support • Multiple O/S and hardware vendors • Common platform (IA-32) for entire enterprise • Unix look and feel • New enterprise kernel • No database conversions when changing Linux hardware or O/S

  4. Why RAC on Linux • Cost • Ability to use near-commodity systems (2-4 processors) • Lower level of support needed on system units • The need for availability • Young and rapidly evolving O/S • Near-commodity hardware and support • The need to scale the database beyond 8 processors • The need for large amounts of memory (> 32 GBytes)

  5. The Tests • Real-life workloads • Not modified or partitioned to support RAC • Used automatic space management • Workload #1 • Simple workload of small queries with little locking • Workload #2 • Typical nasty workload with many inserts, updates and selects for update, causing a lot of locking and blocking

  6. Workload #1 Single Instance Profile

     Load Profile                      Per Second   Per Transaction
     ------------                   -------------  ---------------
     Redo size:                         77,516.28         1,460.05
     Logical reads:                      4,134.57            77.88
     Block changes:                        462.54             8.71
     Physical reads:                       155.70             2.93
     Physical writes:                       27.14             0.51
     User calls:                        11,012.73           207.43
     Parses:                               432.50             8.15
     Sorts:                                187.32             3.53
     Executes:                             432.89             8.15
     Transactions:                          53.09

     % Blocks changed per Read:  11.19    Recursive Call %:            0.68
     Rollback per transaction %: 0.82     Rows per Sort:             353.26

     Top 5 Wait Events on a single instance
                                                                  Avg
                                                      Total Wait  wait   Waits
     Event                           Waits  Timeouts    Time (s)  (ms)    /txn
     ---------------------------- -------- --------- ---------- ------ -------
     db file sequential read       560,060         0      1,249      2     2.9
     log file sync                 180,813       494        676      4     0.9
     log file parallel write       188,017   181,946        143      1     1.0
     latch free                     87,584     6,309        141      2     0.5
     db file parallel write          5,794     2,895         14      2     0.0

  7. Workload #2 Single Instance Profile

     Load Profile                      Per Second   Per Transaction
     ------------                   -------------  ---------------
     Redo size:                        244,988.60         5,306.31
     Logical reads:                     14,562.36           315.41
     Block changes:                      1,802.47            39.04
     Physical reads:                       319.45             6.92
     Physical writes:                       91.52             1.98
     User calls:                         2,877.06            62.32
     Parses:                               457.06             9.90
     Sorts:                                290.13             6.28
     Executes:                             456.73             9.89
     Transactions:                          46.17

     % Blocks changed per Read:  12.38    Recursive Call %:            4.16
     Rollback per transaction %: 0.96     Rows per Sort:              13.09

     Top 5 wait events on a single instance
                                                                  Avg
                                                      Total Wait  wait   Waits
     Event                           Waits  Timeouts    Time (s)  (ms)    /txn
     ---------------------------- -------- --------- ---------- ------ -------
     db file sequential read       346,048         0      1,412      4     1.7
     enqueue                           177       119        369   2087     0.0
     free buffer waits                 752        32        348    463     0.0
     db file scattered read        141,564         0        325      2     0.7
     log file sync                 207,109        37        306      1     1.0

  8. The Hardware and Software • Software • Oracle 9.2.0.1 • Red Hat Advanced Server 2.1 (2.4.9-e.3) • Hardware • 3 types of clusters that each have 4 nodes • 2 Pentium III Xeon Processors @ 1.126GHz & 5 Gbytes of RAM • 2 Pentium 4 Xeon DP Processors @ 2.4GHz & 4 Gbytes of RAM • 4 Pentium 4 Xeon MP Processors @ 1.6GHz & 10 Gbytes of RAM • Database files were on raw partitions

  9. Scaling • The ability to produce higher transactional volumes when adding additional processors or additional nodes.
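     The definition above can be sketched as a quick calculation. This is an illustrative sketch only; the numbers below are made up, and the measured results are in the charts that follow.

     ```python
     def scaling_efficiency(base_tps, scaled_tps, base_nodes=1, scaled_nodes=4):
         """Fraction of ideal linear scaling achieved when the cluster grows.

         base_tps   -- transactional volume measured on base_nodes
         scaled_tps -- transactional volume measured on scaled_nodes
         """
         ideal = base_tps * (scaled_nodes / base_nodes)  # perfect linear scaling
         return scaled_tps / ideal

     # Hypothetical example: 50 txn/s on 1 node growing to 180 txn/s on 4 nodes
     # yields 180 / 200 = 0.9, i.e. 90% of linear scaling.
     efficiency = scaling_efficiency(50, 180)
     ```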

  10. Scaling of workload #1

  11. Scaling of workload #2

  12. Some workloads scale better

  13. Some of the differences • Top 5 workload #1 timed events vs. Top 5 workload #2 timed events (charts)

  14. Performance • The time taken to perform a query is important • Execution time influences transactional volume • Can cause dramatic changes in the end user response time • Stock Exchange • Internet Retailer • Bank • Only you know what is reasonable for your database and application

  15. Execution times for workload #1

  16. Execution times for workload #2

  17. Some ways to improve • Make sure your database is well tuned for single-instance operation • Consider using different block sizes for hot indexes • Hash-partition hot tables and indexes • Partition the workload

  18. Availability • Minimize failures by building clusters with as few single points of failure as possible. • Setup your RAC cluster to recover from node and instance failure as quickly as possible.

  19. Redundant RAC Configuration

  20. Instance recovery time • fast_start_mttr_target is the key
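     As a sketch, the parameter can be set with a standard ALTER SYSTEM statement; the value of 120 seconds here matches the example used on the next slide, and SCOPE = BOTH assumes the instance runs from an spfile.

     ```
     -- init.ora / spfile: bound crash-recovery time to roughly 2 minutes
     ALTER SYSTEM SET fast_start_mttr_target = 120 SCOPE = BOTH;
     ```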

  21. Node failure recovery time • Recovery Time = Failure detection + Instance recovery • Failure detection = (MissCount * 1 second) • The MissCount parameter is found in cmcfg.ora • With MissCount = 20 and fast_start_mttr_target = 120, all workload #2 processing resumed in less than 1 minute after crashing a node.
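     The formula above can be sketched as a worst-case bound. Note that fast_start_mttr_target is a ceiling on instance recovery, not a typical value, which is why the observed recovery (under 1 minute) beats the bound this sketch computes.

     ```python
     def worst_case_recovery_seconds(miss_count, fast_start_mttr_target):
         """Upper bound on recovery time after a node failure.

         Failure detection = MissCount * 1 second (per the slide above);
         instance recovery is bounded by fast_start_mttr_target.
         """
         detection = miss_count * 1          # seconds until the CM ejects the node
         return detection + fast_start_mttr_target

     # With the slide's settings: 20s detection + 120s recovery ceiling = 140s
     # worst case, while the measured recovery was under 60s.
     bound = worst_case_recovery_seconds(20, 120)
     ```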

  22. Impact of a single node failure • Chart timeline: Node failed → CM ejects node → Cluster reconfigured → Recovery complete

  23. Choice of Interconnect • 1000Mbit (Gigabit) Ethernet • Latency ~ 0.07 ms • Transfer Rate - 30+ MBytes per second • More expensive but becoming common with the advent of gigabit over copper. • 100Mbit Ethernet • Latency ~ 0.20 ms • Transfer Rate - 10 MBytes per second • Common and inexpensive
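     The latency and transfer-rate figures above can be combined into a rough per-block cost. This sketch assumes an 8 KB Oracle block shipped across the interconnect and ignores protocol overhead; it only illustrates why gigabit roughly triples the per-block speed.

     ```python
     def block_transfer_ms(latency_ms, mb_per_s, block_kb=8):
         """Approximate time to ship one database block across the interconnect."""
         transfer_ms = (block_kb / 1024) / mb_per_s * 1000  # serialization time
         return latency_ms + transfer_ms

     gigabit  = block_transfer_ms(0.07, 30)   # ~0.33 ms per 8 KB block
     fast_eth = block_transfer_ms(0.20, 10)   # ~0.98 ms per 8 KB block
     ```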

  24. 100Mbit vs. Gigabit

  25. Conclusions • RAC scaled at 90% on a simple workload • RAC scaled consistently at 55+% on a complex workload • There is an impact to query performance depending on your workload • You can recover from failures in less than 1 minute • When configured correctly a RAC cluster can scale, perform and be highly available.

  26. Questions & Answers
