Preparing for High Availability

This article covers the planning considerations, installation issues, and maintenance issues involved in preparing for high availability. Topics include budgeting, hardware choices, software inventory, Progress versions, database layout, after imaging, and personnel planning.

Presentation Transcript


  1. Preparing for High Availability • Adam Backman (adam@wss.com), V.P. of Technology, White Star Software

  2. What We Will Cover • Planning considerations • Installation issues • Maintenance issues

  3. Planning Phase - People • Who “owns” the data • Be inclusive • This is not solely an IT decision • Eliminate surprises

  4. Planning Considerations • Budget – high availability is not free • Hardware – fault tolerant, redundancy, … • Software – Progress is good but how is your “other” software? • Knowledge – buy or rent • Time – schedule and outage time • Personnel constraints – Who is on call?

  5. Goals During Outage • Do no additional damage • Shortest amount of time • Reduce/Eliminate impact to customer

  6. The Cost of Downtime • Wages • Idle workers • Cost to replace data • Production • Lost production • Impact to the customer • Can’t click website • Can’t place order
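
The cost items above can be turned into a rough figure. The following is a minimal sketch, not a costing model: the function name `downtime_cost` and all input numbers are illustrative assumptions, and it only counts idle wages plus lost revenue per hour.

```shell
#!/bin/sh
# Rough cost of an outage: idle wages plus lost revenue, per hour of downtime.
# All inputs are illustrative assumptions; replace them with your own figures.
# usage: downtime_cost HOURS IDLE_WORKERS HOURLY_WAGE LOST_REVENUE_PER_HOUR
downtime_cost() {
    LC_ALL=C awk -v h="$1" -v w="$2" -v wage="$3" -v rev="$4" \
        'BEGIN { printf "%.2f\n", h * (w * wage + rev) }'
}
```

For example, `downtime_cost 2 10 25 1000` models a two-hour outage with ten idle workers at 25 per hour and 1000 per hour of lost orders, and prints `2500.00`. The cost to replace lost data is not modelled here and would have to be added separately.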

  7. How Much Downtime Can You Afford? • For maintenance • Application • Database • For failures • Hardware • Software • Natural disaster
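
One way to make the question concrete is to translate an availability target into a yearly downtime budget that maintenance and failures must share. This is a sketch under a simple assumption (a 365-day year, all downtime counted equally); the function name is hypothetical.

```shell
#!/bin/sh
# Convert an availability target (percent) into allowed downtime per year.
# Assumes a 365-day year; planned and unplanned downtime draw on the same budget.
# usage: allowed_downtime_minutes AVAILABILITY_PERCENT
allowed_downtime_minutes() {
    LC_ALL=C awk -v a="$1" \
        'BEGIN { printf "%.1f\n", (100 - a) * 365 * 24 * 60 / 100 }'
}
```

A 99 percent target allows 5256.0 minutes (about 3.65 days) per year, while 99.9 percent allows roughly 525.6 minutes, which is under nine hours for everything including maintenance windows.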

  8. Planning Phase - Budget • Less downtime = additional cost • Better disks (RAID, Mirrors, EMC, …) • Redundant system • Remote site • More money does not equal less downtime • Prioritize • Look for most likely scenarios • Look beyond cool

  9. Planning Phase - Hardware • Disks – The only moving part • RAID – Redundant Array of Inexpensive Disks • Avoid software mirroring • Use multiple controllers • Try to stick with a single-vendor solution

  10. What RAID Really Means • RAID has many levels; here are the most common • RAID 0: This level is also called striping. • RAID 1: This is referred to as mirroring. • RAID 5: Striping with parity; the poor-performance RAID level • RAID 10: This is mirroring and striping. Often loosely called RAID 0+1, although strictly RAID 10 (1+0) is a stripe across mirrored pairs
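
Because RAID level feeds directly into the budget discussion, it can help to see how much usable space each level leaves. The sketch below uses the standard capacity rules for these levels with equal-sized disks; the function name is an assumption, and it says nothing about performance or rebuild behaviour.

```shell
#!/bin/sh
# Usable capacity for the RAID levels listed above, assuming equal-sized disks.
# usage: raid_usable_gb LEVEL DISK_COUNT DISK_SIZE_GB
raid_usable_gb() {
    level=$1 disks=$2 size=$3
    case "$level" in
        0)  echo $(( disks * size )) ;;        # striping: no redundancy, all space usable
        1)  echo "$size" ;;                    # mirroring: every disk holds the same data
        5)  echo $(( (disks - 1) * size )) ;;  # one disk's worth of space goes to parity
        10) echo $(( disks / 2 * size )) ;;    # mirrored pairs, then striped
        *)  echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}
```

With four 100 GB disks, RAID 5 leaves 300 GB and RAID 10 leaves 200 GB, which is the capacity price paid for avoiding the RAID 5 performance penalty.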

  11. Planning Phase - Hardware • CPU: Check with vendor to ensure fault tolerance • Memory: Do not interleave memory • Vendor: Choose a reliable vendor (IBM, HP, Sun, Compaq, …)

  12. Planning Phase - Hardware • Other hardware • File servers • Network stuff (LAN & WAN) • Phone/Internet connections

  13. Planning Phase - Software • Inventory all software (client and server) and make sure it is current and supported • Determine what software is needed all of the time (Production control – Yes, Reporting software – No)

  14. Planning Phase - Progress • Version of Progress (look for patches) • Layout of database • Single database or Multi-database • Storage area layout (logical and physical layout) • Application issues • Client/Server, N-Tier or Host based • Where does the application code reside?

  15. Planning Database Layout • Single database • Easy to maintain • Still have storage areas to spread data • Single point of failure • Multi-database • More to maintain • Allows application partitioning • Maintenance flexibility • Two phase commit

  16. After Imaging • Before image files keep information about records giving you the ability to undo a transaction • After image files keep information about records that allows you to redo a transaction in the event of media failure • After imaging is only part of a high availability strategy

  17. After Imaging • Every high availability system should have after imaging enabled • Multiple after image areas are required for high availability • Only enable after imaging after you have a comprehensive backup and recovery plan in place
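
As a rough illustration of the order of operations, the sketch below prints (rather than runs) the commands that would add several AI extents, take the backup that must precede enabling, and then turn after imaging on. The database name, paths, and helper function names are hypothetical, and the exact `prostrct`, `probkup`, and `rfutil` syntax should be checked against the documentation for your Progress version.

```shell
#!/bin/sh
# Print structure-file lines for COUNT variable-length AI extents.
# usage: ai_extent_lines DB AI_DIR COUNT
ai_extent_lines() {
    db=$1 dir=$2 count=$3 i=1
    while [ "$i" -le "$count" ]; do
        echo "a $dir/$db.a$i"
        i=$(( i + 1 ))
    done
}

# Print the commands in the order they would be run (database offline).
# usage: enable_ai_plan DB AI_DIR COUNT BACKUP_FILE
enable_ai_plan() {
    echo "# add_ai.st would contain:"
    ai_extent_lines "$1" "$2" "$3" | sed 's/^/#   /'
    echo "prostrct add $1 add_ai.st"   # add the AI extents to the database
    echo "probkup $1 $4"               # back up before enabling AI
    echo "rfutil $1 -C aimage begin"   # enable after imaging
}
```

Calling `enable_ai_plan sports /ai 4 /backup/sports.bak` prints a plan with four extents. Putting the AI extents on different disks from the database is what makes them useful after a media failure.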

  18. How Does Journaling Work? Here is a logical over-simplification of how journaling works:

     FOR EACH customer:
         /* BI note written */
         UPDATE customer.
         /* AI note written */
     END.

  19. Planning Phase - Knowledge • Own: Our people have the knowledge to do the project • Buy: We can train our people to do this project • Rent: We will hire consultants to implement this for us (Insert shameless plug here)

  20. Planning Phase - Time Schedule for project • Machine purchase and delivery • Software availability • Resource availability • Do we need a long weekend for implementation? Timings determined later may determine implementation schedule items

  21. Planning Phase - Personnel • 24 hr. Operators: If you don’t have operators you will need to develop monitoring routines with paging ability • Database Administrator(s) • System Administrator(s) • Develop an escalation plan with “on call” schedule for off hours issues

  22. Installation Phase • All items should have been already developed and tested prior to this stage • All items should have been already developed and tested prior to this stage • All items should have been already developed and tested prior to this stage • Get the point?

  23. Installation Steps • Develop a schedule with timings and leave room for error as there WILL be errors • Write scripts to do tasks where possible to eliminate the human factor • Have a master checklist with the person/ people responsible for each item
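
One possible way to combine the schedule, the scripts, and the checklist is a small step runner that logs each step's outcome and duration, so the timings from a rehearsal can be compared with the real run. This is a sketch only; `run_step` is a hypothetical helper, and `date +%s` is assumed to be available (it is on common Linux and BSD systems).

```shell
#!/bin/sh
# Run one checklist step, log its result and duration, and pass on its status.
# usage: run_step "description" command [args...]
run_step() {
    desc=$1; shift
    start=$(date +%s)
    if "$@"; then
        status=OK; rc=0
    else
        rc=$?; status=FAILED
    fi
    end=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$status] $desc ($(( end - start ))s)"
    return "$rc"
}
```

Chaining steps with `&&`, for example `run_step "Check free space" df -P && run_step "Copy scripts" cp -r ./scripts /opt/ha`, stops at the first failure so that a person can step in before more damage is done. The paths in that example are placeholders.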

  24. Maintenance Goals • Provide consistent performance • Allow for advance planning • Avoid unscheduled outages

  25. Maintenance • Don’t design something you cannot support • Scripting should be flexible but bulletproof • Example: www.peg.com/utilities.html • Monitoring and trending are very important to maintain high availability systems

  26. Monitoring Areas of concern for high availability • Progress • Database areas filling • BI not being reused • AI space depleted • Running out of licenses • System • Disk space • Resources (memory, CPU, tunables, …)

  27. Monitoring Progress - DB

     /* Storage area fill rate program */
     DEF VAR percent-free AS DEC FORMAT ">>9.99".
     FOR EACH _AreaStatus:
         /* High-water mark as a share of total blocks gives percent used */
         percent-free = 100 - (_AreaStatus-HiWater / _AreaStatus-TotBlocks * 100).
         DISPLAY _AreaStatus-AreaName "Percent Free:" percent-free.
     END.

  28. Monitoring Progress - BI

     /* Last BI file growth program */
     DEF VAR t_filename AS CHAR FORMAT "x(40)".
     t_filename = PDBNAME(1) + ".b".
     /* Look at the last BI extent and check whether it has been extended */
     FIND LAST _ActIOFile WHERE _IOFile-FileName BEGINS t_filename.
     IF _IOFile-Extends = 0 THEN DISPLAY "ALL IS WELL".
     ELSE DISPLAY "The Sky is Falling !!!".

  29. Monitoring Progress - AI

     # Program: After image extent full checker
     # Count the AI extents currently marked full
     FULL_EXT=$(rfutil "$DB" -C aimage extent list | grep -i full | wc -l)
     if [ $FULL_EXT -lt 9 ]
     then
         echo "$DB has $FULL_EXT full extents STATUS - OK"
     else
         echo "WARNING - $DB has $FULL_EXT full extents"
     fi

  30. Monitoring Progress - Users

     /* License count tester */
     DEF VAR remaining-licenses AS INT.
     FIND _License.
     remaining-licenses = _Lic-ValidUsers - _Lic-MaxActive.
     /* You may want to use _Lic-ActiveConns instead of _Lic-MaxActive */
     IF .10 > (remaining-licenses / _Lic-ValidUsers) THEN
         DISPLAY "Less than 10% of licenses remaining" WITH FRAME X.
     ELSE
         DISPLAY "10% or more of licenses remaining" WITH FRAME Y.

  31. System Monitoring • Disk Space • How much disk available for growth • Also look at throughput capacity (average wait) • Memory capacity • Free memory is not a good indicator • I focus on the scan rate • CPU Capacity • How much idle time
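
For the disk space item, a minimal sketch of a threshold check is shown below. It reads `df -P` output on standard input so it can be tested with canned data; the function name and the 90 percent threshold in the example are assumptions, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Print "mount-point usage%" for every file system at or above LIMIT percent used.
# Expects POSIX `df -P` output on stdin (the header line is skipped).
# usage: df -P | disks_over_threshold LIMIT
disks_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                  # "95%" -> "95"
        if (pct + 0 >= limit + 0) print $6, pct "%"
    }'
}
```

Run from cron as `df -P | disks_over_threshold 90` and page someone when the output is non-empty. Logging the same numbers daily gives the trend needed to answer "how much disk is available for growth" rather than only "is it full now".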

  32. Maintenance Tasks • Backup and restore • After imaging • Log based replication • Data maintenance

  33. Backup and Restore • Progress online backup • Quiet point backup • Warm standby backup

  34. Backup and Restore Why can’t I just back up the database and before image files while the database is at a slow point? Answer: While it is up, the database consists of three portions: the database files, the before image file(s), and memory

  35. Portions of an Active DB • Shared memory holds the most volatile data • The database contains older committed data • The before image holds transaction information • All three are needed for a complete backup

  36. Online Backup What happens during an online backup? • Grab a db latch • Do a pseudo-checkpoint (this syncs memory to disk) • Switch AI file (if necessary) • Back up the before image file • Release the db latch • Back up the database (starting at the end)
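
From the administrator's side, all of the steps above happen inside a single `probkup online` call. The sketch below pairs it with a partial verify of the resulting backup. By default it only prints the commands; the `RUN` variable, the function name, and the paths are assumptions, and the `prorest -vp` option should be confirmed for your version.

```shell
#!/bin/sh
# RUN defaults to "echo" so the sketch prints commands; set RUN="" to execute them.
RUN=${RUN-echo}

# Take an online backup, then verify the backup volume without restoring it.
# usage: online_backup DB BACKUP_FILE
online_backup() {
    db=$1 target=$2
    $RUN probkup online "$db" "$target" || return 1
    $RUN prorest "$db" "$target" -vp
}
```

A verify pass is not a substitute for a real test restore on another machine, but it catches unreadable backup media early.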

  37. Quiet Points • Very little impact to system availability • Allows for integration with hardware utilities • Only way to get an online backup with an operating system utility without shutting down the broker

  38. How Quiet Points Work • Get database latch • Do pseudo-checkpoint • Wait for quiet point to be removed NOTE: All processing will wait for the quiet point to be removed

  39. Quiet Point Backup How to do a quiet point backup • Enable the quiet point (this syncs memory to disk) • Synchronize your disk mirrors • Split your disk mirrors • Disable the quiet point • Mount the mirrors as different file systems • Back up your mounted mirrors with an OS utility (tar, cpio, fdump, …)
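
The steps above can be sketched as a script whose main job is to guarantee that the quiet point is always released, since all processing waits while it is enabled. `sync_mirrors`, `split_mirrors`, and `mount_split_mirror` are hypothetical placeholders for storage-vendor commands, `RUN` is an assumed dry-run switch, and the `proquiet` syntax should be checked against your Progress version.

```shell
#!/bin/sh
# RUN defaults to "echo" so the sketch prints commands; set RUN="" to execute them.
RUN=${RUN-echo}

# usage: quiet_point_backup DB MOUNT_POINT ARCHIVE
# sync_mirrors, split_mirrors and mount_split_mirror are HYPOTHETICAL
# stand-ins for your storage vendor's tools.
quiet_point_backup() {
    db=$1 mount_point=$2 archive=$3
    $RUN proquiet "$db" -C enable || return 1
    rc=0
    # Only the sync and split happen while users are waiting.
    if $RUN sync_mirrors; then
        $RUN split_mirrors || rc=1
    else
        rc=1
    fi
    # Release the quiet point even if the sync or split failed.
    $RUN proquiet "$db" -C disable || rc=1
    [ "$rc" -eq 0 ] || return 1
    # The slow part runs against the split copy, with the database fully available.
    $RUN mount_split_mirror "$mount_point" || return 1
    $RUN tar -cf "$archive" -C "$mount_point" .
}
```

Timing the window between enable and disable during rehearsal tells you how long users will actually be paused.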

  40. After Imaging • Every high availability system should have after imaging enabled • Only enable after imaging after you have a comprehensive backup and recovery plan in place • AI is sometimes referred to as the redo log

  41. Multi-volume After Image Files • Not a backup but a journal of completed transactions • Can be used to keep a copy of the database up to date • Can be switched with no interruption to user processing • Should be part of every high availability environment

  42. How to integrate after imaging • In conjunction with a backup site • To update a report server • As a means of backup

  43. AI to update a backup site • Poor man’s replication • Allows for periodic update of a copy of the database • The copy can then be backed up with a conventional backup mechanism

  44. Log Based Replication • Log based replication is another way to say applying AI files to a copy of your database • Excellent way to maintain a warm copy of your database for fail over • Can be used on the same machine or on a remote machine for additional protection

  45. Log Based Replication Rules • The standby database can only be accessed read-only (-RO) which means no remote (client/server) connections to the standby data • You must have a multi-volume AI. This is a must for high availability in any case • The standby database can have a different structure than the primary data
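
A minimal sketch of one replication cycle follows: archive a full AI extent, roll it forward into the standby, and only then mark the extent empty on the primary so it can be reused. The function name, the `RUN` dry-run switch, and the `TAG` argument (for unique archive names) are assumptions; for a remote standby the `cp` would be replaced by a network copy.

```shell
#!/bin/sh
# RUN defaults to "echo" so the sketch prints commands; set RUN="" to execute them.
RUN=${RUN-echo}

# usage: ship_and_apply PRIMARY_DB STANDBY_DB FULL_EXTENT ARCHIVE_DIR TAG
ship_and_apply() {
    src=$1 standby=$2 extent=$3 archive=$4 tag=$5
    copy="$archive/$(basename "$extent").$tag"
    # 1. Keep an archived copy of the full extent.
    $RUN cp "$extent" "$copy" || return 1
    # 2. Apply the archived extent to the standby database.
    $RUN rfutil "$standby" -C roll forward -a "$copy" || return 1
    # 3. Only after a successful apply, let the primary reuse the extent.
    $RUN rfutil "$src" -C aimage extent empty "$extent"
}
```

The oldest full extent can be found with `rfutil` (for example via `aimage extent list`, as in the AI monitoring script earlier in this presentation). Extents must be applied in the order they were filled.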

  46. AI as a Means of Backup • Not generally a good idea • Increased recovery time • Reduced reliability • Back up the database each weekend • Back up the AI file(s) each weeknight

  47. Backup – Points to Remember • Simplicity and minimizing user interaction will increase backup reliability • You are only as good as your last tested backup • Archiving off site is essential

  48. Database Maintenance • Data Stuff • Table move • Database analysis • Index Stuff • Index rebuild (offline) • Index Compress • Index Fix
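
As a rough map from the tasks above to commands, the sketch below wraps the `proutil` qualifiers that, to my understanding, correspond to them (`dbanalys`, `idxbuild`, `idxcompact`, `idxfix`). The wrapper name, task names, and `RUN` dry-run switch are assumptions; check each qualifier and its options against your Progress version before relying on it.

```shell
#!/bin/sh
# RUN defaults to "echo" so the sketch prints commands; set RUN="" to execute them.
RUN=${RUN-echo}

# usage: db_maint DB TASK [ARGS...]
db_maint() {
    db=$1 task=$2
    shift 2
    case "$task" in
        analyze) $RUN proutil "$db" -C dbanalys ;;
        rebuild) $RUN proutil "$db" -C idxbuild all ;;      # offline only
        compact) $RUN proutil "$db" -C idxcompact "$@" ;;   # table.index [percent]
        fix)     $RUN proutil "$db" -C idxfix ;;
        *)       echo "unknown task: $task" >&2; return 1 ;;
    esac
}
```

For example, `db_maint sports compact customer.cust-num 80` prints the command that would compact one index to a target utilisation of 80 percent; the table and index names are placeholders.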

  49. Table Move • Pros • Simple • Bullet proof • Cons • Slow • Table is read only for the duration of the move • Uses tons of logging space

  50. Table Move Syntax: proutil dbname -C tablemove tablename table-area [index-area] • table-area = The target application data area into which the table is to be moved • index-area = The name of the target index area; if not specified, the indexes will be left in their existing location
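
A small wrapper can make the optional index area explicit and refuse to run with too few arguments. This is a sketch following the syntax on the slide; the function name, the `RUN` dry-run switch, and the database, table, and area names in the example are hypothetical.

```shell
#!/bin/sh
# RUN defaults to "echo" so the sketch prints commands; set RUN="" to execute them.
RUN=${RUN-echo}

# usage: table_move DB TABLE TABLE_AREA [INDEX_AREA]
table_move() {
    if [ "$#" -lt 3 ]; then
        echo "usage: table_move DB TABLE TABLE_AREA [INDEX_AREA]" >&2
        return 2
    fi
    # The index area is passed only when given; otherwise indexes stay where they are.
    $RUN proutil "$1" -C tablemove "$2" "$3" ${4:+"$4"}
}
```

`table_move sports customer CustData CustIndex` prints the command that would move the table and its indexes together. Given the logging space a table move uses, checking BI and AI free space first is a sensible precaution.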
