
REP705: A Simple DR/HA Implementation for Replication Servers Based on NAS Filer






Presentation Transcript


  1. REP705: A Simple DR/HA Implementation for Replication Servers Based on NAS Filer Tung Chen Senior DBA Tung.Chen@Barclaysglobal.com August 15-19, 2004

  2. Agenda • What is the Problem We Are Solving • What is a Filer (Network Based Device) • System Architecture Overview • How to Build the DR System for Replication Server • Cross-site (DR) Failover Procedure • Local Failover Procedure • Performance Benchmark • Q&A

  3. What is the Problem We Are Solving? • Everybody knows how to use Warm Standby Replication to replicate ASE databases from the main site to the DR site. But what happens to the Rep Servers after the DR failover? • Here is the big picture for most DR implementations:

  4. What is the Problem We Are Solving? • Before the DR Switch-over: [Diagram: Sacramento D.C. (primary) runs the Rep Server and its RSSD ASE alongside the primary databases PDS.INV_DB (INV users) and P2DS.PRC_DB (PRC users); Warm Standby and subscription replication flow over a WAN (~80 miles) to RDS.INV_DB and R2DS.PRC_DB in the San Francisco D.C.]

  5. What is the Problem We Are Solving? • After the DR Switch-over: [Diagram: the INV and PRC users are now connected to RDS.INV_DB and R2DS.PRC_DB in the San Francisco D.C.; the Rep Server, RSSD ASE and the former primaries PDS.INV_DB / P2DS.PRC_DB remain at the unavailable Sacramento D.C. on the other side of the high-speed WAN (~80 miles).]

  6. What is the Problem We Are Solving? • A Logical View: Before [Diagram: Logical Entity.INV_DB is the Warm Standby pair PDS.INV_DB / RDS.INV_DB (tables t_a, t_b); Logical Entity.PRC_DB is the Warm Standby pair P2DS.PRC_DB / R2DS.PRC_DB (tables t_a, t_c); table t_a is replicated by subscription from INV_DB to PRC_DB.]

  7. What is the Problem We Are Solving? • A Logical View: After [Diagram: only RDS.INV_DB (t_a, t_b) and R2DS.PRC_DB (t_a, t_c) survive; the subscription of t_a from Logical Entity.INV_DB to Logical Entity.PRC_DB is in question (t_a ??).]

  8. What is the Problem We Are Solving? [Diagram: same logical view as slide 7] • Once we switch to the DR site (DR failover), there is no Replication Server on the DR site. Only the ASE databases are brought up on the DR site. • Subscription replication (table t_a) from INV_DB to PRC_DB is lost – going forward, there is no subscription replication

  9. Idea • If we mirror the Rep Servers to the DR site, we can bring them up on the mirror image at the DR site

  10. What Constitutes a Rep Server? The following components are mirrored across the WAN for the Rep Server DR implementation: • Rep Server software directory (file system) • Rep Server stable devices (file system) • ASE Server software directory (file system): for the RSSD • ASE Server device files (file system)

  11. Network Appliance Filer Cluster • Our solution: Network Appliance's filers, i.e. the FAS (fabric-attached storage) series with SnapMirror software • The FAS 900 and FAS 200 series support NFS and block-storage access over the network • FAS960 cluster: the storage system is fully redundant, with RAID protection • The FAS series is certified by Sybase for use as database devices • According to NetApp, FAS data availability is greater than 99.998%

  12. Network Appliance Filer Cluster • NetApp FAS 960 clusters in both data centers • Four heads in each cluster; each head has 9 trays; each tray provides 600 GB of usable space after formatting • The filers are backed up using NDMP & NetBackup

  13. What is SnapMirror? • NetApp SnapMirror is software that mirrors data from one filer to one or more other filers over the network. • It continually updates the mirrored data to keep it current and available for DR. • After the initial sync-up, it only transfers the new and changed blocks incrementally over the network – reducing the network bandwidth requirement. • SnapMirror is IP-based • It has 3 levels of synchronicity: sync, semi-sync and async.

  14. What is SnapMirror? • We use async SnapMirror to replicate the primary-site filer cluster to the DR-site filer cluster • SnapMirror is scheduled every 10 minutes in both directions (though never on the same volume)

  15. SnapMirror [Diagram: the filer cluster in the Sacramento D.C. is SnapMirrored over a high-speed WAN (~80 miles) to the filer cluster in the San Francisco D.C.]

  16. How Are Disk Volumes Organized/Mirrored? • Data on a NetApp filer is organized in Volumes • Each Volume has at least one RAID group. • We use an 8-disk RAID group. • Each Volume has multiple Qtrees • Each Qtree hosts a user file system • In our implementation, we have 4 file systems on 4 Qtrees sharing one volume
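
As a rough sketch only (Data ONTAP 7-mode, traditional-volume syntax; the volume and Qtree names below are illustrative, not necessarily the ones used in this implementation), one volume with an 8-disk RAID group and four Qtrees could be laid out as:
  % vol create nas_rep -r 8 8                   # one volume, 8-disk RAID group
  % qtree create /vol/nas_rep/apps_RDC_NAS      # one Qtree per file system
  % qtree create /vol/nas_rep/apps_RDC_NAS_RS
  % qtree create /vol/nas_rep/data_RDC_NAS
  % qtree create /vol/nas_rep/data_RDC_NAS_RS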

  17. How Are Disk Volumes Organized/Mirrored? • Each Qtree has its own security configuration, quota, etc. • The Qtree is our unit of work for SnapMirroring and server failover, i.e. the disk image for a Rep Server is contained in one Qtree • Each Qtree is SnapMirrored to a mirror Qtree

  18. Failover Package • A Rep Server needs an RSSD on ASE (consider the ERSSD in 12.6) • We define the Failover Package as a Rep Server plus its RSSD ASE • There are 4 file systems on the primary filer cluster, SnapMirrored to the DR-site filer cluster: • File system for Rep Server executables, libs, config files, etc – REP-12_5, OCS-12_5, locales, etc • File system for stable devices • File system for ASE executables, libs, config files, etc • File system for ASE data devices

  19. Failover Package/Filer Redundancy [Diagram: the Failover Package file systems on the Sacramento D.C. filer cluster are SnapMirrored over the high-speed WAN (~80 miles) to the San Francisco D.C. filer cluster.]

  20. HW Redundancy • With redundant HW standing by, failover can occur across the WAN for DR, or within the same data center for a local HA implementation • Stand-by HW does not have to be idle before failover • We need to ensure that, at the time of failover, the target host has adequate • CPU resources • memory resources • mount points for the NFS mounts

  21. DR/HA System [Diagram: host A in the Sacramento D.C. runs the Rep Server and RSSD ASE from filer cluster A; host B (same D.C.) is the local failover target; host C in the San Francisco D.C., backed by filer cluster B, is the DR failover target; filer cluster A is SnapMirrored to filer cluster B over the high-speed WAN (~80 miles).]

  22. DR Failover [Diagram: in a DR failover, the Rep Server and RSSD ASE move from host A in the Sacramento D.C. to host C in the San Francisco D.C., running off the SnapMirrored image on filer cluster B.]

  23. Local Failover [Diagram: in a local failover, the Rep Server and RSSD ASE move from host A to host B within the Sacramento D.C., still running off filer cluster A; host C in the San Francisco D.C. remains the DR failover target.]

  24. How to Build the DR System? [Diagram: host A in the Sacramento D.C. mounts 4 file systems (plus $HOME) from filer cluster A, which is SnapMirrored over the high-speed WAN (~80 miles) to filer cluster B in the San Francisco D.C.; host B is the local failover target, host C the DR failover target.]

  25. How to Build the DR System? 1. (Unix SA) Create the volume and Qtree on the primary filer cluster A:
  % vol create nas_rep
  % qtree create /vol/nas_rep/hostA
  % qtree security /vol/nas_rep unix
2. (Unix SA) Export the FS on the primary filer: edit the 'exports' file to include the following line, then re-export:
  /vol/nas_rep/hostA -access=admin:,root=hostA
  % exportfs -a

  26. How to Build the DR System? 3. (Unix SA) Mount the FS on host A: edit the /etc/vfstab file on host A to include the following line:
  filerA:/vol/nas_rep/hostA - /nfsapps/hostA/sybase/apps/syb-RDC_NAS nfs - yes rw,bg,hard,intr,rsize=32768,wsize=32768,proto=tcp,forcedirectio,noac,vers=3
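
With the vfstab entry in place (and assuming the mount-point directory already exists), the file system can then be mounted by its mount point, e.g.:
  % mkdir -p /nfsapps/hostA/sybase/apps/syb-RDC_NAS
  % mount /nfsapps/hostA/sybase/apps/syb-RDC_NAS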

  27. How to Build the DR System? 4. (Unix SA) Create the target volume on filer cluster B (which serves host C) and schedule Qtree SnapMirroring from filer A to filer B. All of the following commands are issued on the DR-site filer/host:
  % vol create vol2
  Restrict the target volume:
  % vol restrict vol2
  Edit the snapmirror.conf file on target filer B to include:
  ## Source filer:/volume/qtree  Destination filer:/volume/qtree  rate-of-transfer  schedule
  filerA:/vol/nas_rep/hostA filerB:/vol/vol2/hostC kbs=3072,restart=always 0-59/10 * * *
  % snapmirror initialize /vol/vol2/hostC
  Note: As long as the Qtrees are being replicated, we cannot write to the target volume/Qtrees
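
Once initialized, the state and lag of the mirror can be checked from the target filer, and a transfer can be forced outside the schedule if needed (a sketch; both commands run on filer B):
  % snapmirror status /vol/vol2/hostC            # show mirror state and lag
  % snapmirror update /vol/vol2/hostC            # optional: force an incremental transfer now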

  28. How to Build the DR System? 5. (Unix SA) Configure 4 file systems (2 GB each) on filer cluster A and mount them on primary host A:
  • /apps/RDC_NAS → /nfsapps/hostA/sybase/apps/syb-RDC_NAS
  • /apps/RDC_NAS_RS → /nfsapps/hostA/sybase/apps/rep-RDC_NAS_RS
  • /data/RDC_NAS → /nfsapps/hostA/sybase/data/syb-RDC_NAS
  • /data/RDC_NAS_RS → /nfsapps/hostA/sybase/data/rep-RDC_NAS_RS

  29. How to Build the DR System? [Diagram: same build picture as slide 24: host A with its 4 file systems and $HOME on filer cluster A, SnapMirrored to filer cluster B; host B is the local failover target, host C the DR failover target.]

  30. How to Build the DR System? 6. (DBA) Build RSSD ASE Server RDC_NAS on host A using the file systems /apps/RDC_NAS and /data/RDC_NAS 7. (DBA) Build Rep Server RDC_NAS_RS on host A using the file systems /apps/RDC_NAS_RS and /data/RDC_NAS_RS
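
Nothing server-specific is left on host A's local disks; everything lands on the four NFS file systems. Roughly (directory names are illustrative, following the layout described on slides 18 and 28):
  /nfsapps/hostA/sybase/apps/syb-RDC_NAS      # ASE software: e.g. ASE-12_5, OCS-12_5, interfaces, RUN files
  /nfsapps/hostA/sybase/data/syb-RDC_NAS      # ASE device files for the RSSD
  /nfsapps/hostA/sybase/apps/rep-RDC_NAS_RS   # Rep Server software: e.g. REP-12_5, config files
  /nfsapps/hostA/sybase/data/rep-RDC_NAS_RS   # stable device (partition) files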

  31. PDS.INV_DB DR Failover Target How to Build the DR System? San Francisco D.C. INV Usr Sacramento D.C. Host A Rep Svr RDS.INV_DB RSSD ASE WAN (high spd) Local Failover Target ~ 80 miles Host B Host C R2DS.PRC_DB P2DS.PRC_DB SnapMirroring PRC Usr Filer Cluster A Filer Cluster B

  32. How to Build the DR System? 8. (DBA) Use rs_init to build WS replication for PDS.INV_DB and P2DS.PRC_DB. Rep defs are based on the logical connections 9. (DBA) Build subscription replication based on the logical connections
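
For illustration only (the logical connection names, columns and subscription options below are hypothetical, not the actual definitions used), a rep def and subscription created against the logical connections might look like this in RCL, issued via isql against RDC_NAS_RS:
  create replication definition t_a_repdef
      with primary at LDS_INV.INV_DB
      with all tables named t_a
      (k int, v varchar(30))
      primary key (k)
  go
  create subscription t_a_sub
      for t_a_repdef
      with replicate at LDS_PRC.PRC_DB
      without materialization
  go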

  33. DR/HA System [Diagram: the complete DR/HA system: Warm Standby and subscription replication from PDS.INV_DB and P2DS.PRC_DB on the Sacramento side to RDS.INV_DB and R2DS.PRC_DB in the San Francisco D.C., with the Rep Server and RSSD ASE on host A running off filer cluster A, which is SnapMirrored to filer cluster B; host B is the local failover target, host C the DR failover target.]

  34. DR/HA System [Diagram: logical view: Logical Entity.INV_DB (PDS.INV_DB / RDS.INV_DB, tables t_a, t_b) and Logical Entity.PRC_DB (P2DS.PRC_DB / R2DS.PRC_DB, tables t_a, t_c), with t_a subscribed between the logical entities.] • Note: It is important that all subscription replications are based on logical connections!

  35. DR/HA System • Each W.S. replication switch-over can occur independently [Diagram: same logical view as slide 34.]

  36. DR Failover Procedure Failover from Sacramento D.C. to S.F. D.C. • The system is running nominally... then disaster strikes: the Sacramento D.C. is gone; hosts A & B are gone

  37. DR Failover Procedure [Diagram: the DR/HA system of slide 33 at the moment of disaster: the Sacramento D.C. side (hosts A and B, filer cluster A, the primary databases PDS.INV_DB and P2DS.PRC_DB, the Rep Server and RSSD ASE) is lost; the San Francisco D.C. retains host C, filer cluster B, RDS.INV_DB and R2DS.PRC_DB.]

  38. DR Failover Procedure Failover from Sacramento D.C. to S.F. D.C. 1. (Unix SA) Break the SnapMirroring from filer cluster A to filer cluster B. Assuming data updates and replication are continuous, filer B may be up to 10 minutes behind filer A. On the target filer B:
  % snapmirror quiesce /vol/vol2/hostC
  % snapmirror break /vol/vol2/hostC
  Note: Once this is done, the target volume and all of its Qtrees become writable 2. (Unix SA) Mount the 4 file systems on host C (from filer cluster B)
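
The host C mounts in step 2 look just like the host A mounts, only pointing at the mirror Qtrees on filer B. A sketch (the exact source paths and host C mount points depend on the layout chosen during the build):
  % mount -F nfs -o rw,bg,hard,intr,rsize=32768,wsize=32768,proto=tcp,forcedirectio,noac,vers=3 \
        filerB:/vol/vol2/hostC /nfsapps/hostC/sybase/apps/syb-RDC_NAS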

  39. DR Failover Procedure [Diagram: the San Francisco D.C. after the break: host C and filer cluster B, with RDS.INV_DB, R2DS.PRC_DB and the mirrored (now writable) Qtrees; the Sacramento side is gone.]

  40. DR Failover Procedure [Diagram: same picture as slide 39 (animation step).]

  41. DR Failover Procedure 3. (DBA) Modify ‘interfaces’ file on host C: Update the IP for Rep Server and RSSD ASE Server. Push out new interfaces file to S.F. server hosts only

  42. DR Failover Procedure Before:
## Server: RDC_NAS
## hostA (49.38.100.22) port 16117
RDC_NAS 9 9
    query tli tcp /dev/tcp \x000237257777777d0000000000000000
    master tli tcp /dev/tcp \x000237257777777d0000000000000000
## Server: RDC_NAS_BS
## hostA (49.38.100.22) port 16127
RDC_NAS_BS 9 9
    query tli tcp /dev/tcp \x0002372f7777777d0000000000000000
    master tli tcp /dev/tcp \x0002372f7777777d0000000000000000
## Server: RDC_NAS_RS
## hostA (49.38.100.22) port 16157
RDC_NAS_RS 9 9
    query tli tcp /dev/tcp \x0003333d7777777d0000000000000000
    master tli tcp /dev/tcp \x0003333d7777777d0000000000000000

  43. DR Failover Procedure After:
## Server: RDC_NAS
## hostC (48.37.102.21) port 16117
RDC_NAS 9 9
    query tli tcp /dev/tcp \x000237255555555d0000000000000000
    master tli tcp /dev/tcp \x000237255555555d0000000000000000
## Server: RDC_NAS_BS
## hostC (48.37.102.21) port 16127
RDC_NAS_BS 9 9
    query tli tcp /dev/tcp \x0002372f5555555d0000000000000000
    master tli tcp /dev/tcp \x0002372f5555555d0000000000000000
## Server: RDC_NAS_RS
## hostC (48.37.102.21) port 16157
RDC_NAS_RS 9 9
    query tli tcp /dev/tcp \x0003333d5555555d0000000000000000
    master tli tcp /dev/tcp \x0003333d5555555d0000000000000000

  44. DR Failover Procedure 4. (DBA) Bring up RSSD ASE Server RDC_NAS on host C – on the mirror image 5. (DBA) Bring up Rep Server RDC_NAS_RS on host C – on the mirror image
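
Because the RUN files were mirrored along with the software directories, steps 4 and 5 are just the usual start-up, run from the mirror-image mounts on host C (paths are illustrative):
  % startserver -f /nfsapps/hostC/sybase/apps/syb-RDC_NAS/ASE-12_5/install/RUN_RDC_NAS        # RSSD ASE
  % startserver -f /nfsapps/hostC/sybase/apps/rep-RDC_NAS_RS/REP-12_5/install/RUN_RDC_NAS_RS  # Rep Server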

  45. DR Failover Procedure [Diagram: the Rep Server and RSSD ASE are now running on host C in the San Francisco D.C. against the mirror image on filer cluster B, alongside RDS.INV_DB and R2DS.PRC_DB.]

  46. DR Failover Procedure 6. (DBA) Perform the WS switch-over for databases INV_DB and PRC_DB. The new primaries are RDS.INV_DB and R2DS.PRC_DB 7. (DBA) Put both DBs in 'dbo use only' mode; reconcile lost data in PRC_DB and INV_DB 8. (DBA) Open up the S.F. DBs; push out the new interfaces file to all client machines
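
In RCL, step 6 amounts to one switch active per logical connection, issued against RDC_NAS_RS (the logical connection names here are hypothetical):
  switch active for LDS_INV.INV_DB to RDS.INV_DB
  go
  switch active for LDS_PRC.PRC_DB to R2DS.PRC_DB
  go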

  47. DR Failover Procedure [Diagram: after the WS switch-over, RDS.INV_DB and R2DS.PRC_DB are the new active (primary) databases, served by the Rep Server and RSSD ASE on host C and filer cluster B in the San Francisco D.C.]

  48. DR Failover Procedure [Diagram: the INV and PRC users are reconnected to RDS.INV_DB and R2DS.PRC_DB; the replication system is fully back in service on host C and filer cluster B in the San Francisco D.C.]

  49. DR Failover Procedure Notes: • Depending on the time of the crash/disaster, the mirror image of the Rep Server stable queues may be up to 10 minutes behind the primary image. Some data loss should be expected due to the SnapMirror delay • Crash tests confirmed data loss in some cases, but we were always able to bring up the Rep Server and RSSD ASE Server on the mirror image at the DR site • Testing of the filer cluster showed reliable and transparent failover within the cluster

  50. Local Failover • Local failover on the filer is very useful and much simpler • We stay on the same filer and in the same data center -- no need to break the SnapMirror or do anything on the filers -- no need to perform a WS switch-over on the Rep Server • We only need to mount the FSs on the new target host B • There will be no data loss on the Rep Server stable queues (because the primary filer cluster remains up)
