
Microsoft Exchange Server 2010 High Availability Deep Dive


Presentation Transcript


    1. Microsoft Exchange Server 2010 High Availability Deep Dive. Scott Schnoll, Principal Technical Writer, Microsoft Corporation

    3. Agenda: Exchange 2010 High Availability Basics; Deep Dive on Exchange 2010 High Availability Basics; Deeper Dive on Exchange 2010 High Availability Advanced Features; High Availability Improvements in Service Pack 1.

    4. Microsoft Exchange 2010 High Availability Basics: Database Availability Groups; Mailbox Database Copies; Lagged Database Copies.

    5. Database Availability Group (DAG)

    6. Database Availability Group (DAG): A group of servers that host a set of replicated mailbox databases. A server can be a member of only one DAG; an organization can have multiple DAGs. A DAG leverages Windows Failover Cluster technologies to manage DAG membership (a DAG member is a cluster node) and the heartbeating of DAG members, and Active Manager stores its data in the cluster database. A DAG defines a boundary for mailbox database replication, for database and server *overs (switchovers and failovers), and for Active Manager.

    7. Mailbox Database Copies: Create up to 16 copies of each mailbox database. Each mailbox database must have a unique name within the organization. Mailbox database objects are global configuration objects: all copies of a mailbox database use the same GUID, and databases are no longer connected to specific Mailbox servers.

    8. Mailbox Database Copies: Each DAG member can host only one copy of a given mailbox database. The database path and log folder path for a copy must be identical on all members. Copies have settable properties: Activation Preference (RTM: used as the second sort key during best copy selection; SP1: used for distributing active databases, and as the primary sort key when the Lossless mount dial is used), Replay Lag, and Truncation Lag. Using these features affects your storage design.

    9. Lagged Database Copies: A lagged copy is a passive database copy with a replay lag time greater than 0. Lagged copies are only for point-in-time protection, but they are not a replacement for point-in-time backups. They address logical corruption and/or mailbox deletion prevention scenarios and provide a maximum of 14 days of protection. When should you deploy a lagged copy? Only to mitigate a specific risk; one may not be needed if you deploy a backup solution (e.g., DPM 2010). Lagged copies are not HA database copies and should never be automatically activated by the system. Steps for manual activation are documented at http://technet.microsoft.com/en-us/library/dd979786.aspx. Lagged copies affect your storage design.

    10. Deep Dive on Exchange 2010 High Availability Basics: Quorum; Witness; DAG Lifecycle; DAG Networks.

    11. Quorum

    12. Quorum: Used to ensure that only one subset of members is functioning at one time. A majority of members must be active and in communication with each other. Quorum represents a shared view of members (voters and some resources) and has a dual usage: the data shared between the voters (configuration, etc.), and the number of voters required for the solution to stay running (a majority). Quorum is a consensus of voters: when a majority of voters can communicate with each other, the cluster has quorum; when a majority of voters cannot communicate with each other, the cluster does not have quorum.

    13. Quorum: Quorum is necessary not only for cluster functions but also for DAG functions. For a DAG member to mount and activate databases, it must participate in quorum. Exchange 2010 uses only two of the four available cluster quorum models: Node Majority (DAGs with an odd number of members) and Node and File Share Majority (DAGs with an even number of members). Quorum = floor(N/2) + 1 (whole numbers only), where N = number of nodes in the cluster. For an even member count, the witness supplies one extra voter, which is why 6 members can lose 3 voters. 6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters); 9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters); 13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters); 15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters).
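    The "whole numbers only" rule is floor division. The Python sketch below is an illustrative model, not Exchange or Failover Clustering code, and its function names are hypothetical; it reproduces the slide's figures under the assumption, implied by the slide's "can lose" numbers, that an even-member DAG gets one extra vote from the witness.

```python
# Illustrative model of the DAG quorum arithmetic on slide 13.
# Not Exchange or Windows Failover Clustering code; names are hypothetical.


def quorum_model(members: int) -> str:
    """Cluster quorum model Exchange 2010 uses for a DAG of this size."""
    if members < 1:
        raise ValueError("a DAG needs at least one member")
    # Odd member counts use Node Majority; even counts add a file share witness.
    return "Node Majority" if members % 2 else "Node and File Share Majority"


def votes_for_quorum(members: int) -> int:
    """Quorum = floor(N/2) + 1."""
    return members // 2 + 1


def tolerable_voter_losses(members: int) -> int:
    """How many voters can fail while the remaining voters keep quorum."""
    # Assumption: with an even member count the witness is one extra voter.
    voters = members if members % 2 else members + 1
    return voters - votes_for_quorum(members)


for n in (6, 9, 13, 15):
    print(f"{n} members: {quorum_model(n)}, "
          f"{votes_for_quorum(n)} votes for quorum, "
          f"can lose {tolerable_voter_losses(n)} voters")
```

    Because the model also switches between the two quorum models by member parity, it mirrors what slides 24 and 25 describe: a one-member DAG starts on Node Majority, and the model is adjusted as members are added.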

    14. Witness and Witness Server

    15. Witness: A witness is a share on a server external to the DAG that participates in quorum by providing a weighted vote for the DAG member that has a lock on the witness.log file. It is used only by DAGs that have an even number of members. The witness server does not maintain a full copy of quorum data and is not a member of the DAG or cluster.

    16. Witness: Represented by the File Share Witness resource; the file share witness cluster resource, directory, and share are automatically created and removed as needed. Uses the cluster IsAlive check for availability. If the witness is not available, the cluster core resources are failed and moved to another DAG member. If the other DAG member does not bring the witness resource online, the resource remains in a Failed state, with restart attempts every 60 minutes. See http://support.microsoft.com/kb/978790 for details on this behavior.

    17. Witness: If the witness is not online and is needed for quorum, the cluster tries once to bring the File Share Witness resource online. If the witness cannot be restarted, it is considered failed and quorum is lost. If the witness can be restarted, it is considered successful and quorum is maintained: an SMB lock is placed on witness.log, the node's PAXOS information is incremented, and the updated PAXOS tag is written to witness.log.

    18. Witness: When the witness is no longer needed to maintain quorum, the lock on witness.log is released. Any member that locks the witness retains the weighted vote (the “locking node”). Members in contact with the locking node are in the majority and maintain quorum; members not in contact with the locking node are in the minority and lose quorum.
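    To make the locking-node rule concrete, here is a minimal Python sketch of a 4-member DAG split 2/2 by a network failure. It is a simplified model, not cluster code: the member names are hypothetical, and it assumes the witness contributes exactly one vote to whichever partition contains the locking node.

```python
# Simplified model of the witness tie-breaker in an even-member DAG.
# Member names are hypothetical; this is not Failover Clustering code.
from typing import Optional, Set


def partition_has_quorum(partition: Set[str], dag_size: int,
                         locking_node: Optional[str]) -> bool:
    """True if a group of mutually reachable members keeps quorum."""
    if dag_size % 2:
        raise ValueError("the witness is used only by even-member DAGs")
    votes = len(partition)
    # Members in contact with the locking node share its weighted witness vote.
    if locking_node in partition:
        votes += 1
    # dag_size + 1 voters in total, so a majority is dag_size // 2 + 1.
    return votes >= dag_size // 2 + 1


# Hypothetical 4-member DAG split 2/2; MBX1 holds the lock on witness.log.
site_a = {"MBX1", "MBX2"}
site_b = {"MBX3", "MBX4"}
print(partition_has_quorum(site_a, 4, "MBX1"))  # True: 2 members + witness = 3 of 5
print(partition_has_quorum(site_b, 4, "MBX1"))  # False: 2 of 5 votes
```

    Neither half has a majority of the four members on its own; the witness vote held by the locking node is what lets exactly one side keep quorum.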

    19. Witness Server: No pre-configuration is typically necessary. The Exchange Trusted Subsystem group must be a member of the local Administrators group on the witness server if the witness server is not running Exchange 2010. The witness server cannot be a member of the DAG (present or future) and must be in the same Active Directory forest as the DAG.

    20. Witness Server: Can be Windows Server 2003 or later. File and Printer Sharing for Microsoft Networks must be enabled. Replicating the witness directory/share with DFS is not supported. It is not necessary to cluster the witness server; if you do cluster it, you must use Windows Server 2008. A single witness server can be used for multiple DAGs, but each DAG requires its own unique witness directory/share.

    21. Database Availability Group Lifecycle

    22. Database Availability Group Lifecycle
        Create a DAG:
            New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
            New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8
        Add Mailbox servers to the DAG:
            Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1
            Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2
        Add a mailbox database copy:
            Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX2

    23. Database Availability Group Lifecycle: A DAG is initially created as an empty object in Active Directory. It uses either continuous replication or third-party replication (Third Party Replication mode); once changed to Third Party Replication mode, the DAG cannot be changed back. The DAG is given a unique name and configured with IP addresses (or configured to use DHCP).

    24. Database Availability Group Lifecycle: When the first Mailbox server is added to a DAG: a failover cluster is formed with the name of the DAG, using Node Majority quorum; the server is added to the DAG object in Active Directory; a cluster name object (CNO) for the DAG is created in the default Computers container, using the security context of the Replication service; the name and IP address of the DAG are registered in DNS; and the cluster database for the DAG is updated with information about local databases.

    25. Database Availability Group Lifecycle: When the second and subsequent Mailbox servers are added to a DAG: the server is joined to the cluster for the DAG; the quorum model is automatically adjusted; the server is added to the DAG object in Active Directory; and the cluster database for the DAG is updated with information about local databases.

    26. Database Availability Group Lifecycle: After servers have been added to a DAG, configure the DAG (network encryption, network compression, replication port) and configure DAG networks (network subnets; collapse DAG networks into a single network with multiple subnets; enable/disable MAPI traffic/replication; block network heartbeat cross-talk (Server1\MAPI !<-> Server2\Repl)).

    27. Database Availability Group Lifecycle: After servers have been added to a DAG, configure DAG member properties: the automatic database mount dial (BestAvailability, GoodAvailability, Lossless, custom value), the database copy automatic activation policy (Blocked, IntrasiteOnly, Unrestricted), and the maximum number of active databases. Create mailbox database copies (seeding is performed automatically, but you have options). Monitor the health and status of database copies and perform switchovers as needed.

    28. Database Availability Group Lifecycle: Before you can remove a server from a DAG, you must first remove all replicated databases from the server. When a server is removed from a DAG: the server is evicted from the cluster; the cluster quorum is adjusted; and the server is removed from the DAG object in Active Directory. Before you can remove a DAG, you must first remove all servers from the DAG.

    29. DAG Networks

    30. DAG Networks: A DAG network is a collection of subnets. All DAGs must have exactly one MAPI network, which connects DAG members to network resources (Active Directory, other Exchange servers, etc.), and zero or more Replication networks, which are separate networks on separate subnet(s) used for/by continuous replication only. LRU (least recently used) determines which replication network to use when multiple replication networks are configured.
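    The slide names LRU but does not say at what granularity Exchange applies it, so the Python sketch below only illustrates a generic least-recently-used choice among replication networks. The class and network names are hypothetical.

```python
# Generic least-recently-used (LRU) choice among replication networks.
# Illustrative only: class and network names are hypothetical, and how
# Exchange applies LRU internally is not described on the slide.
from collections import OrderedDict
from typing import Iterable


class ReplicationNetworkPicker:
    def __init__(self, networks: Iterable[str]) -> None:
        # Key order tracks recency: the first key is the least recently used.
        self._order = OrderedDict((name, None) for name in networks)
        if not self._order:
            raise ValueError("need at least one replication network")

    def next_network(self) -> str:
        name = next(iter(self._order))   # least recently used network
        self._order.move_to_end(name)    # it becomes the most recently used
        return name


picker = ReplicationNetworkPicker(["ReplNet1", "ReplNet2"])
print([picker.next_network() for _ in range(4)])
```

    With two replication networks, successive picks alternate between them, which spreads replication traffic across both.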

    31. DAG Networks: DAG networks are initially created based on an enumeration of cluster networks. Cluster enumeration is based on subnet: one cluster network is created for each subnet.
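    The one-network-per-subnet rule can be illustrated with Python's standard ipaddress module. The server names and addresses below are hypothetical: two sites, each with a MAPI subnet and a replication subnet, which enumerate into four networks before any collapsing.

```python
# Group (server, address/prefix) pairs into one network per subnet,
# mirroring "one cluster network is created for each subnet".
# Server names and addresses are hypothetical.
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def enumerate_cluster_networks(
        interfaces: Iterable[Tuple[str, str]]) -> Dict[object, List[str]]:
    networks: Dict[object, List[str]] = defaultdict(list)
    for server, address in interfaces:
        subnet = ipaddress.ip_interface(address).network
        networks[subnet].append(server)
    return dict(networks)


interfaces = [
    ("MBX1", "10.0.0.11/24"), ("MBX1", "10.0.1.11/24"),       # site 1
    ("MBX2", "10.0.0.12/24"), ("MBX2", "10.0.1.12/24"),       # site 1
    ("MBX3", "192.168.0.13/24"), ("MBX3", "192.168.1.13/24"),  # site 2
]
for subnet, servers in enumerate_cluster_networks(interfaces).items():
    print(subnet, servers)
```

    A DAG stretched across two sites therefore starts with one network per subnet, which is the situation the following slides address by collapsing subnets into two DAG networks.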

    32. DAG Networks

    33. DAG Networks

    34. DAG Networks To collapse subnets into two DAG networks and disable replication for the MAPI network:

    35. DAG Networks To collapse subnets into two DAG networks and disable replication for the MAPI network:

    36. DAG Networks: Automatic network detection occurs only when members are added to the DAG; if networks are added after a member is added, you must perform discovery (Set-DatabaseAvailabilityGroup -DiscoverNetworks). DAG network configuration is persisted in the cluster registry (HKLM\Cluster\Exchange\DAG Network). DAG networks include built-in encryption and compression. Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs. Compression: Microsoft XPRESS, based on the LZ77 algorithm; MSIT sees 30% compression, but the percentage will vary based on message profile. DAGs use a single TCP port for replication and seeding; the default is TCP port 64327. If you change the port and you use Windows Firewall, you must manually change the firewall rules.

    37. Deeper Dive on Exchange 2010 High Availability Advanced Features: Active Manager; Best Copy Selection; Datacenter Activation Coordination Mode.

    38. Active Manager

    39. Active Manager: The Exchange component that manages *overs. It runs on every server in the DAG and selects the best available copy on failovers. It is the definitive source of information on where a database is active; it stores this information in the cluster database and provides it to other Exchange components (e.g., RPC Client Access and Hub Transport).

    40. Active Manager: Active Manager roles: Standalone Active Manager; Primary Active Manager (PAM); Standby Active Manager (SAM). The Active Manager client runs on CAS and Hub.

    41. Active Manager: Transitions of role state are logged in the Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel).

    42. Active Manager: Primary Active Manager (PAM): Runs on the node that owns the cluster core resources (cluster group). Gets topology change notifications. Reacts to server failures. Selects the best database copy on *overs. Detects failures of the local Information Store and local databases.

    43. Active Manager: Standby Active Manager (SAM): Runs on every other node in the DAG. Detects failures of the local Information Store and local databases, and reacts to failures by asking the PAM to initiate a failover. Responds to queries from CAS/Hub about which server hosts the active copy. Both roles are necessary for automatic recovery; if the Replication service is stopped, automatic recovery will not happen.
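    The role assignment on slides 40 through 43 reduces to a small decision, sketched below in Python. This is a simplified model with hypothetical server names; it assumes the Standalone role applies to a Mailbox server that is not a DAG member, and it ignores role transitions in progress.

```python
# Simplified model of which Active Manager role a Mailbox server holds.
# Server names are hypothetical; role transitions are not modeled.
from typing import Optional, Set


def active_manager_role(server: str, dag_members: Set[str],
                        cluster_group_owner: Optional[str]) -> str:
    if server not in dag_members:
        return "Standalone"  # assumption: Mailbox server outside any DAG
    if server == cluster_group_owner:
        return "PAM"         # owns the cluster core resources (cluster group)
    return "SAM"             # every other DAG member


dag = {"MBX1", "MBX2", "MBX3"}
for s in ("MBX1", "MBX2", "MBX3", "MBX9"):
    print(s, active_manager_role(s, dag, cluster_group_owner="MBX2"))
```

    Because the PAM role follows the cluster group, moving the cluster group (as the SP1 maintenance script does) is what moves the PAM role to another member.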

    44. Best Copy Selection

    45. Best Copy Selection: The process of finding the best copy to activate for an individual database, given a list of status results for the potential copies. Active Manager selects the “best” copy to become the new active copy when the existing active copy fails.

    46. Best Copy Selection – RTM: Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sort key if necessary. Selects from the sorted list based on which set of criteria each copy meets. Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy.

    47. Best Copy Selection – SP1: Sorts copies by activation preference when the auto database mount dial is set to Lossless; otherwise, sorts copies by copy queue length, with activation preference used as a secondary sort key if necessary. Selects from the sorted list based on which set of criteria each copy meets. Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy. (This change was checked into build 213.)

    48. Best Copy Selection: Is the database mountable? Is the copy queue length <= AutoDatabaseMountDial? If yes, the database is marked as the current active copy and a mount request is issued. If not, the next best database is tried (if one is available). During best copy selection, any servers that are unreachable or “activation blocked” are ignored.
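    The mountability check can be sketched in Python as a walk over the already-ranked copies. The 0/6/12 log limits are the Exchange 2010 values for the Lossless, GoodAvailability, and BestAvailability dials; the Candidate type, the function name, and treating a custom dial as a plain log count are assumptions for illustration.

```python
# Sketch of the mountability check during best copy selection.
# Candidate and pick_copy_to_mount are hypothetical names.
from dataclasses import dataclass
from typing import Iterable, Optional, Union

# Maximum copy queue length (log files) each named dial tolerates.
MOUNT_DIAL_LIMITS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}


@dataclass
class Candidate:
    server: str
    reachable: bool
    activation_blocked: bool
    mountable: bool
    copy_queue_length: int


def pick_copy_to_mount(ranked: Iterable[Candidate],
                       dial: Union[str, int]) -> Optional[str]:
    # Assumption: an integer dial is treated as a custom log-count limit.
    limit = MOUNT_DIAL_LIMITS[dial] if isinstance(dial, str) else dial
    for c in ranked:
        if not c.reachable or c.activation_blocked:
            continue  # ignored during best copy selection
        if c.mountable and c.copy_queue_length <= limit:
            return c.server  # marked active; mount request issued
    return None  # no copy satisfies the dial


ranked = [
    Candidate("Server3", True, False, True, 8),
    Candidate("Server2", True, False, True, 4),
    Candidate("Server4", False, False, True, 0),  # unreachable
]
print(pick_copy_to_mount(ranked, "GoodAvailability"))
```

    With GoodAvailability, Server3 (8 logs behind) is skipped and Server2 is chosen; with Lossless, no reachable copy qualifies and nothing mounts.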

    49. Best Copy Selection

    50. Best Copy Selection – RTM: Four copies of DB1; DB1 is currently active on Server1.

    51. Best Copy Selection – RTM: Sort the list of available copies by copy queue length (using activation preference as a secondary sort key if necessary): Server3\DB1, Server2\DB1, Server4\DB1.

    52. Best Copy Selection – RTM: Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy): Server3\DB1, Server2\DB1, Server4\DB1.

    53. Best Copy Selection – SP1: Four copies of DB1; DB1 is currently active on Server1; the auto database mount dial is set to Lossless.

    54. Best Copy Selection – SP1: Sort the list of available copies by activation preference: Server2\DB1, Server3\DB1, Server4\DB1.

    55. Best Copy Selection – SP1: Sort the list of available copies by activation preference: Server2\DB1, Server3\DB1, Server4\DB1.
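    The RTM and SP1 walkthroughs can be reproduced with a short Python sketch. The status values are hypothetical, chosen only so the orders match the slides; they are not the figures from the original slides. Only the first RTM criteria set (CQL < 10; RQL < 50; CI = Healthy) is modeled, and the type and function names are illustrative.

```python
# Sketch of the RTM vs. SP1 sort order and the first RTM criteria set.
# Status values are hypothetical, not the data from the original slides.
from dataclasses import dataclass
from typing import List


@dataclass
class CopyStatus:
    server: str
    copy_queue_length: int      # CQL: logs not yet copied to this copy
    replay_queue_length: int    # RQL: logs copied but not yet replayed
    activation_preference: int  # 1 = most preferred
    content_index_healthy: bool


def sort_copies(copies: List[CopyStatus], dial: str, sp1: bool) -> List[CopyStatus]:
    if sp1 and dial == "Lossless":
        # SP1 with the Lossless dial: activation preference is the primary key.
        return sorted(copies, key=lambda c: c.activation_preference)
    # RTM, or SP1 with another dial: CQL first, activation preference second.
    return sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))


def meets_first_criteria(c: CopyStatus) -> bool:
    return (c.copy_queue_length < 10 and c.replay_queue_length < 50
            and c.content_index_healthy)


copies = [
    CopyStatus("Server2", 4, 0, 2, True),
    CopyStatus("Server3", 2, 10, 3, True),
    CopyStatus("Server4", 15, 0, 4, True),
]
rtm = sort_copies(copies, "BestAvailability", sp1=False)
print([c.server for c in rtm])
print([c.server for c in rtm if meets_first_criteria(c)])
print([c.server for c in sort_copies(copies, "Lossless", sp1=True)])
```

    With these values the RTM order is Server3, Server2, Server4, and Server4 fails the first criteria set because its copy queue length is 15; the SP1 Lossless order follows activation preference instead.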

    56. Best Copy Selection: After Active Manager determines the best copy to activate, the Replication service on the target server attempts to copy missing log files from the source (ACLL). If this succeeds, the database mounts with zero data loss. If it fails (a lossy failure), the database mounts based on the AutoDatabaseMountDial setting; if the data loss is outside the dial setting, the next copy is tried.
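    The three outcomes after ACLL can be sketched as a small Python classifier. The function name is hypothetical, and counting the logs ACLL manages to recover (so that a partial recovery reduces the loss) is an assumption; the slide only distinguishes success from failure.

```python
# Hypothetical classifier for the outcome after ACLL on the chosen copy.
# Assumption: ACLL may recover some logs, reducing the remaining loss.


def acll_outcome(copy_queue_length: int, logs_recovered_by_acll: int,
                 dial_limit: int) -> str:
    missing = copy_queue_length - logs_recovered_by_acll
    if missing <= 0:
        return "mount, no data loss"         # ACLL copied every missing log
    if missing <= dial_limit:
        return "mount, lossy (within dial)"  # AutoDatabaseMountDial allows it
    return "try next copy"                   # loss exceeds the dial setting


print(acll_outcome(5, 5, dial_limit=6))
print(acll_outcome(5, 0, dial_limit=6))
print(acll_outcome(10, 2, dial_limit=6))
```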

    57. Best Copy Selection: After Active Manager determines the best copy to activate, the mounted database generates new log files (using the same log generation sequence), and Transport Dumpster requests are initiated for the mounted database to recover lost messages. When the original server or database recovers, it runs through divergence detection and either performs an incremental resync or requires a full reseed.

    58. Datacenter Activation Coordination Mode

    59. Datacenter Activation Coordination Mode: DAC mode is a property of a DAG. It acts as an application-level form of quorum, designed to prevent multiple copies of the same database from mounting on different members due to a loss of network.

    60. Datacenter Activation Coordination Mode: RTM: DAC mode is only for DAGs with three or more members that are extended to two Active Directory sites. Don’t enable it for two-member DAGs where each member is in a different AD site, or for DAGs where all members are in the same AD site. DAC mode also enables use of the site resilience tasks: Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup, and Start-DatabaseAvailabilityGroup. SP1: DAC mode can be enabled for all DAGs.

    61. Datacenter Activation Coordination Mode: Uses the Datacenter Activation Coordination Protocol (DACP), which is a bit in memory set to either 0 (can’t mount) or 1 (can mount).

    62. Datacenter Activation Coordination Mode: Active Manager startup sequence: DACP is set to 0. The DAG member communicates with the other DAG members it can reach to determine the current value of their DACP bits. If the starting DAG member can communicate with all other members, its DACP bit switches to 1. If the other DACP bits are set to 0, the starting DAG member's DACP bit remains at 0. If another DACP bit is set to 1, the starting DAG member's DACP bit switches to 1.
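    The startup sequence is a three-rule decision, sketched below in Python. The function name and its inputs (the number of other DAG members and the DACP bits of the members this node can reach) are hypothetical modeling choices.

```python
# Model of the DACP startup rules (0 = can't mount, 1 = can mount).
# The function name and parameters are hypothetical.
from typing import List


def dacp_bit_at_startup(other_member_count: int, reachable_bits: List[int]) -> int:
    bit = 0  # Active Manager always starts with DACP = 0
    if len(reachable_bits) == other_member_count:
        bit = 1  # can communicate with all other members
    elif any(b == 1 for b in reachable_bits):
        bit = 1  # another reachable member already has DACP = 1
    return bit  # otherwise stays 0: this member must not mount databases


print(dacp_bit_at_startup(3, [0, 0, 0]))  # reaches everyone
print(dacp_bit_at_startup(3, [0, 0]))     # partial view, all bits 0
print(dacp_bit_at_startup(3, [1]))        # partial view, one bit 1
```

    The middle case is the one DAC mode exists for: members that restart in an isolated datacenter and can see only each other, all with DACP = 0, stay at 0 and do not mount databases.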

    63. Improvements in Service Pack 1: Replication and Copy Management enhancements in SP1.

    64. Improvements in Service Pack 1: Continuous replication changes: enhanced to reduce data loss and to eliminate the log drive as a single point of failure. Replication automatically switches between two modes: file mode (original, log file shipping) and block mode (enhanced log block shipping). Switching process: the initial mode is file mode; block mode is triggered when the target needs the Exx.log file (e.g., copy queue length = 0), and all healthy passives are processed in parallel; file mode is triggered when block mode falls too far behind (e.g., copy queue length > 0).
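    The mode switching can be modeled as a two-state machine driven by copy queue length. This Python sketch uses the slide's example thresholds (0 and > 0) literally; the real triggers may be more nuanced, and the function names are hypothetical.

```python
# Two-state model of SP1 continuous replication mode switching,
# using the slide's example thresholds. Function names are hypothetical.
from typing import Iterable, List


def next_replication_mode(current: str, copy_queue_length: int) -> str:
    if current == "file" and copy_queue_length == 0:
        return "block"  # caught up: the target needs the current Exx.log
    if current == "block" and copy_queue_length > 0:
        return "file"   # block mode fell behind: fall back to log file shipping
    return current


def replay_modes(queue_lengths: Iterable[int]) -> List[str]:
    mode = "file"  # the initial mode is file mode
    history = []
    for cql in queue_lengths:
        mode = next_replication_mode(mode, cql)
        history.append(mode)
    return history


print(replay_modes([3, 1, 0, 0, 2, 0]))
```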

    65. Improvements in Service Pack 1

    66. Improvements in Service Pack 1: SP1 introduces the RedistributeActiveDatabases.ps1 script, which keeps active database copies balanced across DAG members: it moves databases to their most preferred copy and, if the DAG spans sites, tries to balance between sites. Targetless admin switchover was altered for stronger activation preference affinity: the first pass of best copy selection is sorted by activation preference, not copy queue length. This trades a potentially longer activation time for a better distribution of databases: you might pick a copy with more logs to replay, but active databases end up more evenly distributed.
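    The core idea of moving each database to its most preferred copy can be sketched in Python. This is not the script's logic: it ignores copy health and cross-site balancing, and the function name, data layout, and server names are hypothetical.

```python
# Sketch: plan moves that put each database on its most preferred copy.
# Ignores copy health and cross-site balancing; names are hypothetical.
from typing import Dict, List, Tuple


def plan_redistribution(databases: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    moves = []
    for db in sorted(databases):
        info = databases[db]
        prefs = info["preferences"]          # server -> activation preference
        target = min(prefs, key=prefs.get)   # lowest number = most preferred
        if target != info["active"]:
            moves.append((db, info["active"], target))
    return moves


databases = {
    "DB1": {"active": "MBX2", "preferences": {"MBX1": 1, "MBX2": 2}},
    "DB2": {"active": "MBX2", "preferences": {"MBX2": 1, "MBX1": 2}},
    "DB3": {"active": "MBX2", "preferences": {"MBX1": 1, "MBX3": 2, "MBX2": 3}},
}
print(plan_redistribution(databases))
```

    After a failover leaves every database active on MBX2, the plan moves DB1 and DB3 back to MBX1 and leaves DB2, already on its preferred copy, in place.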

    67. Improvements in Service Pack 1: *over performance improvements: In RTM, a *over immediately terminated replay on the copy that was becoming active, and the mount operation did the necessary log recovery. In SP1, a *over drives the database to a clean shutdown by playing all logs on the passive copy, so no recovery is required on the new active copy.

    68. Improvements in Service Pack 1: DAG maintenance scripts: StartDAGServerMaintenance.ps1: It runs Suspend-MailboxDatabaseCopy for each database copy hosted on the DAG member. It pauses the node in the cluster, which prevents it from being or becoming the PAM. It sets the DatabaseCopyAutoActivationPolicy parameter on the DAG member to Blocked. It moves all active databases currently hosted on the DAG member to other DAG members. If the DAG member currently owns the default cluster group, it moves the default cluster group (and therefore the PAM role) to another DAG member.

    69. Improvements in Service Pack 1: DAG maintenance scripts: StopDAGServerMaintenance.ps1: It runs Resume-MailboxDatabaseCopy for each database copy hosted on the DAG member. It resumes the node in the cluster, which enables full cluster functionality for the DAG member. It sets the DatabaseCopyAutoActivationPolicy parameter on the DAG member to Unrestricted.

    70. Improvements in Service Pack 1 CollectOverMetrics.ps1 and CollectReplicationMetrics.ps1 rewritten

    71. Improvements in Service Pack 1: Exchange Management Console enhancements in SP1: manage DAG IP addresses; manage the witness server/directory and alternate witness server/directory.

    72. Question & Answer Session

    73. Related Content: UNC311 | Communications Server "14": Architecture and planning for High Availability; UNC201 | Introduction to Exchange Server 2010 SP1; UNC305 | Exchange 2010 Storage Design.

    75. Resources
