Exchange 2003: Standby Cluster Recovery Timothy J. McMichael Microsoft Corporation email@example.com
What is an Exchange Standby Cluster? • A cluster that mirrors a production cluster in: • Hardware / software configurations • Exchange and OS versions • Software updates and hotfixes • Has Exchange installed on it but is not configured with Exchange Virtual Servers (EVS). • Can only be used when the production Exchange cluster is no longer online or available.
What allows for an Exchange 2003 standby cluster? • Exchange 2003 cluster setup was changed to fully recognize an existing EVS. • Deleting the System Attendant resource in Exchange 2003 does not remove the server object from Active Directory. • In Exchange 2000, deleting the System Attendant resource was equivalent to removing the Exchange server from Active Directory. • Exchange 2003 implements the “Remove Exchange Virtual Server” selection. When used while removing the System Attendant, this removes the Exchange server from Active Directory.
What allows for an Exchange 2003 standby cluster?
Event Type: Error
Event Source: MSExchangeCluster
Event Category: Services
Event ID: 1027
Date: 9/13/2006
Time: 9:38:45 AM
User: N/A
Computer: MAIL-2-A
Description: Exchange System Attendant: System Attendant resource has been deleted improperly. The objects in Active Directory corresponding to this Exchange Virtual Server have not been removed. If you unintentionally deleted the System Attendant resource, you can restore it and its dependent resources to their original configuration by re-creating the System Attendant resource using the same parameters as before. If you want to remove the Exchange Virtual Server corresponding to this System Attendant resource, you have to re-create the System Attendant resource with the same parameters as before and then select the option "Remove Exchange Virtual Server" in the context menu of the group or System Attendant resource using the Cluster Administrator tool. For more information, click http://www.microsoft.com/contentredirect.asp.
Data: 0000: 00 00 00 00 ....
Why use an Exchange 2003 Standby Cluster? • Recover from an entire loss of an Exchange cluster. • Provide site resilience for Exchange clustering without implementing geo-clustering.
Why use an Exchange 2003 Standby Cluster? • Complications of geo-clustering: • Sometimes expensive. • Multiple hardware requirements. • Need for synchronous data replication. • Stretched VLAN: the same subnet must exist in two different physical locations.
Can I recover my cluster like a stand-alone server? • Stand-alone servers are recovered using the setup /disasterrecovery switch. • The setup /disasterrecovery switch will not run on a clustered EVS.
The component "Microsoft Exchange" cannot be assigned the action "Disaster Recovery" because:
- Microsoft Exchange setup does not support the use of the DisasterRecovery action when running on cluster nodes. See the Microsoft Exchange Disaster Recovery white paper at http://www.microsoft.com/exchange for the proper methodology to use for recovering a Microsoft Exchange cluster.
What is required to implement a standby cluster? • Exchange 2003 / Windows 2003 • This process was only tested for implementation in native mode organizations. • Exchange 5.5 and Exchange 2000 clusters, along with Exchange 2003 clusters on Windows 2000, cannot implement standby clusters. • Supporting software (e.g., SAN connectivity, backup).
What is required to implement a standby cluster? • Hardware • Recommendations for hardware depend on the current production cluster implementation and the standby cluster implementation. • Hardware should be comparable to production servers to provide the same level of service in the event of a failure. • Hardware should be listed under the cluster solutions category of the Windows Server Catalog.
What is required to implement a standby cluster? • It is preferred that the public interface of the standby cluster reside on the same subnet as the production cluster. • Standby clusters can only host EVS from one production cluster. • Exchange 2003 should be preinstalled on the standby nodes.
What is required to implement a standby cluster? • IP Addresses / Names • Cluster IP and name may not conflict with any other IP / name on the network. • Node IP and name may not conflict with any other IP / name on the network. • A cluster service account name and password. • This needs to be the same as the production cluster.
What is required to implement a standby cluster? • Physical disk resources configuration must use the same drive letters in standby as exist on the production cluster. • The physical disk resources must not have any other Exchange data already on them.
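The drive-letter requirement above can be checked before a failover is attempted. The following is a minimal Python sketch, assuming the drive letters of each cluster's physical disk resources have already been collected (by hand or by script). The function name and the sample letters are hypothetical and do not come from the original slides.

```python
def missing_drive_letters(production, standby):
    """Return production drive letters that have no match on the standby cluster."""
    def normalize(letters):
        # Accept "e", "E:" or "E:\\" and reduce each entry to one upper-case letter.
        return {entry.strip()[0].upper() for entry in letters if entry.strip()}

    return sorted(normalize(production) - normalize(standby))


# Hypothetical example: production uses E:, F: and G:, standby only has E: and F:.
print(missing_drive_letters(["E:", "F:", "G:"], ["e:", "F:\\"]))
```

An empty result means every production drive letter has a counterpart on the standby cluster. It does not verify that the standby disks are free of other Exchange data.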
What can be recovered to a standby cluster? • The standby cluster without any EVS created can be used to recover any production cluster. • Once the first EVS is created subsequent EVS must be created from the same production cluster. • If multiple production EVS exist they must all be moved to the standby cluster if one requires movement.
How many recovery nodes will I need? • This depends on your current production implementation. • If you are using an Active / Passive cluster in production with one EVS: • You can use two nodes in an active / passive configuration for standby. • You can use a single node configuration for standby.
How many recovery nodes will I need? • If you have multiple EVS in the production cluster (Active / Active / Active / Passive): • One node will be required for each EVS. • In this instance three nodes would be required. • If using the minimal number of nodes no failover in the standby cluster can occur (only one EVS from Exchange can be active on a node at a time).
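The node-count rule above can be written down as a small calculation. This is a sketch in Python under the assumption stated on the slide that only one EVS can be active on a node at a time. The function names are illustrative and are not part of any Exchange or cluster tool.

```python
def minimum_standby_nodes(evs_count):
    """One node per EVS: only one EVS can be active on a node at a time."""
    if evs_count < 1:
        raise ValueError("a production cluster hosts at least one EVS")
    return evs_count


def standby_failover_possible(evs_count, node_count):
    """Failover inside the standby cluster needs at least one spare node."""
    if node_count < minimum_standby_nodes(evs_count):
        raise ValueError("not enough standby nodes to host every EVS")
    return node_count > evs_count


# Active / Active / Active / Passive production cluster: three EVS.
print(minimum_standby_nodes(3))
print(standby_failover_possible(3, 3))
```

With three EVS and exactly three standby nodes, every EVS can be hosted but none can fail over. A fourth node restores that ability.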
What are the considerations when using an alternate site / subnet? • If using an alternate Active Directory site or subnet, some pre-failure work must be done. • The Active Directory site should be created in Active Directory Sites and Services with an appropriate subnet defined. • Domain controllers and DNS servers should be installed in the site and fully replicated.
What are the considerations when using an alternate site / subnet? • When the Exchange Virtual Server is created on the standby cluster, protocol resources will be bound to their original IP addresses. • If not corrected, this will cause the protocols to fail IsAlive checks. • To correct this, use Exchange System Manager and select the new IP address.
What are the considerations when using an alternate site / subnet?
ERR Microsoft Exchange SMTP Server Instance <SMTP Virtual Server Instance 1 (MAIL-2)>: [EXRES] DwCheckProtocolSocket: failed to connect socket. Error 10060.
ERR Microsoft Exchange SMTP Server Instance <SMTP Virtual Server Instance 1 (MAIL-2)>: [EXRES] ExchangeCheckIsAlive: IsAlive failed, will retry in 50 msec.
What are the considerations when using an alternate site / subnet? • Client Access / Server to Server Communication • Clients / Servers must have access to a DNS server that is replicating and has received the IP address change. • Client / Server side DNS resolver cache will have to expire or be flushed to query the new IP. (ipconfig /flushdns) • Connectivity between client / server sites and standby sites must be available.
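Whether a given client or server has picked up the IP address change can be checked from that machine. The sketch below uses Python's standard `socket.gethostbyname_ex`, which goes through the operating system resolver and is therefore subject to the local resolver cache mentioned above. The host name and address in the example are hypothetical, and the `resolve` parameter exists only so the check can be exercised without a live DNS server.

```python
import socket


def resolved_addresses(host_name, resolve=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses the resolver reports for host_name."""
    _name, _aliases, addresses = resolve(host_name)
    return set(addresses)


def sees_new_address(host_name, new_ip, resolve=socket.gethostbyname_ex):
    """True once the resolver returns the standby IP address for the EVS name."""
    try:
        return new_ip in resolved_addresses(host_name, resolve)
    except OSError:
        # The name did not resolve at all; treat this as "not updated yet".
        return False


# Hypothetical stand-in for a resolver that already returns the standby address.
def fake_resolver(name):
    return (name, [], ["10.0.1.25"])


print(sees_new_address("MAIL-2", "10.0.1.25", resolve=fake_resolver))
```

If the check still returns the old address after DNS replication has completed, flushing the resolver cache with ipconfig /flushdns on that machine is the next step.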
How are permissions handled when using standby clustering? • Permissions to the Exchange configuration container are granted by: • Machine accounts of cluster nodes having full control of the EVS object. • Machine accounts of the cluster nodes being members of the Exchange Domain Servers group. • The machine account created for the EVS name is not used for permissions in this case.
How are permissions handled when using standby clustering? • If the same node names will be used in standby as in production, the machine accounts should be “reset” in Active Directory prior to joining the new nodes to the domain. • If different node names will be used, they will be added with appropriate permissions as part of the Exchange System Attendant creation.
Installing the standby cluster • Identify the hardware and storage resources necessary for the cluster implementation. • Make any necessary domain changes and allow time for replication. • Ensure the network card binding order is correct. • Install standby nodes into the same domain as production nodes.
Installing the standby cluster • Install and configure the cluster services. • Assign an appropriate cluster management name. • Assign an appropriate cluster management IP address. • Select an appropriate quorum type. • Add additional nodes to the cluster as necessary.
Installing the standby cluster • Install all Exchange prerequisites: • WWW service. • NNTP service. • SMTP service. • ASP.NET. • Create the Distributed Transaction Coordinator resource in the cluster group. • Install Exchange, service packs, and hotfixes.
Installing the standby cluster • In cluster administrator, create a group for the Exchange resources. • Ensure that physical disk resources exist in this group that match the same drive letters as the production cluster.
Using the standby cluster • Ensure that all production resources have been taken offline (and cannot be accidentally brought back online). • Consider setting the Exchange services on the production cluster to disabled in the services control panel. • Create an IP address resource in the Exchange cluster group. Use an appropriate IP address.
Using the standby cluster • Create a network name resource in the Exchange group. • Use the same network name as the production EVS. • Enable “DNS registration must succeed”. • Enable “Enable Kerberos Authentication”. • Create the Exchange System Attendant (SA) resource.
Using the standby cluster • When creating the SA ensure the proper cluster dependencies are established. • SA dependent on network name resource. • SA dependent on all physical disk resources (including mount points) that Exchange data will reside on.
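The dependency requirement above amounts to a simple set comparison. The following Python sketch assumes the dependency list of the System Attendant resource has been read from Cluster Administrator and typed in by hand. The resource names are hypothetical, and comparing names case-insensitively is an assumption made for this example.

```python
def missing_sa_dependencies(sa_dependencies, network_name, exchange_disks):
    """Return required resources that the System Attendant does not yet depend on."""
    required = [network_name] + list(exchange_disks)
    # Resource names are compared case-insensitively here (an assumption).
    configured = {name.lower() for name in sa_dependencies}
    return [name for name in required if name.lower() not in configured]


# Hypothetical resource names: the mount point disk has been forgotten.
print(missing_sa_dependencies(
    sa_dependencies=["MAIL-2 Network Name", "Disk E:"],
    network_name="MAIL-2 Network Name",
    exchange_disks=["Disk E:", "Disk F:", "Mount Point Disk 1"],
))
```

An empty list means the SA depends on the network name and on every disk listed as holding Exchange data.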
Using the standby cluster • SA setup will be different on the standby cluster: • The administrative / routing group selection will be disabled. • The initial Exchange data path selection will be disabled. • These values are read from Active Directory as a part of this installation.
Using the standby cluster • When SA creation completes, all other Exchange resources are created automatically. • At this time the resources can be brought online and data recovered.
Moving back to production • The steps to move back to production depend on what state the production servers are in. • If the production servers were completely rebuilt then the steps are the same as moving to the standby cluster.
Moving back to production • If the production servers were not rebuilt, then special care must be taken with the Exchange Virtual Server Network Name resource. • The EVS Network Name resource must reset Kerberos prior to bringing the Exchange resources online.
Moving back to production • To reset the Kerberos machine account: • Select the properties of the network name and deselect “Enable Kerberos Authentication”. Apply / OK the change. • Select the properties of the network name resource and enable “Enable Kerberos Authentication”. • The next time the network name resource is brought online the machine account will be reset.
Moving back to production • You can also use a cluster command to make the Kerberos changes.
Cluster res <Network Name> /priv RequireKerberos=0
Cluster res <Network Name> /priv RequireKerberos=1
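When several network name resources need the same treatment, the two commands above can be generated rather than typed. This Python sketch only builds the argument lists using the syntax shown on the slide and does not run anything. The resource name in the example is hypothetical.

```python
def kerberos_reset_commands(network_name_resource):
    """Build the two cluster commands from the slide as argument lists."""
    base = ["cluster", "res", network_name_resource, "/priv"]
    # First disable, then re-enable Kerberos, in the same order as on the slide.
    return [base + ["RequireKerberos=0"], base + ["RequireKerberos=1"]]


# Hypothetical resource name; substitute the real EVS network name resource.
for command in kerberos_reset_commands("MAIL-2 Network Name"):
    print(command)
```

Keeping the resource name as a single list element avoids quoting problems with names that contain spaces if the lists are later passed to a process launcher on a cluster node.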
Moving back to production • Once you are done with the standby cluster all resources / services should be disabled. • If the cluster will be used for future standby deployments, the operating systems and Exchange binaries should be reinstalled. • By reinstalling you provide a clean platform for the next standby cluster.
What about my data? • The standby recovery steps do not include specific information on how to get your data from production to standby. • Replication of online Exchange data is addressed in KB 895847. • The “dial tone” recovery strategy may be useful if the Exchange data is not readily available in the standby location.
What is a dial tone recovery? • Dial tone recovery is mounting blank databases on a server to provide access to users and mail flow without historical data. • Historical data is restored at another time. • There are several design considerations that should be accounted for when considering this recovery plan.
More information… http://www.microsoft.com/technet/prodtechnol/exchange/guides/DROpsGuide/2493c2d6-618c-4c49-9cb1-fff556926707.mspx?mfr=true