Windows Server 2022 has options that let you fine-tune the failover process to meet the needs of your business. I’ll cover these options in the next few sections.
Failover occurs when a clustered application or service moves from one node to another. The process can be triggered automatically by a failure or planned server maintenance, or it can be initiated manually by an administrator. The failover process works like this:
- The cluster service takes all of the resources in the role offline in the order set in the dependency hierarchy.
- The cluster service transfers the role to the node that is listed next on the application’s list of preferred host nodes.
- The cluster service attempts to bring all of the role’s resources online, starting at the bottom of the dependency hierarchy.
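If you prefer to script a planned failover instead of using the Failover Cluster Manager console, the FailoverClusters PowerShell module can move a clustered role between nodes. The following is a minimal sketch; the role name Print1 and the node name NODEC are just examples, so substitute the names used in your own cluster.

```powershell
# Requires the Failover Clustering management tools (RSAT-Clustering-PowerShell)
Import-Module FailoverClusters

# List the clustered roles (groups) and the node each one currently runs on
Get-ClusterGroup

# Manually fail the Print1 role over to NODEC; the cluster service takes the
# role's resources offline, transfers ownership, and brings them back online
Move-ClusterGroup -Name "Print1" -Node "NODEC"
```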
In a cluster that is hosting multiple applications, it may be important to set specific nodes to be primarily responsible for each clustered application. This can be helpful from a troubleshooting perspective because you know which node is expected to be hosting each service. To set a preferred node and an order of preference for failover, use the General tab in the Properties dialog box of the clustered application.
The failover order is also set in this same dialog box, by changing the order in which the nodes are listed. If NODEA should be the primary node and NODEC should be the first server the application fails over to, NODEA should be listed first and selected as the preferred owner, NODEC should be listed second, and the remaining cluster nodes should be listed after NODEC.
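You can set the same preferred owner list with PowerShell. In this sketch the role and node names are illustrative; the order of the names passed to -Owners defines the preference order, just as the list order does on the General tab.

```powershell
# Make NODEA the preferred (primary) owner and NODEC the next failover target
Set-ClusterOwnerNode -Group "Print1" -Owners NODEA, NODEC

# Confirm the preferred owner list and its order
Get-ClusterOwnerNode -Group "Print1"
```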
A number of failover settings can be configured for the clustered service. The failover settings control the number of times a clustered application can fail in a period of time before the cluster stops trying to restart it. Typically, if a clustered application fails a number of times, some sort of manual intervention will be required to return the application to a stable state.
Specifying the maximum number of failures keeps the cluster from endlessly trying to restart the application; once the limit is reached, the application stays offline until the problem has been resolved and it is manually brought back online. This is beneficial because an application that repeatedly comes online and then fails may appear functional to the monitoring system even though it keeps failing. Once the application is left in a failed state, the monitoring system will not be able to contact the application and should report it as being offline.
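These settings correspond to the FailoverThreshold and FailoverPeriod properties of the clustered role. The sketch below, with an illustrative role name, allows two failures within a six-hour window before the cluster leaves the role in a failed state:

```powershell
# Allow at most 2 failures in a 6-hour window before the role is left failed
$role = Get-ClusterGroup -Name "Print1"
$role.FailoverThreshold = 2   # maximum failures in the specified period
$role.FailoverPeriod    = 6   # length of the period, in hours

# Verify the settings
$role | Format-List Name, FailoverThreshold, FailoverPeriod
```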
Failback settings control whether and when a clustered application fails back to its preferred cluster node once that node becomes available again. The default setting is Prevent Failback. If failback is allowed, two additional options are available: fail back immediately after the preferred node is available, or fail back only within a specified window of time.
The time is specified in 24-hour format. If you want to allow failback between 10 p.m. and 11 p.m., set the failback window to between 22 and 23. Setting the failback window to off-hours is an excellent way to ensure that your clustered applications are running on their designated nodes while scheduling the failback for a time when it will impact the fewest users.
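Failback behavior maps to the AutoFailbackType, FailbackWindowStart, and FailbackWindowEnd properties of the clustered role. Here is a minimal sketch (the role name is an example) that allows failback only between 10 p.m. and 11 p.m.:

```powershell
$role = Get-ClusterGroup -Name "Print1"
$role.AutoFailbackType    = 1    # 0 = prevent failback, 1 = allow failback
$role.FailbackWindowStart = 22   # failback window opens at 22:00 (10 p.m.)
$role.FailbackWindowEnd   = 23   # failback window closes at 23:00 (11 p.m.)
```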
One tool that is valuable in determining how resources affect other resources is the dependency viewer. The dependency viewer visualizes the dependency hierarchy created for an application or service. Using this tool can help when you’re troubleshooting why specific resources are causing failures, and it lets you better visualize the current configuration and adjust it to meet business needs. Exercise 13.6 shows you how to run the dependency viewer.
EXERCISE 13.6
Using the Dependency Viewer
- Open the Failover Cluster Management MMC.
- In the console tree, click the arrow to expand the cluster.
- Click Roles.
- Under the Roles section in the center of the screen, click one of the roles (such as Print1).
- Right-click the role and, under More Actions, click Show Dependency Report.
- Review the dependency report.
- Close the Dependency Report and close the Failover Cluster Manager.
Exercise 13.6 generated a dependency report that shows how the print service is dependent on a network name and a clustered disk resource. The network name is in turn dependent on an IP address.
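The same report can be generated from PowerShell with the Get-ClusterResourceDependencyReport cmdlet. This sketch assumes a role named Print1; the cmdlet writes the report to a file that you can then open in a browser.

```powershell
# Generate a dependency report for the Print1 role and open it
$report = Get-ClusterResourceDependencyReport -Group "Print1"
Invoke-Item $report.FullName
```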
Resource Properties
Resources are physical or logical objects, such as a file share or IP address, that the failover cluster manages. They may be a service or application available to clients, or they may be part of the cluster. Resources include physical hardware devices such as disks and logical items such as network names. They are the smallest configurable unit in a cluster and can run on only a single node in a cluster at a time.
Like clustered applications, resources have a number of properties available for meeting business requirements for high availability. This section covers resource dependencies and policies.
Dependencies can be set on individual resources and control how resources are brought online and offline. Simply put, a dependent resource is brought online after the resources that it depends on, and it is taken offline before those resources. As shown in Figure 13.20, dependencies can be set on a specific resource, such as the generic application.
FIGURE 13.20 Resource dependencies

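Dependencies can also be viewed and changed from PowerShell using a dependency expression, in which each resource name appears in square brackets and expressions are combined with AND or OR. The resource names in this sketch are hypothetical:

```powershell
# Show the current dependency expression for a resource
Get-ClusterResourceDependency -Resource "App1"

# Make App1 depend on both its network name and its disk; App1 will be brought
# online only after both provider resources are online
Set-ClusterResourceDependency -Resource "App1" `
    -Dependency "[App1 Network Name] and [Cluster Disk 2]"
```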
Resource policies are settings that control how resources respond when a failure occurs and how resources are monitored for failures. Figure 13.21 shows the Policies tab of a resource’s Properties dialog box.
FIGURE 13.21 Resource policies

You set configuration options on the Policies tab for how a resource should respond in the event of a failure. The options available are as follows:
If Resource Fails, Do Not Restart This option, as the name implies, leaves the failed resource offline.
If Resource Fails, Attempt Restart On Current Node With this option set, the resource tries to restart if it fails on the node on which it is currently running. There are two additional options if this is selected so that the number of restarts can be limited. They set the number of times the resource should restart on the current node in a specified length of time. For example, if you specify 5 for Maximum Restarts In The Specified Period and 10:00 (mm:ss) for Period For Restarts, the cluster service will try to restart the resource five times during that 10-minute period. After the fifth restart, the cluster service will no longer attempt to restart the service on the active node.
If Restart Is Unsuccessful, Fail Over All Resources In This Service Or Application If this option is selected, once the cluster service stops trying to restart the resource on the active node, it fails the entire service or application over to another cluster node. If you want to leave the application or service with a failed resource on the current node, clear this check box.
If All The Restart Attempts Fail, Begin Restarting Again After The Specified Period (hh:mm) If this option is selected and all restart attempts fail, the cluster service waits for the specified period and then begins attempting to restart the resource again.
Pending Timeout This option sets the amount of time, in minutes and seconds, that the cluster service waits for this resource to respond to a state change. If a resource takes longer than the cluster expects to change state, the cluster marks it as having failed. If a resource consistently takes longer than this and the problem cannot be resolved, you may need to increase this value. Figure 13.22 shows the Advanced Policies tab.
FIGURE 13.22 Resource Advanced Policies

The options available on the Advanced Policies tab are as follows:
Possible Owners This option allows you to remove specific cluster nodes from running this resource. Using this option is valuable when there are issues with a resource on a particular node and you want to keep the applications from failing over to that node until the problem can be repaired.
Basic Resource Health Check Interval This option allows you to customize how often the cluster runs a quick, lightweight check to confirm that the resource still appears to be online.
Thorough Resource Health Check Interval This option allows you to customize how often the cluster runs a more thorough check that verifies the resource is actually functioning.
Run This Resource In A Separate Resource Monitor Select this option if the resource needs to be debugged by a support engineer or if the resource conflicts with other resources; running it in its own resource monitor isolates it from the other resources.
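The settings on the Policies and Advanced Policies tabs correspond to common properties of the cluster resource, so they can also be adjusted from PowerShell. The following is a minimal sketch; the resource name App1 and the node names are placeholders, and the time-based properties are expressed in milliseconds.

```powershell
$res = Get-ClusterResource -Name "App1"

# Restart policy: 0 = do not restart; 2 = attempt restart on the current node
# and fail over the role if the restarts are unsuccessful
$res.RestartAction    = 2
$res.RestartThreshold = 5         # maximum restarts in the specified period
$res.RestartPeriod    = 600000    # period for restarts: 10 minutes
$res.PendingTimeout   = 180000    # wait 3 minutes for a pending state change

# Advanced policies
$res.LooksAlivePollInterval = 5000    # basic (lightweight) health check interval
$res.IsAlivePollInterval    = 60000   # thorough health check interval
$res.SeparateMonitor        = 1       # run in its own resource monitor process

# Possible owners: allow the resource to run only on NODEA and NODEC
Set-ClusterOwnerNode -Resource "App1" -Owners NODEA, NODEC
```

Resource-specific (private) parameters, by contrast, are read and set with Get-ClusterParameter and Set-ClusterParameter rather than through these common properties.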