Achieving High Availability with Clustering – Managing Data in a Hybrid Network

Taking high availability to the next level for enterprise services often means creating a cluster. Typically there are two types of clusters: failover clusters and high-performance clusters.

In a failover cluster, all of the clustered application or service resources are assigned to one node or server in the cluster. If that node goes offline for any reason, the cluster continues to operate on a second node. When setting up a failover cluster, you can configure it as a straight failover cluster or as a load-balancing cluster. Both options normally use two nodes. A straight failover cluster has a primary server (the active server) and a backup server (the passive server). If the primary server goes offline, the secondary server takes over. In a load-balancing failover cluster, both nodes handle requests from clients. If one of the nodes goes offline, the other node picks up the slack from the offline node.
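The difference between the two modes can be sketched in a few lines of Python. This is purely conceptual: the node names and request routing are hypothetical, and in a real cluster the cluster service, not application code, performs this routing.

```python
# Conceptual sketch of the two failover-cluster modes (hypothetical names;
# a real cluster handles routing in the cluster service, not in app code).

def route_active_passive(request_id, nodes):
    """Straight failover: only the first online node (the active node)
    services requests; the passive node sits idle until a failover."""
    for node in nodes:
        if node["online"]:
            return node["name"]
    raise RuntimeError("all cluster nodes are offline")

def route_load_balanced(request_id, nodes):
    """Load-balancing failover: every online node handles requests;
    if one goes offline, the survivors pick up its share."""
    online = [n for n in nodes if n["online"]]
    if not online:
        raise RuntimeError("all cluster nodes are offline")
    return online[request_id % len(online)]["name"]

nodes = [{"name": "NodeA", "online": True}, {"name": "NodeB", "online": True}]

# Active/passive: NodeA answers everything while it is up.
print([route_active_passive(i, nodes) for i in range(4)])

# Load balancing: requests alternate between the two nodes.
print([route_load_balanced(i, nodes) for i in range(4)])

# Simulate NodeA going offline: both modes continue on NodeB.
nodes[0]["online"] = False
print(route_active_passive(0, nodes))
print(route_load_balanced(0, nodes))
```

Either way, the client-facing behavior is the same after a node failure: the surviving node services all requests.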

High-performance clusters normally have multiple nodes all connected to the same cluster. Because all of the clustered nodes work at the same time, you get the best performance along with the data protection of a cluster. The disadvantage of this type of cluster is that it normally requires more nodes as part of the cluster.

Commonly clustered applications are SQL Server and Exchange Server; commonly clustered services are File and Print. Since a clustered application and a clustered service differ primarily in the number of functions or features they provide, for simplicity's sake I will refer to both as clustered applications. Most often, clustered resources are Hyper-V virtual machines, either in Azure or in an on-premises domain.

If a clustered node fails or is taken offline for maintenance, the clustered application can continue to run on the other cluster nodes. Client requests are automatically redirected to the next available cluster node to minimize the impact of the downed node.

How does clustering improve availability? By increasing the number of server nodes on which the application or virtual machine can run, you can move the application or virtual machine to a healthy server if there is a problem, if maintenance needs to be completed on the hardware or the operating system, or if patches need to be applied. The clustered application being moved has to restart on the new server regardless of whether the move was intentional. This is why the term highly available is used instead of fault tolerant.

Virtual machines, however, can be moved from one node to another using live migration. In a live migration, one or more virtual machines are intentionally moved from one node to another, with their current memory state transferred intact over the cluster network and with no indication to the virtual machine's consumers that it has moved from one server to another. In the event of a cluster node or virtual machine failure, though, the virtual machine will still fail and will then be brought online again on another healthy cluster node. Figure 13.10 shows an example of SQL Server running on the first node of a Windows Server 2022 failover cluster.
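The practical difference between a live migration (memory state preserved) and a failover (restart, state lost) can be illustrated with a small sketch. The class and its fields are hypothetical stand-ins; real Hyper-V live migration copies memory pages over the cluster network.

```python
# Conceptual contrast between live migration and failover (hypothetical
# model; real Hyper-V transfers the VM's memory pages between hosts).

class VirtualMachine:
    def __init__(self, name, host):
        self.name = name
        self.host = host
        self.memory_state = {}   # stands in for the VM's RAM contents

    def live_migrate(self, target_host):
        """Planned move: memory state travels with the VM, so the
        workload continues where it left off on the new host."""
        self.host = target_host

    def fail_over(self, target_host):
        """Unplanned move after a node failure: the VM restarts on a
        healthy node, so its in-memory state is lost."""
        self.host = target_host
        self.memory_state = {}   # a restarted VM boots with empty RAM

vm = VirtualMachine("SQLVM", host="Node1")
vm.memory_state["sessions"] = 42

vm.live_migrate("Node2")
print(vm.host, vm.memory_state)   # state survives the planned move

vm.fail_over("Node3")
print(vm.host, vm.memory_state)   # state is lost after the failure
```

In both cases the VM ends up running on a healthy node; only the planned move preserves what was in memory, which is exactly why live migration is invisible to consumers while a failover is not.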

FIGURE 13.10 Using failover clustering to cluster SQL Server

The clustered SQL Server in Figure 13.11 can be failed over to another node in the cluster and still service database requests. However, the database will be restarted.

FIGURE 13.11 Failing the SQL Server service to another node

Failover Clustering Requirements

The Failover Clustering feature is available in the Datacenter, Standard, and Hyper-V editions of Windows Server 2022.

To be able to configure a failover cluster, you must have the required components. A single failover cluster can have up to 64 nodes when using Windows Server 2022, and the clustered service or application must support that number of nodes.

Before creating a failover cluster, make sure that all the hardware involved meets the cluster requirements. To be supported by Microsoft, all hardware must be certified for Windows Server 2022, and the complete failover cluster solution must pass all tests in the Validate A Configuration Wizard. Although the exact hardware will depend on the clustered application, a few requirements are standard:

- Server components must be marked with the “Certified for Windows Server 2022” logo.

- Although not explicitly required, server hardware should match and contain the same or similar components.

- All of the Validate A Configuration Wizard tests must pass.

The requirements for failover clustering storage have changed from previous versions of Windows. For example, Parallel SCSI is no longer a supported storage technology for any of the clustered disks. There are, however, additional requirements that need to be met for the storage components:

- Disks available for the cluster must be Fibre Channel, iSCSI, or Serial Attached SCSI.

- Each cluster node must have a dedicated network interface card for iSCSI connectivity. The network interface card you use for iSCSI should not be used for general network communication.

- Multipath software must be based on Microsoft’s Multipath I/O (MPIO).

- Storage drivers must be based on storport.sys.

- Drivers and firmware for the storage controllers on each server node in the cluster should be identical.

- Storage components must be marked with the “Certified for Windows Server 2022” logo.

In addition, there are network requirements that must be met for failover clustering:

- Cluster nodes should be connected to multiple networks for communication redundancy.

- Network adapters should be the same make, use the same driver, and have the same firmware version in each cluster node.

- Network components must be marked with the “Certified for Windows Server 2022” logo.

There are two types of network connections in a failover cluster. These should have adequate redundancy because total failure of either could cause loss of functionality of the cluster. The two types are as follows:

Public Network This is the network through which clients are able to connect to the clustered service or application.

Private Network This is the network used by the nodes to communicate with each other.

To provide redundancy for these two network types, you would need to add more network adapters to the node and configure them to connect to the networks.
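As a rough illustration of that redundancy requirement, a check like the following would flag any node that lacks an adapter on either network. The data model is hypothetical; real validation of this kind is performed by the Validate A Configuration Wizard.

```python
# Hypothetical sketch: verify each cluster node has an adapter on both
# the public (client-facing) and private (node-to-node) networks.

REQUIRED_NETWORKS = {"public", "private"}

def missing_networks(node_adapters):
    """Return, per node, which required networks have no adapter."""
    report = {}
    for node, networks in node_adapters.items():
        missing = REQUIRED_NETWORKS - set(networks)
        if missing:
            report[node] = sorted(missing)
    return report

adapters = {
    "Node1": ["public", "private"],
    "Node2": ["public"],          # no private (heartbeat) adapter
}
print(missing_networks(adapters))  # flags Node2's missing private network
```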

In previous versions of Windows Server, support was given only when the entire cluster configuration was tested and listed on the Hardware Compatibility List. The tested configuration listed the server and storage configuration down to the firmware and driver versions. This made deploying supported Windows clusters difficult and expensive from both a vendor and a consumer perspective.

When problems did arise and Microsoft support was needed, it caused undue troubleshooting complexity as well. The simplified requirements of Windows Server 2022 failover clustering, including the “Certified for Windows Server 2022” logo program and the Validate A Configuration Wizard, all but eliminate the guesswork of configuring cluster components according to best practices, and they allow Microsoft support to assist you easily when needed.
