Validating a Cluster Configuration

Configuring a failover cluster in Windows Server 2022 is much simpler than in previous versions of Windows Server. Before a cluster can be configured, run the Validate A Configuration Wizard to verify that your hardware is configured in a fashion that is supportable. Before you can run the Validate A Configuration Wizard, however, the Failover Clustering feature needs to be installed using Server Manager. The account that is used to create a cluster must have administrative rights on each of the cluster nodes and have permission to create a cluster name object in Active Directory. Follow these steps:

  1. Ensure that you meet the hardware and software prerequisites.
  2. Install the Failover Clustering feature on each server.
  3. Log in with the appropriate user ID and run the Validate A Configuration Wizard.
  4. Create a cluster.
  5. Install and cluster applications and services.

To install the Failover Clustering feature on a cluster node, follow the steps outlined in Exercise 13.2.

EXERCISE 13.2

Installing the Failover Cluster Feature
  1. Press the Windows key and select Administrative Tools ➢ Server Manager.
  2. Select number 2, Add Roles And Features.
  3. At the Select Installation Type screen, choose a role-based or feature-based installation and click Next.
  4. At the Select Destination Server screen, choose Select A Server From The Server Pool and click Next.
  5. At the Select Server Roles screen, click Next.
  6. At the Select Features screen, click the Failover Clustering (see Figure 13.13) check box. If the Add Features dialog box appears, click the Add Features button. Click Next.

FIGURE 13.13 Failover Cluster feature

7. At the confirmation screen (see Figure 13.14), click the Install button.

FIGURE 13.14 Confirmation screen

8. Once the installation is complete, click Close.

9. Close Server Manager.
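
If you prefer PowerShell over Server Manager, the same feature can be installed with a single command. The following is a minimal sketch; it assumes an elevated PowerShell session, and the node names are placeholders.

# Install the Failover Clustering feature plus the management tools
# (Failover Cluster Manager and the FailoverClusters PowerShell module)
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# Or push the install to several nodes at once (example node names)
Invoke-Command -ComputerName Node1, Node2 -ScriptBlock {
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
}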

Using the Validate A Configuration Wizard before creating a cluster is highly recommended. This wizard validates that the hardware and the software for the potential cluster nodes are in a supported configuration. Even if the configuration passes the tests, take care to review all warnings and informational messages so that they can be addressed or documented before you create the cluster.

Running the Validate A Configuration Wizard does the following:

       Conducts four types of tests (software and hardware inventory, network, storage, and system configuration)

       Confirms that the hardware and software settings are supportable by Microsoft support staff

You should run the Validate A Configuration Wizard before creating a cluster or after making any major hardware or software changes to the cluster. Doing this will help you identify any misconfigurations that could cause problems with the failover cluster.
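
The validation and cluster creation steps can also be run from PowerShell. The following is a minimal sketch; the node names, cluster name, and IP address are placeholders for your own environment.

# Run the same tests as the Validate A Configuration Wizard
Test-Cluster -Node Node1, Node2

# If validation passes (or the warnings are acceptable), create the cluster
New-Cluster -Name Cluster1 -Node Node1, Node2 -StaticAddress 192.168.1.50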

Workgroup and Multidomain Clusters

One nice new advantage of using Windows Server 2022 is the ability to set up a cluster on systems not part of the same domain. Windows Server 2022 allows you to set up a cluster without using Active Directory dependencies. You can create clusters in the following situations:

Single- Domain Cluster All nodes in a cluster are part of the same domain.

Multidomain Cluster Nodes in a cluster are members of different domains.

Workgroup Cluster Nodes are member servers and part of a workgroup.

Site-Aware, Stretched, or Geographically Dispersed Clusters (Geoclustering)

One nice advantage of Windows Server 2022 clustering is that you can set up site-aware failover clusters. Using site-aware clustering, you can expand clustered nodes to different geographic locations (sites). Site-aware failover clusters allow you to set up clusters in remote locations for failover, placement policies, Cross-Site Heartbeating, and quorum placement.

One of the issues with previous clusters was the heartbeat. The cluster heartbeat is a signal sent between servers so that they know the machines are up and running. Servers send heartbeats, and if a node fails to respond to five consecutive heartbeats, the cluster assumes that the node is offline. So, if you had nodes in remote locations, the latency of the link could prevent the heartbeats from being answered in time.

But now Windows Server 2022 includes Cross-Site Heartbeating, which allows you to set up delays so that remote nodes can answer the heartbeat in time. Use the following two PowerShell commands to specify the delay necessary for Cross-Site Heartbeating:

(Get-Cluster).CrossSiteDelay = <value>

(Get-Cluster).CrossSiteThreshold = <value>

The first setting (CrossSiteDelay) sets the amount of time, in milliseconds, between each heartbeat sent to nodes at other sites (the default is 1000).

The second setting (CrossSiteThreshold) sets the number of missed heartbeats (the default is 20) that are tolerated before a node is considered offline.
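
For example, the following relaxes the cross-site heartbeat settings on the current cluster. The values shown are purely illustrative, not recommendations.

# View the current cross-site heartbeat settings
Get-Cluster | Format-List CrossSiteDelay, CrossSiteThreshold

# Send cross-site heartbeats every 2 seconds and tolerate 30 missed heartbeats
(Get-Cluster).CrossSiteDelay = 2000
(Get-Cluster).CrossSiteThreshold = 30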

One issue you may face is if you have multiple sites or if the cluster is geographically dispersed. If the failover cluster does not have a shared common disk, data replication between nodes might not pass the cluster validation “storage” tests.

Setting up a cluster in a site- aware, stretched, or geocluster (these terms can be used interchangeably) configuration is a common practice. As long as the cluster solution does not require external storage to fail over, it will not need to pass the storage test to function properly.

Cluster Quorum

When a group of people set out to accomplish a single task or goal, a method for settling disagreements and for making decisions is required. In the case of a cluster, the goal is to provide a highly available service in spite of failures. When a problem occurs and a cluster node loses communication with the other nodes because of a network error, the functioning nodes are supposed to try to bring the redundant service back online.

How, though, is it determined which node should bring the clustered service back online?

If all the nodes are functional despite the network communications issue, each one might try. Just like a group of people with their own ideas, a method must be put in place to determine which idea, or node, to grant control of the cluster. Windows Server 2022 failover clustering, like other clustering technologies, requires that a quorum exist between the cluster nodes before a cluster becomes available.

A quorum is a consensus of the status of each of the nodes in the cluster. Quorum must be achieved in order for a clustered application to come online by obtaining a majority of the votes available (see Figure 13.12). Windows Server 2022 has four models, or methods, for determining quorum and for adjusting the number and types of votes available:

         Node majority (no witness)

          Node majority with witness (disk or file share)

         Node and file share majority

         No majority (disk witness only)

FIGURE 13.12 Majority needed

When a majority of the nodes are communicating, the cluster is functional.

When a majority of the nodes are not communicating, the cluster stops.

Witness Configuration

Most administrators follow some basic rules. For example, when you configure a quorum, the number of voting components in the cluster should be odd. For example, if I set up a quorum with five elements and I lose one element, I continue to work. If I lose two elements, I continue to work. If I lose three elements, the cluster stops; as soon as the number of failed voting elements reaches half plus one, the cluster stops. This works well with an odd number.

If the cluster contains an even number of voting elements, you should then configure a disk witness or a file share witness. The advantage of using a witness (disk or file share) is that the cluster will continue to run even if half of the cluster nodes simultaneously go down or are disconnected. Configuring a disk witness is possible only if the storage vendor supports read- write access from all sites to the replicated storage.

One of the advantages of Windows Server 2022 is the advanced quorum configuration option. This option allows you to assign or remove quorum votes on a per-node basis. You now have the ability to remove votes from nodes in certain configurations. For example, if your organization uses a site-aware cluster, you may choose to remove votes from the nodes in the backup site. This way, those backup nodes would not affect your quorum calculations.
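
Removing a vote from a backup-site node is a one-line change in PowerShell; the node name below is hypothetical.

# Remove the quorum vote from a node in the backup site
(Get-ClusterNode -Name "DRNode1").NodeWeight = 0

# Verify the current vote assignments
Get-ClusterNode | Format-Table Name, NodeWeight, DynamicWeight, State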

There are different ways that you can set up quorum witnesses. Here are some of the options that you can choose from:

Configuring a Disk Witness Choose the quorum disk witness if all nodes can see the disks. To set up this disk witness, the cluster must be able to see the dedicated LUN. The LUN needs to store a copy of the cluster database, and it’s most useful for clusters that are using shared storage. The following list is just some of the requirements when setting up a disk witness:

The LUN must be at least 512 MB.

The disk must be dedicated to cluster use only.

Must pass disk storage validation tests.

The disk can’t be used as a Cluster Shared Volume (CSV).

The disk must be a basic disk with a single volume.

No drive letter is needed.

The drive must be formatted using NTFS or ReFS.

Can be used with hardware RAID.

Should not be used with antivirus or backup software.

Configuring a File Share Witness Use a file share witness when you need to account for multisite disaster recovery; the file server must be using an SMB file share.

The following list is just some of the requirements when setting up a file share witness:

Minimum of 5 MB of free space.

File share must be dedicated to the cluster and not used to store user data or application data.

Configuring a Cloud Witness The Windows Server 2022 cloud witness is a new type of failover cluster quorum witness that uses Microsoft Azure as the arbitration point. The cloud witness gets a vote just like any other quorum witness. You can set up the cloud witness as a quorum witness using the Configure A Cluster Quorum Wizard.
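
You can also configure the cloud witness from PowerShell. This assumes you already have an Azure storage account; the account name and key below are placeholders.

# Point the cluster quorum at an Azure storage account acting as a cloud witness
Set-ClusterQuorum -CloudWitness -AccountName "mystorageaccount" -AccessKey "<storage-account-access-key>"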

Dynamic Quorum Management

Windows Server 2022 provides dynamic quorum management, which automatically manages the vote assignment to nodes. With this feature enabled, votes are automatically added or removed from nodes when that node either joins or leaves a cluster. In Windows Server 2022, dynamic quorum management is enabled by default.
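
You can confirm (or change) this behavior on the cluster object; a value of 1 means enabled, 0 means disabled.

# Check whether dynamic quorum management is enabled (1 = enabled, the default)
(Get-Cluster).DynamicQuorum

# Re-enable it if it was turned off
(Get-Cluster).DynamicQuorum = 1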

Achieving High Availability with Clustering

Taking high availability to the next level for enterprise services often means creating a cluster. Normally you have two types of clusters: failover clusters or high- performance clusters.

In a failover cluster, all of the clustered application or service resources are assigned to one node or server in the cluster. If that node goes offline for any reason, the cluster continues to operate on a second node. When setting up a failover cluster, you can set the cluster up to be a straight failover cluster or a load- balancing cluster. Both options normally use two nodes. Normal failover clusters have a primary server (active server) and a backup server (passive server). If the primary server goes offline, the secondary server takes over. On a load- balancing failover cluster, both nodes handle requests from clients. If one of the nodes goes offline, the other node picks up the slack from the offline node.

High availability clusters normally have multiple nodes all connected to the same cluster. Because all of the clustered nodes work at the same time, you are getting the best performance along with the data protection of a cluster. The disadvantage of this type of cluster is that it normally requires more nodes as part of the cluster.

Commonly clustered applications are SQL Server and Exchange Server; commonly clustered services are File and Print. Since the differences between a clustered application and a clustered service are primarily related to the number of functions or features, for simplicity's sake I will refer to both as clustered applications. Most often, the clustered resource is a Hyper-V virtual machine, whether in Azure or in an onsite domain.

If there is a failure of a clustered node or if the clustered node is taken offline for maintenance, the clustered application can continue to run on other cluster nodes. The client requests are automatically redirected to the next available cluster node to minimize the impact of the down clustered node.

How does clustering improve availability? By increasing the number of server nodes available on which the application or virtual machine can run, you can move the application or virtual machine to a healthy server if there is a problem, if maintenance needs to be completed on the hardware or the operating system, or if patches need to be applied. The clustered application that’s moved will have to restart on the new server regardless of whether the move was intentional. This is why the term highly available is used instead of fault tolerant.

Virtual machines, however, can be moved from one node to another using live migration. Live migration is where one or more virtual machines are intentionally moved from one node to another with their current memory state intact through the cluster network with no indicators to the virtual machine consumer that the virtual machine has moved from one server to another. However, in the event of a cluster node or virtual machine failure, the virtual machine will still fail and will then be brought online again on another healthy cluster node. Figure 13.10 shows an example of SQL Server running on the first node of a Windows Server 2022 failover cluster.

FIGURE 13.10 Using failover clustering to cluster SQL Server

The clustered SQL Server in Figure 13.11 can be failed over to another node in the cluster and still service database requests. However, the database will be restarted.

FIGURE 13.11 Failing the SQL Server service to another node

Failover Clustering Requirements

The Failover Clustering feature is available in the Datacenter, Standard, and Hyper-V  editions of Windows Server 2022.

To be able to configure a failover cluster, you must have the required components.  A single failover cluster can have up to 64 nodes when using Windows Server 2022, and the clustered service or application must support that number of nodes.

Before creating a failover cluster, make sure that all the hardware involved meets the cluster requirements. To be supported by Microsoft, all hardware must be certified for Windows Server 2022, and the complete failover cluster solution must pass all tests in the Validate A Configuration Wizard. Although the exact hardware will depend on the clustered application, a few requirements are standard:

             Server components must be marked with the “Certified for Windows Server 2022” logo.

       Although not explicitly required, server hardware should match and contain the same or similar components.

          All of the Validate A Configuration Wizard tests must pass.

The requirements for failover clustering storage have changed from previous versions of Windows. For example, Parallel SCSI is no longer a supported storage technology for any of the clustered disks. There are, however, additional requirements that need to be met for the storage components:

         Disks available for the cluster must be Fibre Channel, iSCSI, or Serial Attached SCSI.

       Each cluster node must have a dedicated network interface card for iSCSI connectivity. The network interface card you use for iSCSI should not be used for network communication.

        Multipath software must be based on Microsoft’s Multipath I/O (MPIO).

       Storage drivers must be based on storport.sys.

       Drivers and firmware for the storage controllers on each server node in the cluster should be identical.

Storage components must be marked with the “Certified for Windows Server 2022” logo.

In addition, there are network requirements that must be met for failover clustering:

        Cluster nodes should be connected to multiple networks for communication redundancy.

       Network adapters should be the same make, use the same driver, and have the same firmware version in each cluster node.

Network components must be marked with the “Certified for Windows Server 2022” logo.

There are two types of network connections in a failover cluster. These should have adequate redundancy because total failure of either could cause loss of functionality of the cluster. The two types are as follows:

Public Network This is the network through which clients are able to connect to the clustered service or application.

Private Network This is the network used by the nodes to communicate with each other.

To provide redundancy for these two network types, you would need to add more network adapters to the node and configure them to connect to the networks.

In previous versions of Windows Server, support was given only when the entire cluster configuration was tested and listed on the Hardware Compatibility List. The tested configuration listed the server and storage configuration down to the firmware and driver versions. This proved to be difficult and expensive from both a vendor and a consumer perspective to deploy supported Windows clusters.

When problems did arise and Microsoft support was needed, it caused undue troubleshooting complexity as well. Windows Server 2022 failover clustering, with its simplified requirements, the "Certified for Windows Server 2022" logo program, and the Validate A Configuration Wizard, all but eliminates the guesswork of getting the cluster components configured in a way that follows best practices and allows Microsoft support to assist you easily when needed.

Configure CredSSP or Kerberos Authentication

When you choose to use live migrations, one of the settings you configure is the type of authentication you can use. Choosing the authentication type is a feature listed under the Advanced Features of live migration. You can choose two types of authentication (as shown in Figure 13.8): Kerberos or Credential Security Support Provider (CredSSP).

FIGURE 13.8 Advanced Features for a live migration

Authentication determines which protocol you will use to verify live migration traffic between the source and destination servers. Let's take a look at both options:

Credential Security Support Provider (CredSSP) This option requires less configuration and does not need constrained delegation, but you must sign in to the source server before starting the migration. You can sign in by using a local console session, a Remote Desktop session, or a remote Windows PowerShell session.

Kerberos This option lets you avoid having to sign into the server but requires constrained delegation to be set up.

Another section that you configure in the Advanced Features is Performance. This section allows you to choose how the network traffic for live migrations will be configured. You can choose from three options:

TCP/IP The memory of the virtual machine being migrated is copied over the network to the destination server over a TCP/IP connection.

Compression The memory of the virtual machine being migrated is compressed and then copied over the network to the destination server over a TCP/IP connection.

SMB The memory of the virtual machine is copied over the network to the destination server over an SMB (Server Message Block) connection. SMB Direct will be used if the network adapters of both the source and destination servers have Remote Direct Memory Access (RDMA) capabilities enabled.
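
The same authentication and performance settings can be configured from PowerShell on each Hyper-V host. This is a minimal sketch run in an elevated session; the migration subnet is only an example.

# Allow incoming and outgoing live migrations on this host
Enable-VMMigration

# Use Kerberos authentication and SMB for the migration traffic
Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos `
           -VirtualMachineMigrationPerformanceOption SMB

# Optionally restrict live migration traffic to a specific network (example subnet)
Add-VMMigrationNetwork 192.168.10.0/24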

Implementing Live Migration

You will need the following to set up nonclustered hosts for live migration:

       A user account in the local Hyper- V Administrators group or the Administrators group on both the source and destination computers. Membership in the Domain Administrators group.

       The Hyper- V role in Windows Server 2022 installed on both the source and destination servers. Live migration can be done if the virtual machine is at least version 5.

       The source and destination computers must belong to the same Active Directory domain or belong to trusted domains.

       The Hyper- V management tools installed on the server. The computer must be running Windows Server 2022 or Windows 10/11.

If you want to set up the source and destination of the live migration, use the following steps:

  1. Open Hyper-V Manager. (Click Start ➢ Administrative Tools ➢ Hyper-V Manager.)
  2. In the navigation pane, select one of the servers. Right-click the server and choose Hyper-V Settings ➢ Live Migrations.
  3. In the Live Migrations pane, select Enable Incoming And Outgoing Live Migrations.
  4. In the section Simultaneous Live Migrations, specify the number of simultaneous live migrations (the default is 2).
  5. Under Incoming Live Migrations, accept any network for live migrations or specify the IP address you want to use for live migration. If you want to use an IP address, click the Add button and type the IP address information. Click OK when you’re finished.
  6. For Kerberos and performance options, expand Live Migrations (click the plus sign next to Live Migrations) and then select Advanced Features:

    Under Authentication Protocol, select Use CredSSP or Use Kerberos.

Under Performance options, select performance configuration options (either TCP/IP, Compression, or SMB).

7. Click OK.

8. If you have another server that you want to set up for live migrations, select the server and repeat the steps.

Implement Shared Nothing Live Migration

Administrators can now live-migrate virtual machines even if the Hyper-V host is not part of a cluster. Before using live migration without a Windows cluster, you have to configure the servers. Choose Kerberos or Credential Security Support Provider (CredSSP) to authenticate the live migration.

To trigger a Shared Nothing Live Migration remotely, you’ll need to enable Kerberos constrained delegation, which you configure on the Delegation tab of Active Directory Users and Computers for each computer taking part in the Shared Nothing Live Migration.
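
Once constrained delegation and the migration settings are in place, a Shared Nothing Live Migration can be started from PowerShell. The VM name, destination host, and path below are examples only.

# Live-migrate the VM and its storage to another stand-alone Hyper-V host
Move-VM -Name "VM01" -DestinationHost "HyperV02" `
        -IncludeStorage -DestinationStoragePath "D:\VMs\VM01"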

Implementing Storage Migration

Hyper- V supports moving virtual machine storage without downtime by allowing you to move storage while the virtual machine is running. You do this by using Hyper-V  Manager or Windows PowerShell. You can add storage to a Hyper- V cluster or a stand- alone computer, and then move VMs to the new storage while the virtual machines continue to run. You can move virtual machine storage between physical storage devices to respond to a decrease in performance that results from bottlenecks.

Storage Migration Requirements

To use the Hyper- V functionality of moving virtual machine storage, you must meet these prerequisites:

          One or more installations of Windows Server 2022 with the Hyper-V  role installed

        A server that is capable of running Hyper- V

           Virtual machines that are configured to use only virtual hard disks for storage

Storage Migration lets you move the virtual hard disks of a virtual machine while the virtual hard disks are still able to be used by the running virtual machine (see Figure 13.9).

When you move a running virtual machine’s virtual hard disks, Hyper- V performs the following steps:

  1. Disk reads and writes use the source virtual hard disk.
  2. When reads and writes occur on the source virtual hard disk, the disk data is copied to the new destination virtual hard disk.
  3. Once the initial disk copy is complete, the disk writes are mirrored to both the source and destination virtual hard disks while outstanding disk changes are replicated.
  4. After the source and destination virtual hard disks are entirely synchronized, the virtual machine changes over to using the destination virtual hard disk.
  5. The source virtual hard disk is deleted.
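
The same storage move can be performed with the Move-VMStorage cmdlet; the VM name and destination path here are placeholders.

# Move a running VM's virtual hard disks and configuration files to new storage
Move-VMStorage -VMName "VM01" -DestinationStoragePath "E:\VMStorage\VM01"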

Virtual Machine Advanced Features

One nice feature of virtual machines is the ability to set up advanced features. In the Advanced Features section (see Figure 13.7), there are multiple settings that you can configure.

MAC Addressing The first thing that you can configure in the Advanced Features section is the MAC address. The MAC address is a physical address that is associated with the NIC adapter. You can set the MAC address to Dynamic (Hyper-V generates the MAC address automatically) or Static (you specify the MAC address yourself).

You also have the ability to do MAC spoofing. This is where a VM can change the source MAC address in outgoing packets to one that is not assigned to the NIC adapters.

DHCP Guard DHCP Guard drops DHCP server messages from unauthorized virtual machines pretending to be a DHCP server. So what does this mean to you? If a server tries to pretend to be a DHCP server, your virtual machine will drop any messages that are sent by that DHCP server.

Router Guard Router Guard drops router advertisement and redirection messages from unauthorized virtual machines pretending to be routers. It works almost the same way DHCP Guard works. If an unauthorized router tries to send messages to a virtual machine, that VM will not accept those messages.

Protected Network You can set Network Health Detection at the virtual machine level for a Hyper- V host cluster. This is configured as a Protected Network. When you select the Protected Network check box, the virtual machine will be moved to another cluster node if a network disconnection is detected. If the health of a network connection is showing as disconnected, the VM will be automatically moved.

Port Mirroring Port mirroring allows the network traffic of a virtual machine to be monitored by copying incoming and outgoing packets and forwarding the copies to another virtual machine configured for monitoring.

NIC Teaming NIC Teaming gives you the ability to allow multiple network adapters on a system to be placed into a team. You can establish NIC Teaming in the guest operating system to aggregate bandwidth and provide redundancy. This is useful if teaming is not configured in the management operating system.

Device Naming Device naming causes the name of the network adapter to be propagated into supported guest operating systems.

VM Checkpoints

One thing that you may want to set up on your Hyper-V  server is recovery points or checkpoints. A checkpoint is a snapshot in time from when you can recover a virtual machine. It’s like taking a picture of the virtual machine and using that picture to recover the VM. You can create multiple checkpoints of a VM and then recover back to any of those checkpoints if there is an issue. Using a more recent recovery point will result in less data lost. Checkpoints can be accessed from up to 24 hours ago.

If you want to enable these checkpoints in time for Hyper-V , you just need to follow these steps:

  1. In Hyper- V Manager, right- click the virtual machine and choose Settings.
  2. In the Management section, select Checkpoints.
  3. To enable checkpoints for a VM, select Enable Checkpoints. If you want to disable checkpoints, just clear the check box.
  4. Click Apply. Once you are finished, click OK and close Hyper- V Manager.
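
Checkpoints can also be created and applied from PowerShell; the VM and checkpoint names below are illustrative.

# Create a checkpoint of the VM
Checkpoint-VM -Name "VM01" -SnapshotName "Before patching"

# List the checkpoints for the VM
Get-VMCheckpoint -VMName "VM01"

# Roll the VM back to that checkpoint
Restore-VMCheckpoint -VMName "VM01" -Name "Before patching" -Confirm:$false
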
Software Load Balancing

Windows Server 2022 Hyper- V also allows you to distribute virtual network traffic using software load balancing (SLB). SLB allows you to have multiple servers hosting the same virtual networking workload in a multitenant environment. That way, you can set up high availability.

Using SLB allows you to load- balance virtual machines on the same Hyper-V  server. Let’s take a look at how SLB works. SLB is possible because it sets up a virtual IP address (VIP) that is automatically mapped to the dynamic IP addresses (DIP) of the virtual machines. The DIP addresses are the IP addresses of the virtual machines that are part of the load- balancing setup.

So, when someone tries to access the resources in the load- balancing setup, they access it by using the VIP address. The VIP request then gets sent to the DIP address of the virtual machines. So, users use the single VIP address, and that address gets sent to the load- balancing virtual machines.

Understanding Live Migration

Before we can implement live migration, you should understand what live migration does for Hyper-V. Hyper-V live migration transfers a running virtual machine from one physical server to another. The real advantage is that during the move of the virtual machine, there is no impact on the network's users. The virtual machine continues to operate even during the move. This is different from Hyper-V Quick Migration, which requires a pause in the VM while it is being moved.

Live migration lets you move virtual machines between servers. This is very useful when a Hyper- V server starts having issues. For example, if a Hyper-V  machine is starting to have hardware issues, you can move the virtual machines from that Hyper-V  server to another server that is running properly.

When setting up VM migrations, you have a few options. You can live- migrate a VM, Quick Migrate a VM, or just move a VM. As stated before, live migration requires no interruption of the VM. Quick Migration requires that you first pause the VM, then save the VM, then move the VM, and finally restart the VM. Moving a virtual machine means that you are going to copy a VM from one Hyper- V server to another while the virtual machine is turned off.

So, if you decide to use live migrations, there are a few things you should understand before setting it up. Let’s take a look at some of the settings you can configure.

Azure Load Balancer

Since we have been discussing network load balancing, I want to delve a bit deeper into Azure’s network load balancing tool called Azure Load Balancer. Azure Load Balancer has three different SKUs that you can choose from: Basic, Standard, and Gateway. Each is designed for specific scenarios and each has differences in scale, features, and pricing.

Azure Load Balancer operates at Layer 4 of the Open Systems Interconnection (OSI) model and distributes inbound flows that enter at the load balancer's front end to backend pool instances, and it supports both inbound and outbound scenarios. As with some other Azure tools, there is a cost associated with using Azure Load Balancer. For more information on pricing, check out Microsoft's website at https://azure.microsoft.com/en-us/pricing/details/load-balancer/#purchase-options.

With Azure Load Balancer you can create either a public (external) load balancer or an internal (private) load balancer. A public load balancer provides outbound connections for VMs inside your virtual network and is used to load-balance Internet traffic to the VMs. These connections work by converting the VMs' private IP addresses to public IP addresses. An internal (or private) load balancer is used to load-balance traffic inside a virtual network and can be accessed only from private resources that are internal to the network.

Azure Load Balancer works across virtual machines, virtual machine scale sets, and IP addresses. There are three SKUs that you can choose from:

Standard Load Balancer Designed for load- balancing network layer traffic when high performance and super- low latency are required. It routes traffic within and across regions, and to availability zones for high resiliency.

Basic Load Balancer Designed for small-scale applications that do not need high availability or redundancy. Not compatible with availability zones.

Gateway Load Balancer Designed to help deploy, scale, and manage third- party virtual appliances. Provides one gateway for distributing traffic across multiple virtual appliances. You can scale them up or down, depending on demand.

For step-by-step instructions on how to create a public (external) load balancer using the Azure portal, check out Microsoft's website at https://learn.microsoft.com/en-us/azure/load-balancer/quickstart-load-balancer-standard-public-portal.

Configure a Floating IP Address for the Cluster

Some application scenarios may require or suggest that the same port be used by several applications on a single VM in the backend pool. Some examples of common port reuse are clustering for high availability and network virtual appliances. You will need to enable Floating IP in the rule definition if you want to reuse the backend port across multiple rules. When it's enabled, Azure will change the IP address mapping to the front-end IP address of the load balancer instead of the backend instance's IP address, which allows for greater flexibility.

You can configure a Floating IP on a Load Balancer rule by using a number of tools such as the Azure portal, REST API, CLI, or PowerShell. You must also configure the virtual machine’s Guest OS in order to use a Floating IP. To work properly, the Guest OS for the VM must be configured to receive all traffic bound for the front- end IP and port of the load balancer.
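
As a rough sketch using the Az PowerShell module, enabling Floating IP is a single switch on the load-balancing rule. The resource names and port below are placeholders and assume the load balancer, front end, backend pool, and probe already exist.

# Get the existing load balancer (names are examples)
$lb = Get-AzLoadBalancer -ResourceGroupName "myRG" -Name "myLB"

# Add a rule with Floating IP enabled so the backend port can be reused
$lb | Add-AzLoadBalancerRuleConfig -Name "sqlListenerRule" `
    -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] `
    -BackendAddressPool $lb.BackendAddressPools[0] `
    -Probe $lb.Probes[0] `
    -Protocol Tcp -FrontendPort 1433 -BackendPort 1433 `
    -EnableFloatingIP

# Push the updated configuration to Azure
$lb | Set-AzLoadBalancer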

Achieving High Availability with Hyper- V

One of the nice advantages of using Hyper-V is the ability to run a server operating system within another server. Virtualization allows you to run multiple servers on top of a single Hyper-V server. But we need to make sure that these servers stay up and running.

That is where Hyper- V high availability comes into play. Ensuring that your Hyper-V  servers are going to continue to run even if there is a hardware issue is an important step in guaranteeing the success of your network. There are many ways to achieve that. One is to set up clustering and another is to set up Hyper- V high availability without clustering. Setting up reliability without clustering requires that your Hyper-V  servers have replica copies that can automatically start up if the virtual machine errors out. This is referred to as live migration and replica servers.

Implementing a Hyper- V Replica

Hyper-V Replica is an important part of the Hyper-V role. It replicates Hyper-V virtual machines from the primary site to the replica secondary sites asynchronously.

Once you enable Hyper-V Replica for a particular virtual machine on the primary Hyper-V host server, Hyper-V Replica begins to create an exact copy of the virtual machine for the secondary site. After this initial replication, Hyper-V Replica creates a log file for the virtual machine's VHDs. This log file is replayed in reverse order to the replica VHD, based on the replication frequency you configure. The log files and reverse ordering help ensure that the latest changes are stored and copied asynchronously. If there is an issue with the replication frequency, you will receive an alert.

On the virtual machine, you can establish resynchronization settings. You can do this manually, automatically, or automatically on an explicit schedule. To fix constant synchronization issues, you may choose to set up automatic resynchronization.

Hyper-V Replica aids in a disaster recovery strategy by replicating virtual machines from one host to another while keeping workloads accessible. Hyper-V Replica can create a copy of a running virtual machine to an offline replica virtual machine.
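
Replication for an individual VM can be enabled with PowerShell once the replica server has been configured to accept replication; the server and VM names here are examples.

# Enable replication of VM01 to a replica server over Kerberos/HTTP (port 80)
Enable-VMReplication -VMName "VM01" -ReplicaServerName "replica.contoso.com" `
    -ReplicaServerPort 80 -AuthenticationType Kerberos

# Send the initial copy of the VM to the replica server
Start-VMInitialReplication -VMName "VM01"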

Hyper- V Hosts

The primary and secondary host servers can be located in the same physical location or at different geographical locations, with replication over a WAN link. Hyper-V hosts can be stand-alone, clustered, or a combination of both. Hyper-V hosts are not dependent on Active Directory, and there is no need for them to be domain members.

Replication and Change Tracking

When you enable Hyper-V  Replica on a virtual machine, an identical copy of that VM is created on a secondary host server. Once this happens, the Hyper-V  Replica will create a log file that will track changes made on a virtual machine VHD. The log file is rerun in reverse order to the replica VHD. This is based on the replication frequency settings, and it ensures that the latest changes are created and replicated asynchronously. This can be done over HTTP or HTTPS.

Extended (Chained) Replication

Extended (Chained) Replication allows you to replicate a virtual machine from a primary host to a secondary host and then replicate the secondary host to a third host. It is not possible to replicate from the primary host directly to the third host.

Extended (Chained) Replication aids in disaster recovery in that you can recover from both the primary and extended replica. Extended Replication will also help if the primary and secondary locations go offline. It must be noted that the extended replica does not support application- consistent replication and it must use the same VHD that the secondary replica uses.

Setting the Affinity

NLB allows you to configure three types of affinity settings to help improve response times for NLB clients. Each affinity setting determines a method of distributing NLB client requests. There are three different affinity settings:

No Affinity (None) If you set the affinity to No Affinity (None), NLB will not associate an NLB client with any specific member. When a request is sent to the NLB cluster, the requests are balanced among all the nodes. No Affinity provides greater performance, but there may be issues with clients establishing sessions. This happens because the requests may be load-balanced between NLB nodes and session information may not be present.

Single Affinity Setting the cluster affinity to Single (this is the default setting) will send all traffic from a specific IP address to a single cluster node. This will keep a client on a specific node where the client should not have to authenticate again. Setting the affinity mode to Single would remove the authentication problem but would not distribute the load to other servers unless the initial server was down. Setting the affinity to Single allows a client’s IP address to always connect to the same NLB node. This setting allows clients using an intranet to get the best performance.

Class C Affinity When setting the affinity to Class C, NLB links clients with a specific member based on the Class C part of the client’s IP address. This allows you to set up NLB so that clients from the same Class C address range can access the same NLB member. This affinity is best for NLB clusters using the Internet.

Failover

If the primary or the secondary (extended) host server location goes offline, you can manually initiate failover. Failover is not automatic. There are several different ways to manually initiate failover:

Test Failover Use Test Failover to verify that the replica virtual machine can successfully start in the secondary site. It creates a duplicate test virtual machine during failover and does not affect standard replication. After the test, if you select Stop Test Failover on the replica virtual machine, the test virtual machine is deleted.

Planned Failover Use Planned Failover during scheduled downtime. You will have to turn off the primary machine before performing a planned failover. Once the machine fails over, Hyper-V Replica starts replicating changes back to the primary server. The changes are tracked and sent to ensure that no data is lost. Once the planned failover is complete, reverse replication begins so that the primary virtual machine becomes the secondary, and vice versa. This ensures that the hosts are synchronized.

Unplanned Failover Use Unplanned Failover during unforeseen outages. Unplanned failover is started on the replica virtual machine. This should only be used if the primary machine goes offline. A check will confirm whether the primary machine is running. If you have recovery history enabled, then it is possible to recover to an earlier point in time. During failover, you should ensure that the recovery point is acceptable and then finish the failover to ensure that recovery points are combined.
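
These failover types map to PowerShell cmdlets on the Hyper-V hosts. The following is a minimal sketch with an example VM name.

# Test failover: start a disposable test copy of the VM on the replica server
Start-VMFailover -VMName "VM01" -AsTest

# Planned failover: prepare the (stopped) primary, fail over on the replica, then commit
Start-VMFailover -VMName "VM01" -Prepare     # run on the primary server
Start-VMFailover -VMName "VM01"              # run on the replica server
Complete-VMFailover -VMName "VM01"           # commit and clean up recovery points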

Upgrading an NLB Cluster

Upgrading an NLB cluster is a fairly straightforward process. The first thing that you have to do is stop the NLB cluster. There are two ways to do so:

       Use the stop command to stop the cluster immediately. This also means that any current connections to the NLB cluster are killed.

       Use the drainstop command. The cluster stops after answering all of the current NLB connections. So the current NLB connections are finished but no new connections to that node are accepted.

So, to do your upgrade, execute a stop or drainstop on the NLB cluster node that you want to upgrade in order to remove existing connections to the application on the local host. After the NLB cluster node is stopped, you can perform an in-place upgrade in a rolling manner.

If you want to stop the entire cluster from running, while in the NLB Manager (type NLBmgr in the Run command), right-click the cluster, point to Control Hosts, and then choose Stop.

If you want to stop a single node in the cluster from running, while in the NLB Manager (type NLBmgr in the Run command), right-click the node, point to Control Hosts, and then choose Stop.
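
The same stop and drainstop operations are available as PowerShell cmdlets; the host and interface names below are examples.

# Stop a node immediately, killing existing connections
Stop-NlbClusterNode -HostName "NLB-NODE2" -InterfaceName "Ethernet"

# Drainstop: let existing connections finish, timing out after 5 minutes
Stop-NlbClusterNode -HostName "NLB-NODE2" -InterfaceName "Ethernet" -Drain -Timeout 5

# Drainstop the entire cluster
Stop-NlbCluster -Drain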

PowerShell Commands for an NLB Cluster

Table 13.2 shows some of the PowerShell commands that you can use to manage the NLB cluster.

TABLE 13.2 PowerShell commands for NLB

PowerShell command             Description
Add-NlbClusterNode             Adds a new node to the NLB cluster.
Add-NlbClusterNodeDip          Adds a dedicated IP address to a cluster.
Add-NlbClusterPortRule         Adds a new port rule to a cluster.
Add-NlbClusterVip              Adds a virtual IP address to a cluster.
Disable-NlbClusterPortRule     Disables a port rule on a Network Load Balancing (NLB) cluster.
Enable-NlbClusterPortRule      Enables a port rule on a cluster.
Get-NlbCluster                 Allows you to view information about the Network Load Balancing (NLB) cluster.
Get-NlbClusterDriverInfo       Allows you to see information about the NLB drivers on a machine.
Get-NlbClusterNode             Gets the information about the cluster node object.
Get-NlbClusterPortRule         Gets the port rule objects.
New-NlbCluster                 Creates a cluster on the specified interface.
New-NlbClusterIpv6Address      Generates IPv6 addresses to create cluster virtual IP addresses.
Remove-NlbCluster              Deletes a cluster.
Remove-NlbClusterNode          Removes a node from a cluster.
Remove-NlbClusterPortRule      Deletes a port rule from a cluster.
Resume-NlbCluster              Resumes all nodes in the cluster.
Set-NlbCluster                 Allows you to edit the configuration of an NLB cluster.
Set-NlbClusterNode             Allows you to edit the NLB cluster node settings.
Set-NlbClusterPortRule         Allows you to edit the NLB port rules.
Start-NlbCluster               Starts all of the nodes in a cluster.
Start-NlbClusterNode           Starts one of the nodes in a cluster.
Stop-NlbCluster                Stops all nodes in the cluster.
Stop-NlbClusterNode            Stops one of the nodes in a cluster.
Load Balancing with Azure

If you are using Azure for your network, then Azure has a number of tools that will help you with load balancing as well. As of this writing, Azure has the following tools available for load balancing:

Azure Traffic Manager A DNS-based traffic load balancer that spreads traffic to services across global Azure regions by using DNS-based traffic routing methods. It prioritizes user access, helps ensure that data sovereignty requirements are adhered to, and can adjust traffic for app upgrades and maintenance. Azure Traffic Manager supports HTTP, HTTPS, HTTP/2, TCP, UDP, Layer 7, and global apps.

Azure Load Balancer A network- layer load balancer that improves network performance and availability of your applications by using low- latency Layer 4 load balancing capabilities. Azure Load Balancer can balance traffic between virtual machines inside your virtual networks and across multitiered hybrid apps. It supports TCP, UDP, Layer 4, and global/regional apps.

Azure Application Gateway An application delivery controller as a service that turns web front- ends into highly available apps by using Layer 7 load balancing capabilities by securely distributing regional apps. It supports HTTP, HTTPS, HTTP/2, Layer 7, regional apps, web application firewall, and SSL/TLS offloading.

Azure Front Door Microsoft’s cloud content delivery network (CDN) that safeguards the delivery of global apps by delivering real- time performance by using the Microsoft global edge network. The Microsoft global edge network is one of the biggest backbone networks in the world. Azure Front Door provides access between your apps’ static and dynamic web content and your users around the world. It supports HTTP, HTTPS, HTTP/2, Layer 7, global apps, web application firewall, and SSL/TLS offloading.

Azure also has a service selection tool that can help you choose the best Azure cloud load-balancing service for your needs by answering a few questions regarding your app, workloads, and performance requirements. To access the tool, log into your Azure portal at https://portal.azure.com/#blade/Microsoft_Azure_Network/LoadBalancingHubMenuBlade/overview and answer a few questions.

Installing NLB Nodes

You can install NLB nodes like any other server build. You can install NLB by using either Server Manager or the Windows PowerShell commands for NLB.

First make sure that all NLB servers have the most current updates, are provisioned with appropriate resources (typically with multiple network interface cards for capacity and responsiveness), and are monitored for health and reliability. In Exercise 13.1, I will walk you through the installation of your NLB nodes.

EXERCISE 13.1

Installing NLB Nodes
  1. Once you have multiple hosts ready for the installation of NLB, run the Add Roles And Features Wizard and select Network Load Balancing in the Features area of the wizard.

If the Add Features dialog box appears, click Add Features.

2. Click Next. At the Confirmation screen, click the Install button. After the installation is finished, click the Close button and then close Server Manager.

3. Check that the wizard has placed the Network Load Balancing Manager in your Start menu under Windows Administrative Tools (see Figure 13.1).

FIGURE 13.1 Network Load Balancing

4. Right- click Network Load Balancing Clusters and select New Cluster (see Figure 13.2).


5. You are then presented with the New Cluster: Connect Wizard, where you can specify the name of one of your hosts. Type the name of one of your cluster nodes and click Connect (see Figure 13.3). After the connection is made, the TCP/IP address will be shown. Click Next.

FIGURE 13.3 Hostname setup

6. If you see a DHCP dialog box, disable DHCP on this adapter. Click OK.

7. The next page reveals a prompt to add any additional IPs and assign a priority level. You can do all this later, so click Next. If you see a No Dedicated IP Addresses dialog box, click Yes.

8. The next wizard page is where you specify the cluster IP address. This is the address that the endpoints or clients or users of the NLB cluster will contact. Typically, the network team will assign a cluster IP address for this use (see Figure 13.4). Click OK, then click Next.

FIGURE 13.4 Add IP Address

9. On the next page, configure the Cluster operation mode (see Figure 13.5) and specify a Full Internet Name.


FIGURE 13.5 Cluster parameters

With regard to the cluster operation modes, the differences between them are as follows:

Unicast

The cluster adapters for all nodes are assigned the same MAC address.

The outgoing MAC address for each packet is modified based on priority to prevent upstream switches from discovering that all nodes have the same MAC address.

Communication between cluster nodes (other than heartbeat and other administrative NLB traffic) is not possible unless there are additional adapters (because all nodes have the same MAC address).

Depending on load, this configuration can cause switch flooding since all inbound packets are sent to all ports on the switch.

Multicast

The cluster adapters for all nodes are assigned their own MAC unicast address.

The cluster adapters for all nodes are assigned a multicast MAC address (derived from the IP of the cluster).

Non- NLB network traffic between cluster nodes works fine since they all have their own MAC address.

IGMP Multicast

This is much like multicast, but the MAC traffic goes only to the switch ports of the NLB cluster, preventing switch flooding.

10. After you select the appropriate settings, the next page is where port rules (see Figure 13.6) are configured. By default, it is set up to be wide open. Most implementations will limit NLB ports to just the ports needed for the application. For example, a web server would need port 80 enabled. It is also in this area where you can configure filtering mode.

    FIGURE 13.6 Port Rules

    The affinity sets a client’s preference to a particular NLB host. It is not recommended to set affinity to None when UDP is an expected traffic type.

    11. Click the Finish button. Close the NLB Manager.
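
For reference, the same cluster can be built from PowerShell instead of the NLB Manager wizard. The interface, names, and IP addresses below are placeholders, and the port rule mirrors the web server example above.

# Create the NLB cluster on this host's "Ethernet" interface
New-NlbCluster -InterfaceName "Ethernet" -ClusterName "WebFarm" `
    -ClusterPrimaryIP 192.168.1.200 -SubnetMask 255.255.255.0 `
    -OperationMode Multicast

# Join a second host to the cluster
Get-NlbCluster | Add-NlbClusterNode -NewNodeName "WEB2" -NewNodeInterface "Ethernet"

# Replace the wide-open default port rule with one limited to HTTP
Get-NlbClusterPortRule | Remove-NlbClusterPortRule -Force
Add-NlbClusterPortRule -StartPort 80 -EndPort 80 -Protocol TCP -Affinity Single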

      Understanding Network Load Balancing

      This section discusses onsite network load balancing (NLB). Performing NLB using Azure will be discussed later in this chapter. So, the first thing we have to discuss is why you would choose to use NLB. NLB lets you configure two or more servers as a single virtual cluster. It’s designed for high availability and scalability of Internet server applications. This means that Windows Server 2022 NLB is designed to work with web servers, FTP servers, firewalls, proxy servers, and virtual private networks (VPNs).

      You can use NLB for other mission- critical servers, but you can also use failover clusters on many of these servers. So, after reading this and the next chapter (“Hybrid Data and Servers”), hopefully you will be able to choose the appropriate high availability server setup for your network and applications.

      NLB is a form of clustering where the nodes are highly available for a network- based service. This is typically a port listener configuration where a farm of, say, Microsoft Internet Information Services servers all listen on ports 80 and 443 for incoming web traffic from client endpoints. These nodes, while not fully clustered in a technical sense, are load balanced, where each node handles some of the distributed network traffic.

      The NLB feature uses the TCP/IP networking protocol to distribute traffic. For web and other necessary servers, NLB can provide performance and consistency when two or more computers are combined into a single virtual cluster.

      Hosts are servers that make up an NLB cluster. Each host runs its own individual copy of the server applications. The incoming client requests are distributed by NLB to each of the hosts in the cluster. You can configure the load so that it is handled by each host. Hosts can be added to the cluster to increase the load. If NLB has all traffic directed to a specific single host, then it is called a default host.

      With the use of NLB, all the computers in a cluster can use the same set of IP addresses while each host maintains its own exclusive IP address. When a host fails for load- balanced applications, the computers still in operation will receive the workload automatically. When the down computer is ready to rejoin the cluster, it comes back online and will regain its share of the workload. This allows the rest of the computers in the cluster to handle less traffic.

      NLB is beneficial in that stateless applications (e.g., web servers) are available with little downtime, and it allows for scalability. Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate growth. Scalability, when used for NLB clusters, is the ability to add one or more systems to an existing cluster when the need arises. You can do the following with NLB to support scalability:

               A single cluster can support up to 32 computers.

                 Handle multiple server load requests from across multiple hosts in a cluster.

               For a single TCP/IP service, balance load requests across the NLB cluster.

                 As the workload grows, you can add hosts to the NLB cluster without failure.

                When the workload declines, you can remove hosts from the cluster.

                  Allow higher performance and lower overhead by using a pipelined implementation.

      Pipelining allows requests to be sent to the NLB cluster without waiting for a response.

             Use NLB Manager or Windows PowerShell cmdlets to manage and configure NLB clusters and hosts from a single computer.

             Determine port rules for each website. Port rules allow you to configure which ports are going to be enabled or disabled. Ports are doorways that applications can use to access resources. For example, DNS uses port 53 for all DNS traffic. Here are some of the more common port numbers:

      FTP uses ports 20/21.

      Secure Shell uses port 22.

      SMTP (mail) uses port 25.

      DNS uses port 53.

      HTTP uses port 80.

      POP3 uses port 110.

      HTTPS uses port 443.

      Determine load balancing behavior using port management rules for an IP port or group of ports.

             Use an optional, single- host rule that will direct all client requests to a single host. NLB will route client requests to a specific host that is running particular applications.

                Allow certain IP ports to block unwanted network access.

             When operating in multicast mode, enable Internet Group Management Protocol (IGMP) support on the cluster host. This will control switch port flooding (when all incoming network packets are sent to all ports on the switch).

            Use Windows PowerShell to start, stop, and control NLB actions remotely.

             Check NLB events using Windows Event Log. All NLB actions and cluster changes are logged in the Event Log.

      NLB Requirements

      The following are NLB cluster hardware requirements:

           All hosts must be on the same subnet.

             For each host, there is no limitation to the number of network adapters.

             All network adapters must be multicast or unicast within the cluster. Mixed environments, within a single cluster, are not supported.

             If using unicast mode, the network adapter used to handle client- to- cluster traffic must support media access control (MAC) address changing.

      NLB cluster software requirements are as follows:

            The adapter on which NLB is enabled can only support TCP/IP.

           The servers in the cluster must have static IP addresses.

      Components of High Availability

      High availability is a buzzword that many application and hardware vendors like to throw around to get you to purchase their products. Many different options are available to achieve high availability, and there also seems to be a number of definitions and variations that help vendors sell their products as high availability solutions.

      When it comes right down to it, however, high availability simply means providing services with maximum uptime by avoiding unplanned downtime. Often, disaster recovery (DR) is also closely lumped into discussions of high availability, but DR encompasses the business and technical processes used to recover once a disaster has happened.

      Defining a high availability plan usually starts with a service level agreement (SLA). At its most basic, an SLA defines the services and metrics that must be met for the availability and performance of an application or service. Often, an SLA is created for an IT department or service provider to deliver a specific level of service. An example of this might be an SLA for a Microsoft Exchange server. The SLA for an Exchange server might have uptime metrics on how much time during the month the mailboxes need to be available to end users, or it might define performance metrics for the amount of time it takes for email messages to be delivered.

      When determining what goes into an SLA, two other factors need to be considered. However, you will often see them discussed only in the context of disaster recovery, even though they are important for designing a highly available solution. These factors are the recovery point objective (RPO) and the recovery time objective (RTO).

      An RTO is the length of time an application can be unavailable before service must be restored to meet the SLA. For example, a single component failure would have an RTO of less than five minutes, and a full-site failure might have an RTO of three hours. An RPO is essentially the amount of data that must be restored in the event of a failure. For example, in a single server or component failure, the RPO would be 0, but in a site failure, the RPO might allow for up to 20 minutes of lost data.

      SLAs, on the other hand, are usually expressed in percentages of the time the application is available. These percentages are also often referred to by the number of nines the percentage includes. So, if someone told you that you need to make sure that the router has a rating of five 9s, that would mean that the router could only be down for 5.26 minutes a year. Table 13.1 shows you some of the different nines rating and what each rating allows for downtime.

      TABLE 13.1 Availability percentages

      Availability rating              Allowed unplanned downtime/year
      99 (two nines) percent           3.65 days
      99.9 (three nines) percent       8.76 hours
      99.99 (four nines) percent       52.56 minutes
      99.999 (five nines) percent      5.26 minutes
      99.9999 (six nines) percent      31.5 seconds
      99.99999 (seven nines) percent   3.15 seconds

      Two important factors that affect an SLA are the mean time between failure (MTBF) and the mean time to recovery (MTTR). To be able to reduce the amount of unplanned downtime, the time between failures must be increased, and the time it takes to recover must be reduced. Modifying these two factors will be addressed in the next several sections of this chapter.
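
As a quick illustration of where those numbers come from, availability can be approximated as MTBF / (MTBF + MTTR), and the allowed downtime is simply the unavailable fraction of a year. The MTBF and MTTR figures below are made up for the example.

# Minutes of unplanned downtime allowed per year at "five nines"
$minutesPerYear = 365.25 * 24 * 60        # about 525,960 minutes
(1 - 0.99999) * $minutesPerYear           # about 5.26 minutes

# Availability estimated from MTBF and MTTR (hours used as the unit)
$mtbf = 4000; $mttr = 4
$mtbf / ($mtbf + $mttr)                   # about 0.999, roughly three nines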

      Achieving High Availability

      Windows Server 2022 is the most secure and reliable Windows version to date. It also is the most stable, mature, and capable of any version of Windows. Although similar claims have been made for previous versions of Windows Server, you can rest assured that Windows Server 2022 is much better than previous versions for a variety of reasons.

      An honest look at the feature set and real- world use should prove that this latest version of Windows provides the most suitable foundation for creating a highly available solution. However, more than just good software is needed to be able to offer high availability for applications.


      In today’s technology world, there are many ways to set up and manage a high availability network. Since the AZ- 800 and AZ- 801 exams cover both onsite servers and Azure, we will talk about setting up high availability using these two methods. Many third- party companies offer high availability solutions, but we will focus on onsite and Azure setups.

      High Availability Foundation

      Just as a house needs a good foundation, a highly available Windows server needs a stable and reliable hardware platform on which to run. Although Windows Server 2022 will technically run on desktop- class hardware, high availability is more easily achieved with server- class hardware. What differentiates desktop- class from server- class hardware? Server- class hardware has more management and monitoring features built into it so that the health of the hardware can be monitored and maintained.

      Another big difference is that server- class hardware has redundancy options. Server- class hardware often has options to protect from drive failures, such as RAID controllers, and to protect against power supply failures, such as multiple power supplies. Enterprise- class servers have even more protection.

      More needs to be done than just installing Windows Server 2022 to ensure that the applications remain running with the best availability possible. Just as a house needs maintenance and upkeep to keep the structure in proper repair, so too does a server. In the case of a highly available server, this means patch management.

      Installing Patches

      Microsoft releases monthly updates to fix security problems with its software, both for operating system fixes and for applications. To ensure that your highly available applications are immune to known vulnerabilities, these patches need to be applied in a timely manner during a scheduled maintenance window. Also, to address stability and performance issues, updates and service packs are released regularly for many applications, such as Microsoft SQL Server, Exchange Server, and SharePoint Portal Server. Many companies have a set schedule (daily, weekly, or monthly) to apply these patches and updates after they are tested and approved.

      Desired Configuration Manager (DCM), an option in Microsoft Configuration Manager, is a great tool for helping to validate that your cluster nodes are patched. It can leverage the SCCM client to collect the installed patches and help with reporting across the enterprise on compliance with desired system states based on the software installed.

      To continue with the house analogy, if you were planning to have the master bath remodeled, would you rather hire a college student on spring break looking to make some extra money to do the job or a seasoned artisan? Of course, you would want someone with experience and a proven record of accomplishment to remodel your master bath.

      Likewise, with any work that needs to be done on your highly available applications, it's best to hire only decidedly qualified individuals. This is why obtaining a Microsoft certification is an excellent start to becoming qualified to configure a highly available server properly. However, there is no substitute for real-life, hands-on experience.

      Working with highly available configurations in a lab and in production will help you know not only what configurations are available but also how the changes should be made.

      For example, it may be possible to use failover clustering for a DNS server, but in practice DNS replication may be easier to support and require less expensive hardware in order to provide high availability. This is something you would know only if you had enough experience to make this decision.

      As with your house, once you have a firm and stable foundation built by skilled artisans and a maintenance plan has been put into place, you need to ascertain what more is needed. If you can’t achieve enough uptime with proper server configuration and mature operational processes, a cluster may be needed.

      Windows Server 2022 provides two types of high availability: failover clustering and network load balancing (NLB). Failover clustering is used for applications and services such as SQL Server and Exchange Server. Network load balancing is used for network- based services such as web and FTP servers.