Setting Up a Hyper-V Cluster

If you’ve ever worked with Hyper-V you’ll recognize it’s a fairly simple and straightforward virtualization platform; however, things can get a bit sticky when it comes to clustering. I have worked with well-deployed Hyper-V clusters and also dealt with those that were set up incorrectly. As the popularity of Hyper-V grows, I wanted to create a quick overview of what it takes to deploy a Hyper-V failover cluster.

 

Join to the Domain

Once you have completed your Windows installation, join each server to the domain using the PowerShell example below, where <domain\user> is your domain user account (e.g. richsitblog\rstaats) and <domainname> is your domain name (e.g. richsitblog.com):

Add-Computer -DomainName domainname -Credential domain\user
Restart-Computer

 

Installing Roles

Once your servers have finished rebooting, you will want to install roles across all the servers in the cluster you are standing up. To do this, substitute vmh## in the example below with your host names.

Invoke-Command -ComputerName vmh01,vmh02,vmh03,vmh04 -ScriptBlock {Install-WindowsFeature Hyper-V, Multipath-IO, Failover-Clustering -IncludeManagementTools -IncludeAllSubFeature}
Invoke-Command -ComputerName vmh01,vmh02,vmh03,vmh04 -ScriptBlock {Restart-Computer}

 

Configuring NIC Teaming (Server)

You will need to ensure you have OOB/LOM or direct console access to the machine, as these changes will cause an interruption in network service.

 

Assuming you have named your NICs for the front-end network NIC1 and NIC2, use the following PowerShell statement to build your team. Note that if you are not using LACP you can instead use a different teaming mode, such as switch independent (SwitchIndependent) or static teaming (Static). Additional LoadBalancingAlgorithm options include Dynamic and the address hash variants (TransportPorts, IPAddresses, MacAddresses).

Invoke-Command -ComputerName vmh01,vmh02,vmh03,vmh04 -ScriptBlock {New-NetLbfoTeam -Name Team1 -TeamMembers NIC1,NIC2 -TeamingMode LACP -LoadBalancingAlgorithm HyperVPort}

Once your team is successfully built you will need to configure the IP addressing information for the team.
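
For example, here is a minimal PowerShell sketch assuming the team interface is named Team1; the addresses, gateway, and DNS servers below are placeholders, so substitute your own:

# Assign IP configuration to the team interface (example addressing)
New-NetIPAddress -InterfaceAlias "Team1" -IPAddress 10.0.100.11 -PrefixLength 24 -DefaultGateway 10.0.100.1
# Point the team interface at your DNS servers
Set-DnsClientServerAddress -InterfaceAlias "Team1" -ServerAddresses 10.0.100.5,10.0.100.6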

 

Configuring NIC Teaming (Switch)

In this example I am assuming a LACP channel group is already set up on the switch side, and that you are utilizing a Cisco IOS or NX-OS device. I am using example VLANs 100 (Prod), 200 (Dev), and 300 (Test). Please check with your network admin and ensure you have all the correct information before making any changes. For simplicity, this example assumes the port channels for these VM hosts are 101-104:

conf term
int po 101-104
switchport mode trunk
switchport trunk native vlan 100
switchport trunk allowed vlan 100,200,300
end
copy run start

If you are using a 2N network architecture, you will need to perform these actions on both sides of the switch pair.

 

Configuring vSwitch

Next we will configure a vSwitch using our newly created NIC team and allow it to share the adapter with the management OS.

 

New-VMSwitch -Name "vSwitch" -NetAdapterName Team1 -AllowManagementOS $true
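
If you prefer to create the switch on every node at once, the same Invoke-Command pattern used earlier works here as well (this assumes the team is named Team1 on every host):

Invoke-Command -ComputerName vmh01,vmh02,vmh03,vmh04 -ScriptBlock {New-VMSwitch -Name "vSwitch" -NetAdapterName Team1 -AllowManagementOS $true}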

 

Configuring Storage Network (Server)

Best practice when using iSCSI for shared storage is to set up your interfaces as individual, discretely IP’d interfaces to allow for maximum queues and throughput. Configure your adapters accordingly for your storage network and ensure that you have connectivity. If you are utilizing jumbo frames, make sure they are enabled on the NICs within the OS, on the switch, and on the storage appliance.
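
As a sketch, assuming two storage NICs named iSCSI1 and iSCSI2 and example addressing on the storage network (adjust names, IPs, and the jumbo frame size for your environment):

# Give each storage NIC its own discrete IP (example addresses)
New-NetIPAddress -InterfaceAlias "iSCSI1" -IPAddress 10.40.0.11 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "iSCSI2" -IPAddress 10.40.0.12 -PrefixLength 24
# Enable jumbo frames in the OS; the advanced property keyword and supported value can vary by NIC driver
Set-NetAdapterAdvancedProperty -Name iSCSI1,iSCSI2 -RegistryKeyword "*JumboPacket" -RegistryValue 9014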

 

Configuring Storage Network (Switch)

Assuming again that you are using Cisco networking gear and have these interfaces set up as individual ports, we can set them to an access VLAN. For simplicity my example uses ports Eth 101-108 (accounting for 2 ports per server) and assumes they are on a single storage VLAN (in this example, VLAN 400 for Storage):

conf term
int Eth101-108
switchport access vlan 400
end
copy run start

 

Configuring MPIO Settings

Before we begin building the cluster and adding LUNs, let’s go ahead and get the MPIO configuration out of the way. Again we will kill four (or more) birds with one stone by using Invoke-Command to execute across multiple systems:

Invoke-Command -ComputerName vmh01,vmh02,vmh03,vmh04 -ScriptBlock {Enable-MSDSMAutomaticClaim -BusType iSCSI}
Invoke-Command -ComputerName vmh01,vmh02,vmh03,vmh04 -ScriptBlock {Set-MSDSMGlobalLoadBalancePolicy -Policy LQD}

 

Choices for the MSDSM global load balance policy (the value passed to -Policy is shown in parentheses) include:

Least Queue Depth (LQD)
Round Robin (RR)
Fail Over Only (FOO)
Least Blocks (LB)

Connecting an iSCSI LUN:

Open Server Manager, click Tools on the right-hand side, and choose iSCSI Initiator.

If this is the first time you have opened the iSCSI Initiator you will be prompted to start the Microsoft iSCSI service; click Yes.


CHAP Auth

The entry of initiator and target secrets only applies if your storage connection requires CHAP or mutual CHAP authentication.

If you are using mutual CHAP authentication you will first need to configure the initiator CHAP secret. This can be accomplished by clicking the Configuration tab of the iSCSI Initiator properties and choosing the CHAP button, where you can enter the initiator secret.


Once this is completed click Apply, then choose the Discovery tab and click the Discover Portal button.


When prompted, click the Advanced button, tick the Enable multi-path option, and click OK.

Once you have done this, enter the target portal IP and check the box for “Enable CHAP log on”, then enter the information as needed. If you are performing mutual CHAP authentication, also check the box to “Perform mutual authentication”.


Select the local adapter as the default Microsoft iSCSI Initiator.

For the initiator IP, choose the IP of your first storage interface in the dropdown.

For the target portal IP, select the IP of the storage device.

Connecting to Targets

If you have not already done so, click the Discovery tab, click the Discover Portal button, and enter the IP address of the iSCSI target.


Click OK, then move to the Targets tab where you will see all available iSCSI targets listed. Select the target and click Connect (note: if you are using mutual CHAP authentication you may be prompted again for the target secret to connect).


Once you have established a connection and see the LUN as connected, highlight the connected LUN and click Properties to open the session properties screen.


Click the Add session button and go through the same process as above; however, this time select your second storage NIC’s IP as the initiator IP.
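
If you would rather script the connection than use the GUI, a minimal PowerShell sketch looks like this (the portal address is a placeholder and CHAP parameters are omitted):

# Register the target portal, then connect with multipath enabled and a persistent (favorite) connection
New-IscsiTargetPortal -TargetPortalAddress 10.40.0.100
Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true -IsPersistent $true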

 

Standing Up the Failover Cluster

Open Failover Cluster Manager on one of the hosts. Inside the MMC, right-click Failover Cluster Manager and choose Create Cluster.


You should see a wizard pop up on your screen. Click Next, and on the Select Servers screen enter the hostnames of your servers.


On the validation screen, choose to run cluster validation. This can take anywhere from 10-60 minutes in my experience, depending on the amount of resources being validated. Next, enter a cluster name when prompted. On the confirmation screen, go ahead and proceed with the box checked to add eligible storage. Once the cluster is built, if it is not given an IP through DHCP you will be prompted to create one; this can be changed after the fact as well. At this point we can proceed to add disks to the cluster and configure quorum.
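
The same steps can be scripted with the failover clustering cmdlets; this is a sketch using the example hostnames, a placeholder cluster name, and a placeholder static cluster IP:

# Run validation, then create the cluster
Test-Cluster -Node vmh01,vmh02,vmh03,vmh04
New-Cluster -Name hvcluster01 -Node vmh01,vmh02,vmh03,vmh04 -StaticAddress 10.0.100.50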

Adding Disks

From a single host in the cluster, open Disk Management and bring your iSCSI cluster disks online by right-clicking them and choosing Online.


Once you have done this you will need to initialize the disk


Right-click the raw disk and create an NTFS volume on it. Please note that all disks should use GPT as the partition table type when prompted.


At this point we can add the disks to the cluster. Open Failover Cluster Manager, expand the cluster name, expand the Storage node, and click on Disks. From here, click Add Disk:


Once you click Add Disk, all cluster-eligible storage will appear; leave the disks checked and click OK.


Once this is done, right-click each disk (except the disk you have allocated for quorum) and choose Add to Cluster Shared Volumes.


At this point on each of the cluster nodes you should be able to see the cluster volumes under C:\ClusterStorage\Volume#

If a disk fails to become a cluster shared volume ensure you have written a partition to it.
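
These steps can also be done with the clustering cmdlets; a sketch (the disk name below is an example, so check Get-ClusterResource for the actual resource names):

# Add all eligible disks to the cluster, then promote one to a cluster shared volume
Get-ClusterAvailableDisk | Add-ClusterDisk
Add-ClusterSharedVolume -Name "Cluster Disk 2"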

 

Setting Cluster Quorum

Cluster quorum can be set to use node majority, a disk witness, or a file share witness. I personally prefer node majority with a disk witness. To set up quorum options, right-click the cluster, choose More Actions, then Configure Cluster Quorum Settings.


Choose the “Select the quorum witness” option.


Click Next, then choose “Configure a disk witness”.


At this point we can select our disk for the cluster disk witness and proceed through the end of the wizard.
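
Alternatively, a one-line PowerShell sketch (the witness disk resource name is an example):

Set-ClusterQuorum -DiskWitness "Cluster Disk 1"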


Setting Live Migration Settings

On the left-hand side of the Failover Cluster Manager console, choose Networks. Then on the right-hand side select Live Migration Settings:


In the window that appears you can uncheck networks you don’t want to be used for live migrations and then prioritize the networks you wish to use.

 

Final Cluster Validation

At this point we are clear to run a final cluster validation. Right-click the cluster name and choose Validate Cluster, then run through the full cluster validation wizard. This will take a substantial amount of time to run. Pay close attention to any errors or warnings that occur.

 

Adding Virtual Machines to Cluster

Right-click “Roles” underneath the cluster name and select Configure Role. Then choose Virtual Machine and click Next. At this point you will get a list of all VMs that are eligible to be added to the cluster; check the boxes accordingly and proceed through the wizard to add the VMs.


Please make sure your VMs are storage-migrated to cluster storage prior to adding them to the failover cluster.
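
This can also be scripted per VM; a sketch where the VM name is a placeholder:

Add-ClusterVirtualMachineRole -VMName "vm01"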

Citrix XenServer VMs Unmanageable

Problem:

 

Xen begins throwing errors about not being able to attach VDIs, XenMotion stops working, and the system becomes unmanageable, or the xapi service dies immediately after starting.

 

 

Solution:

Check the free space on dom0:

df -h

If this is at a high threshold you may want to do the following:

Remove patches from /var/patch

cd /var/patch
rm <uuid of patch>

Remove old log files

rm -rf /var/log/*.gz

Clean temp log files

rm -rf /tmp/*.log

Cleanup any old patches

xe patch-clean

Clean out /var/log/btmp

echo > /var/log/btmp

Restart Xen Toolstack

xe-toolstack-restart

 

Recurrence Prevention:

Create a mounted volume for /var/log or do the following:

In /etc/cron.daily, create the following script called logclean.sh:

#!/bin/bash
rm -rf /var/log/*.gz

Then change permissions to make it executable

chmod 700 /etc/cron.daily/logclean.sh

Another alternative would be to set up an NFS mount for /var/log; however, I already use a centralized log aggregation solution, so this cron job, while a fairly blunt instrument, meets my needs.

Unable to Manage VMs in Hyper-V

Recently I came across a situation where a set of VMs running on cluster shared storage were all in an unmanageable state. They were not listed as present on the host and exhibited the error described in the symptoms below.

Symptoms

Virtual machines show in a running or paused state but throw an error when attempting to access the console, open settings, or take any other management action.


Additionally, try listing the VMs in PowerShell:

Get-VM

If the VM does not show in the list it is definitely in this state.

Problem

Check to see if any other VMs exhibit this issue and whether they are on cluster shared storage. If you are running antivirus on the host, make an exclusion for the cluster volume or disable the AV, as this is usually the culprit.
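
If the hosts happen to run Windows Defender, a sketch of adding the exclusion (third-party AV products have their own exclusion mechanisms):

# Exclude the cluster shared volume path from real-time scanning
Add-MpPreference -ExclusionPath "C:\ClusterStorage"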

Remediation

Install Sysinternals Process Explorer from here: https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx

Next we will locate the GUID of the VM in question.
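
Since Get-VM will not list the affected VM, one way to find the GUID (a sketch) is to list the VM configuration files on the cluster storage; Hyper-V names each VM’s configuration file after the VM’s GUID. Adjust the path for your environment:

# List VM configuration files (named <GUID>.xml, or .vmcx on newer versions) under the cluster storage
Get-ChildItem "C:\ClusterStorage" -Recurse -Include *.xml,*.vmcx |
    Where-Object { $_.Directory.Name -eq "Virtual Machines" } |
    Select-Object Name, DirectoryName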


Once you have located the GUID, open Process Explorer as administrator, right-click the vmwp.exe processes, and check the command line entry under the Image tab. This will display the GUID of each VM. Select the vmwp.exe that matches your troubled VM and kill the process.

DO NOT kill any other processes or you will potentially bring down other VMs


Once you have done this you will need to restart the Hyper-V management stack by restarting the VMMS service. I have found some of these will not work properly with the Restart-Service cmdlet, so you will need to stop and then start the service. This does not affect the running VMs; it only affects your ability to manage them while the service is stopped.

Get-Service vmms
Stop-Service vmms
Start-Service vmms
Get-Service vmms

Now you should be up and running!