Introduction
The other day we found ourselves in the position of having to completely re-install a couple of vSphere ESXi hosts that were once part of a larger VMware vSAN cluster. Usually this does not take long, since nowadays the cluster wizard inside vCenter Server does most of the configuration for you. However, this time our environment had other plans for us!
The issue and solution
We took the hosts from the other environment and placed them in a new cluster in ours, and we immediately noticed that vCenter still saw a vSAN cluster (or part of one). Once I noticed this I removed all of the disk groups by unmounting and deleting them, and finally shut down vSAN. The next steps were removing all of the physical configuration from the disks in the RAID controller, after which I simply powered on the hosts as if they were fresh.
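For reference, the same teardown can also be done from the CLI with the esxcli vsan namespace. This is a rough sketch; the cache-tier device name below is purely an example (take the real one from the list output), and removing a disk group destroys the data on it:

```shell
# List the vSAN disk groups / claimed disks on this host
esxcli vsan storage list

# Remove a disk group by its cache-tier device
# (example device name - replace with your own). This destroys the data!
esxcli vsan storage remove -s naa.5000000000000001

# Finally, make the host leave the vSAN cluster entirely
esxcli vsan cluster leave
```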
So far so good. I booted all of the hosts, re-enabled and configured vSAN, and that should have been it, right? Unfortunately, in our environment this was not the case. It turned out we had inaccessible objects. As you can see below (and please remember, this is a completely fresh environment):
Since this environment is completely empty, I figured why not delete the objects with the “objtool” utility on the ESXi hosts themselves. Well, let’s take a look at the output I received there:
[root@esx11-am:~] /usr/lib/vmware/osfs/bin/objtool delete -u abfc6e65-6408-aa5b-8acf-b88303594c10 -f
object deletion ioctl failed: Input/output error
object delete error: Failure
I will write a later blog post on how to use “objtool” on vSAN for some more things. So that doesn’t work either, even with the -f (force) option. At this point, since the environment was fresh, I decided to repeat the earlier steps: unmount all disk groups, remove all disk groups, and rebuild the entire thing. After doing this, the same situation came up as above: a fresh cluster with inaccessible objects.
After some searching I found the following KB: https://kb.vmware.com/s/article/87350. This KB describes several issues with the GUI vSAN shutdown procedure (present since vSphere ESXi 7.0 U3), even when it is executed cleanly like I did. After checking the KB, none of our ESXi hosts or vCenter Servers were on 7.0 U3d (they were all on higher versions), but the issue still persisted in our environment. Luckily there is a simple fix. Execute the following commands on the CLI of each ESXi host:
Check the current values of the following advanced settings:

esxcfg-advcfg -g /VSAN/DOMPauseAllCCPs
esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates

Both of these should be 0. If they are not, correct the advanced settings:

esxcfg-advcfg -s 0 /VSAN/DOMPauseAllCCPs
esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
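Since the settings have to be checked on every host in the cluster, a small loop from a management workstation can save some typing. A minimal sketch, assuming SSH is enabled on the hosts; the host names below are placeholders for your own:

```shell
# Hypothetical host names - replace with your own ESXi hosts
for h in esx11-am esx12-am esx13-am esx14-am; do
  echo "== $h =="
  ssh root@"$h" \
    'esxcfg-advcfg -g /VSAN/DOMPauseAllCCPs;
     esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates'
done
```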
In our case, “/VSAN/DOMPauseAllCCPs” was set to 1 on 3 of the 4 ESXi hosts, which caused the inaccessibility. Once I set the value back to 0, everything came back as it should:
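To confirm from the CLI that the objects are reachable again, the vSAN debug namespace can be used as well. A quick check, assuming a reasonably recent ESXi release where this command is available:

```shell
# Summarise object health on this host; after the fix,
# no objects should be reported as inaccessible
esxcli vsan debug object health summary get
```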
The “DOMPauseAllCCPs” setting pauses most (if not all) vSAN activity on the ESXi host. The “IgnoreClusterMemberListUpdates” advanced setting ensures that hosts that are disconnected from vCenter Server do not lose their unicast configuration, which upholds connectivity with the other members of the cluster (it is a sort of freeze for this part of the setup).
There you have it, another day another issue. I hope this was useful!