Introduction
Today we have something different from the ever-growing list of VMware Cloud Director (VCD) posts on our blog! The other day I was upgrading a couple of ESXi hosts in one of our environments. This did not go as it should: the ESXi host would not stage or remediate the patches and failed with a time-out. Let’s look at why this was happening and how to fix it.
Troubleshooting
The ESXi host in question was being updated by vSphere Update Manager (VUM) with the older baseline functionality. This means that you define a patch/upgrade baseline that you want your host to comply with. VUM patches the ESXi hosts and makes sure each host has all the patches installed that the baseline requires. You can install the patches directly on the ESXi host (once it is in maintenance mode) or stage the patches to the host first (even without maintenance mode). Neither of these two actions worked in our environment. Both failed with a simple time-out error message after about 30 minutes.
Now, when you are updating ESXi hosts there are a couple of log files worth checking. The easiest one is /var/log/esxupdate.log. This log file contains everything related to the upgrade and all VUM actions on the host. In our case it did not end with an error or contain any useful information. Judging by the log, everything was working fine; it ended with regular entries such as:
.........................
2023-04-01T09:05:14Z esxupdate: 5067965: downloader: INFO: Opening http://vcenter:9084/vum/repository/hostupdate/vmw/vib20/vsan/VMware_bootbank_vsan_7.0.3-0.75.21313628.vib for download
2023-04-01T09:05:14Z esxupdate: 5067965: downloader: INFO: Opening http://vcenter:9084/vum/repository/hostupdate/vmw/vib20/vsanhealth/VMware_bootbank_vsanhealth_7.0.3-0.75.21313628.vib for download
2023-04-01T09:05:14Z esxupdate: 5067965: downloader: INFO: Opening http://vcenter:9084/vum/repository/hostupdate/vmw/vib20/tools-light/VMware_locker_tools-light_12.1.5.20735119-20735876.vib for download
.........................
Looks fine, right? Since this wasn’t giving us any information, let’s head on to /var/log/hostd.log (ESXi host) and /var/log/vpxa.log (vCenter Server Agent) and check those out. These had some more information:
hostd.log:
2023-04-01T09:00:57.906Z info hostd[2100405] [Originator@6876 sub=Hostsvc.OptionManager opID=2a68f5a0-be-955c user=vpxuser:com.vmware.vcIntegrity] Failed to read advanced option subtree UserVars.PXEBootEnabled: N3Vim5Fault11InvalidName9ExceptionE(Fault cause: vim.fault.InvalidName
--> )
2023-04-01T09:00:57.906Z info hostd[2100405] [Originator@6876 sub=AdapterServer opID=2a68f5a0-be-955c user=vpxuser:com.vmware.vcIntegrity] AdapterServer caught exception; <<xxxxxxxxxxxxxxxxxxxxxxxxx, <TCP '127.0.0.1 : 8307'>, <TCP '127.0.0.1 : 56216'>>, ha-adv-options, vim.option.OptionMana
--> )
2023-04-01T09:00:57.910Z info hostd[2100405] [Originator@6876 sub=Solo.Vmomi opID=2a68f5a0-be-955c user=vpxuser:com.vmware.vcIntegrity] Activation finished; <<xxxxxxxxxxxxxxxxxxxxxxxxx, <TCP '127.0.0.1 : 8307'>, <TCP '127.0.0.1 : xxxxxx'>>, ha-adv-options, vim.option.OptionManager.queryView>
2023-04-01T09:00:57.910Z verbose hostd[2100405] [Originator@6876 sub=Solo.Vmomi opID=2a68f5a0-be-955c user=vpxuser:com.vmware.vcIntegrity] Arg name:
--> "UserVars.PXEBootEnabled"
2023-04-01T09:00:57.910Z info hostd[2100405] [Originator@6876 sub=Solo.Vmomi opID=2a68f5a0-be-955c user=vpxuser:com.vmware.vcIntegrity] Throw vim.fault.InvalidName
2023-04-01T09:00:57.910Z info hostd[2100405] [Originator@6876 sub=Solo.Vmomi opID=2a68f5a0-be-955c user=vpxuser:com.vmware.vcIntegrity] Result:
--> (vim.fault.InvalidName) {
--> name = "UserVars.PXEBootEnabled",
--> msg = "",
--> }
2023-04-01T09:00:57.948Z info hostd[2107946] [Originator@6876 sub=Hostsvc.OptionManager opID=dc99556-98-955d user=vpxuser:com.vmware.vcIntegrity] Failed to read advanced option subtree UserVars.ImageCachedSystem: N3Vim5Fault11InvalidName9ExceptionE(Fault cause: vim.fault.InvalidName
--> )
..........................................
2023-04-01T09:00:57.950Z info hostd[2107946] [Originator@6876 sub=Solo.Vmomi opID=dc99556-98-955d user=vpxuser:com.vmware.vcIntegrity] Throw vim.fault.InvalidName
2023-04-01T09:00:57.950Z info hostd[2107946] [Originator@6876 sub=Solo.Vmomi opID=dc99556-98-955d user=vpxuser:com.vmware.vcIntegrity] Result:
--> (vim.fault.InvalidName) {
--> name = "UserVars.ImageCachedSystem",
--> msg = "",
--> }
2023-04-01T09:00:57.987Z info hostd[2100404] [Originator@6876 sub=Hostsvc.OptionManager opID=59b0bcde-78-955e user=vpxuser:com.vmware.vcIntegrity] Failed to read advanced option subtree UserVars.PXEBootEnabled: N3Vim5Fault11InvalidName9ExceptionE(Fault cause: vim.fault.InvalidName
vpxa.log:
2023-04-01T10:11:09.374Z info vpxa[2107167] [Originator@6876 sub=Default opID=63e4bfc6-7a] [VpxLRO] -- ERROR lro-633649 -- EsxHostAdvSettings -- vim.option.OptionManager.queryView
--> Result:
--> (vim.fault.InvalidName) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> name = "UserVars.PXEBootEnabled",
--> entity = <unset>
--> msg = "Received SOAP response fault from [<<io_obj p:XXXXXXXXX, h:24, <TCP '127.0.0.1 : 14481'>, <TCP '127.0.0.1 : 8307'>>, /sdk>]: queryView
--> 'UserVars.PXEBootEnabled' is invalid or exceeds the maximum number of characters permitted."
--> }
--> Args:
-->
--> Arg name:
--> "UserVars.PXEBootEnabled"
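If you would rather not scroll through hostd.log by hand, a small shell function along these lines can list which advanced options hostd fails to read and how often. Treat it as a sketch, not as part of the original troubleshooting: the function name invalid_options is made up for this example, it assumes the message text matches the "Failed to read advanced option subtree" entries in the hostd.log excerpt, and it assumes sed, sort and uniq are available in the shell you run it from.

```shell
#!/bin/sh
# Hypothetical helper: count the advanced options that hostd could not read.
# $1 is the log file to scan (defaults to /var/log/hostd.log).
invalid_options() {
    log_file="${1:-/var/log/hostd.log}"
    # Keep only the option name between "subtree " and the following colon,
    # then count how many times each name shows up.
    sed -n 's/.*Failed to read advanced option subtree \([^:]*\):.*/\1/p' "$log_file" \
        | sort \
        | uniq -c
}
```

Calling invalid_options /var/log/hostd.log on the host prints one line per option name, each prefixed with the number of times it was logged.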
Alright, interesting. It seems like VUM, or at least some part of VUM, is tripping over some advanced UserVars settings. After browsing through the logs, it looked like the following two were causing issues:
- UserVars.PXEBootEnabled -> a kernel configuration parameter that tells the host to use PXE boot.
- UserVars.ImageCachedSystem -> a kernel configuration parameter that tells the system where images are cached.
I’ve looked both up, but they don’t seem to be present in recent versions of ESXi. Looking at the ESXi host, they don’t appear to be present there either:
[root@esx04:/var/log] esxcli system settings advanced list | grep -i image
Path: /UserVars/EsximageNetTimeout
Path: /UserVars/EsximageNetRetries
Path: /UserVars/EsximageNetRateLimit
Removing them with esxcli does not work either. ESXi does not see them as existing options:
[root@esx04:/var/log] esxcli system settings advanced remove -o ImageCachedSystem
Option does not exist.
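To check both suspect options in one go, something like the following could be used on the host. It is only a sketch: the function names option_registered and check_options are invented for this example, and it assumes the listing prints each option as a "Path: /UserVars/<name>" line, as in the grep output earlier in this section. Unlike the grep on "image", it matches the full option name exactly.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the listing on stdin contains
# an exact "Path: /UserVars/<name>" line for the option named in $1.
option_registered() {
    grep -q "^[[:space:]]*Path: /UserVars/$1\$"
}

# Hypothetical helper: report, for every option name given as an argument,
# whether the host lists it among its advanced settings.
check_options() {
    # Fetch the listing once and reuse it for every name.
    listing=$(esxcli system settings advanced list)
    for name in "$@"; do
        if printf '%s\n' "$listing" | option_registered "$name"; then
            echo "$name: registered"
        else
            echo "$name: not registered"
        fi
    done
}
```

On the host this would be called as check_options PXEBootEnabled ImageCachedSystem, without the UserVars prefix.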
At this point I was a bit out of options. Two possible solutions were left:
- Re-install the entire ESXi host.
- Reset the VUM DB.
Re-installing the ESXi host will work, but it takes too much time, so I went for the second option. Luckily, this fixed the problem! If you want to reset the VUM DB, you can follow the VMware KB article on the subject or just take a quick look at the lines below:
# Stop the VMware Update Manager Service:
service-control --stop vmware-updatemgr
# Remove the VUM DB:
python /usr/lib/vmware-updatemgr/bin/updatemgr-utility.py reset-db
rm -rf /storage/updatemgr/patch-store/*
# Start the VMware Update Manager Service:
service-control --start vmware-updatemgr
Once you’ve executed the above, you will find a clean VUM configuration in your vCenter Server. This also means all baselines, images, custom repositories and settings are removed, so you will have to re-configure these to your liking. Once we did this, we ran the baselines against the host again and everything worked flawlessly.
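Before re-creating baselines it can be worth confirming that the Update Manager service really came back up after the reset. The sketch below is an assumption-heavy example rather than part of the original procedure: the function name service_is_running is made up, and it assumes that service-control --status prints a "Running:" or "Stopped:" heading followed by the service names on the next lines. Check the output on your own appliance before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the service named in $1 appears under
# the "Running:" heading of `service-control --status` output read from stdin.
# The heading layout is an assumption about that command's output.
service_is_running() {
    awk -v svc="$1" '
        # A line such as "Running:" or "Stopped:" starts a new section.
        /^[A-Za-z]+:[ \t]*$/ { section = $1; next }
        # Service names may be listed one or several per line.
        section == "Running:" {
            for (i = 1; i <= NF; i++) if ($i == svc) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On the vCenter Server appliance it could be used as: service-control --status vmware-updatemgr | service_is_running vmware-updatemgr && echo "Update Manager is running".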
Conclusion
While working on this issue we discovered that the ESXi host had likely been upgraded through many older versions to the current (vSphere 7) version. Initially the host would not upgrade to the latest version using VMware vSphere Update Manager (VUM) baselines, and it gave no apparent reason beyond a time-out. After troubleshooting we figured out that some old advanced kernel configuration parameters were blocking the upgrade. However, it was not possible to remove them.
To fix the issue you can either re-install the ESXi host or clean the VUM database and try again. In our case cleaning the VUM database worked. I hope this helped you with your issue!