Introduction
The other day I was trying to move a couple of virtual machines in one of our environments and I faced some awesome general system errors! Great. I’ve seen these before, so I figured I’d write it up in a quick blog post this time around. The error message I got from the vSphere Client was the following:
A general system error occurred: vDS host error: see faultCause
When we look at this on the GUI side and expand the error, we can see some additional information that belongs to the above message.
A general system error occurred: vDS host error: see faultCause Cannot create DVPort 99527 of VDS dvSwitch01 on the host esx01 fault.DvsApplyOperationFault.summary An error occurred during host configuration
Troubleshooting
Alright. The expanded error message above does seem to indicate that there is an issue of some sort with the dvSwitch. For troubleshooting purposes I first went through the easy steps:
- Rebooting the virtual machine and retrying the relocation.
- Shutting down the virtual machine and retrying the relocation.
These easy steps did not resolve the issue. Next, I looked inside the virtual machine’s vmware.log file for any clues as to why this was happening:
2020-08-25T07:42:43.452Z| vmx| I125: FILE: FileCreateDirectoryEx: Failed to create /tmp. Error = 17
2020-08-25T07:42:43.452Z| vmx| I125: FILE: FileCreateDirectoryEx: Failed to create /tmp/vmware-root. Error = 17
Unfortunately this does not seem relevant to our issue. Because the issue concerns a port creation error on the dvSwitch, I also had a look at the “hostd.log” logfile on the ESXi host. This yielded the following log snippet:
2020-08-25T09:42:11.348Z warning hostd[2100519] [Originator@6876 sub=Hostsvc.NetworkProvider] Unable to lookup port 99527 on dvs 25 11 33 50 8e e7 16 4e-b5 53 a4 22
2020-08-25T09:45:15.859Z warning hostd[2099739] [Originator@6876 sub=Hostsvc.NetworkProvider opID=aeec4e44] Unable to lookup port 99527 on dvs 25 11 33 50 8e e7 16 4e-b5 53 a4 22
2020-08-25T09:45:23.246Z warning hostd[2100532] [Originator@6876 sub=Hostsvc.NetworkProvider opID=aeec4e9d] Unable to lookup port 99527 on dvs 25 11 33 50 8e e7 16 4e-b5 53 a4 22
2020-08-25T09:45:53.246Z warning hostd[2099739] [Originator@6876 sub=Hostsvc.NetworkProvider opID=aeec4f3e] Unable to lookup port 99527 on dvs 25 11 33 50 8e e7 16 4e-b5 53 a4 22
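As a side note, you don’t necessarily have to SSH to the host to grep this log. A minimal PowerCLI sketch, assuming an existing vCenter connection and using the host name and Port ID from this article, could pull the same entries remotely:

```powershell
# Hedged sketch: retrieve the hostd log from the ESXi host through vCenter
# and filter for the port key that shows up in the error.
$vmhost   = Get-VMHost -Name 'esx01'                 # host name from this article
$hostdLog = Get-Log -Key 'hostd' -VMHost $vmhost     # fetch the hostd log bundle

# Only show the lines that mention the offending port
$hostdLog.Entries | Where-Object { $_ -match 'Unable to lookup port 99527' }
```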
These hostd.log warnings seem more meaningful. It looks like the ESXi host cannot find the Port ID that the virtual machine in question is using on the dvSwitch. Let’s check this by looking the port up in the vSphere Client: log in and go to Home -> Networking -> dvSwitch -> Ports, then filter on the Port ID we found in the hostd.log file. My advice is to do this in the old vSphere Web Client (Flex), because the new HTML5 client has a bug (at least in the latest vCenter Server 6.7 U3 version) that prevents you from browsing beyond 2,000 ports. Even though there are more ports, the UI will only display the first 2,000, and you cannot see, select or filter anything beyond that. The old vSphere Web Client does show you all the ports, and you can easily find the port by sorting on the Port ID.
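If you would rather skip the client altogether, the same lookup can be scripted. This is a minimal PowerCLI sketch, assuming a connected vCenter session; the switch name and port key are the ones from this article, and the exact property names may vary slightly per PowerCLI version:

```powershell
# Hedged sketch: look up the port key from hostd.log directly on the dvSwitch,
# which also sidesteps the HTML5 client's 2,000-port display limit.
$port = Get-VDSwitch -Name 'dvSwitch01' | Get-VDPort -Key '99527'

# Show which entity is currently connected to that port
$port | Select-Object Key, Name, Portgroup, ConnectedEntity
```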
The port does seem to be present. However, it is being used by another virtual machine. How is that possible? Well, if we have a look at the arrow I pointed out in the above picture and click on the dvSwitch, we can see the following is also present in the environment:
The vSphere Distributed Switch configuration on some hosts differed from that of the vCenter Server
Alright, so this would explain why we are seeing issues in the environment. The dvSwitch configuration that is stored on the ESXi host (in /etc/vmware/dvsdata.db, which is used in case the vCenter Server is down) differs from the configuration that is present in the vCenter Server database. By itself this is not a big deal, because the vCenter Server will eventually update this data. But right now it is giving us issues, because the port our virtual machine wants to use is already in use by another virtual machine. Since we can’t wait, we need to fix the discrepancy ourselves. If you press “Show Details” you can see some more information regarding the affected hosts and Port IDs. An example of this can be found in the following image:
Now that we are aware of this issue, we can fix it. There are a couple of solutions that will help you regain control over relocating the virtual machines that show this error message. Of course we could also have a look at the dvsdata.db file on the ESXi host with the “net-dvs” command, but since I didn’t have the time at that point I skipped this step. With net-dvs you can look up and alter all sorts of dvSwitch information. I highly recommend having a look at it and playing around with it in your environment to familiarize yourself. However, please do remember that this is an unsupported command.
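For what it’s worth, you can get a similar host-side view without the unsupported command by going through esxcli. A hedged PowerCLI sketch, using the host name from this article:

```powershell
# Hedged sketch: dump the distributed switch configuration as the ESXi host
# itself sees it, via the esxcli namespace 'network vswitch dvs vmware list'.
$vmhost = Get-VMHost -Name 'esx01'
$esxcli = Get-EsxCli -VMHost $vmhost -V2

# The output can then be compared with what vCenter shows for the same dvSwitch
$esxcli.network.vswitch.dvs.vmware.list.Invoke()
```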
Solution #1
The first solution will most probably fix the relocation issue quickly. Go into the virtual machine settings -> Network Adapter -> Port ID and change the Port ID to a number that is not in use on the dvSwitch. Check this on the dvSwitch before you change it, for example with the sketch below. Word of caution: this might give you a slight hiccup on the VM if it is running.
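To quickly see which ports are free, something like the following PowerCLI sketch can help. The switch name is the one from this article; the portgroup name 'dvPortgroup01' is just a placeholder for the portgroup your VM is connected to:

```powershell
# Hedged sketch: list a few port keys on the portgroup that have nothing
# connected to them, so you can pick a safe Port ID for the VM.
Get-VDSwitch -Name 'dvSwitch01' |
    Get-VDPortgroup -Name 'dvPortgroup01' |
    Get-VDPort |
    Where-Object { -not $_.ConnectedEntity } |   # no VM or VMkernel adapter attached
    Select-Object Key, Name -First 10
```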
This should instantly fix the relocation issue for your virtual machine. If this however does not fix it, proceed to Solution #2.
Solution #2
Once we have executed Solution #1 we should be able to move the virtual machine. The only thing left is the dvSwitch warning about the mismatching configuration on the ESXi host. We should now also be able to resolve this by pressing “Rectify”. Select the host(s) you want to rectify and press “Rectify” again.
In most cases this will resolve the warning messages for you. In my case, Solution #1 fixed the relocation issue, but I couldn’t clear the dvSwitch warning message by trying to “Rectify” my ESXi hosts as described above. To fix this, you unfortunately need to do some maintenance on the environment. If you follow the instructions below, everything should be fine again:
- Put the ESXi host in maintenance mode. If the VM is still not fixed, and it can’t be moved, you will have to remove the VM from inventory.
- Disconnect the ESXi host from the vCenter Server.
- Go to the dvSwitch -> Actions -> Add and Manage Hosts… -> Manage host networking
- Attach Host -> Select the ESXi host.
- Manage physical adapters -> Select the adapters that the host uses for this dvSwitch -> Unassign adapter.
- Click Next two times -> Finish
- Connect the ESXi host back to the vCenter Server.
- Re-add the host to the dvSwitch:
- Go to the dvSwitch -> Actions -> Add and Manage Hosts… -> Manage host networking
- Manage physical adapters -> Select the adapters that the host used for this dvSwitch -> Assign adapter.
- Click Next two times -> Finish
- Import the removed VM back to your inventory and start it.
At this point we have basically removed the ESXi host from the dvSwitch and re-added it, to forcefully re-initialize the sync between them. The reason we need to disconnect the ESXi host is that the Port ID causing the relocation issue is also stuck, giving us a “This port is in use” error message when we try to remove the host from the dvSwitch. After this, everything in the environment should be healthy again and you should be able to move the virtual machine.
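If you prefer to script the dvSwitch part of this maintenance rather than clicking through the wizard, a hedged PowerCLI outline could look like the one below. It only covers the maintenance mode and the dvSwitch removal/re-addition (not the vCenter disconnect/reconnect or the VM inventory steps), and the host, switch and vmnic names are just examples based on this article:

```powershell
# Hedged outline: remove the host from the dvSwitch and add it back to force
# a fresh sync. Review every step against your own environment before running.
$vds    = Get-VDSwitch -Name 'dvSwitch01'
$vmhost = Get-VMHost   -Name 'esx01'
$vmnic  = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name 'vmnic2'   # example uplink

# 1. Put the host in maintenance mode
Set-VMHost -VMHost $vmhost -State Maintenance | Out-Null

# 2. Unassign the physical uplink(s) and remove the host from the dvSwitch
Remove-VDSwitchPhysicalNetworkAdapter -VMHostNetworkAdapter $vmnic -Confirm:$false
Remove-VDSwitchVMHost -VDSwitch $vds -VMHost $vmhost -Confirm:$false

# 3. Re-add the host and hand the uplink back to the dvSwitch
Add-VDSwitchVMHost -VDSwitch $vds -VMHost $vmhost
Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch $vds -VMHostPhysicalNic $vmnic -Confirm:$false

# 4. Exit maintenance mode
Set-VMHost -VMHost $vmhost -State Connected | Out-Null
```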
Conclusion
Fixing a “general system error” message can be hard, and finding the underlying issue is not always possible. In this case, however, we were able to track down the underlying issue, a mismatching dvSwitch configuration, and fix it so that we were once again able to move the virtual machine. Now you might be wondering where this issue came from in the first place. Well, I do have an idea.
A while back we also had a dvSwitch issue in this environment where too many ports were in use on the dvSwitch. At some point we had to reduce the number of used ports, and we wrote a script that helped us clear unused ports from the dvSwitch. I think that script somehow messed up the sync between the dvSwitch and the ESXi hosts, which doesn’t seem too crazy considering we had major connectivity issues with the dvSwitch while it was over-used.
I hope you liked this quick blog post and found this useful. I will see you in the next one!
2 Comments
Marc · February 9, 2024 at 1:17 am
Unfortunately this cannot be done from vSphere client if the ESX host has only one NIC.
Bryan van Eeden · March 14, 2024 at 8:18 am
Correct, you might need to do a clean reconfiguration of the networking stack if you only have one NIC.