Recently we’ve had some weird issues on one of our customers vCenter Servers. For starters the vMotion and Storage vMotion features weren’t working anymore because of time-outs. Which is weird and something I’ve never seen before. So we started troubleshooting the VCSA server and noticed that it couldn’t retrieve the installed licenses (VMware vSphere Enterprise Plus) from the production ESXi hosts anymore.

Going to the “Licensed Features” tab in the vSphere Client (VCSA version 6.0 GA) usually gives you a nice overview of what vSphere license is installed, but this time it was just empty. Going to the ESXi host directly you could however see that the license was present and activated. We also noticed that the License module in the vSphere client was also providing us with a timeout.

Once we dove into the log files from the license service in “/var/log/vmware/cis-license/license.log” we noticed some Security Token Service STS service, SSO service and web-client service issues in regards to certificates. Which got me thinking and looking at the certificates for this vCenter Server Appliance. Below you can find some snippets of logs which might be interesting for you to match your problem to the one I was having:

You can use the following cli cmdlets to check your certificate stores and the certificates that are in them:

All certificates checked out but guess what, the “MACHINE_SSL_CERT” didn’t. Turns out it was expired. Funny thing though is that this particular vCenter Appliance should’nt even be working anymore because once the certificate is expired, most of the time it won’t even start all of the vCenter services once you reboot it. In our case somehow it did.

So we went ahead and fired up the “certificate-manager” tool which can be found in “/usr/lib/vmware-vmca/bin/certificate-manager”, picked option 3 to replace the the Machine SSL with a VMCA certificate (which is a self-signed certificate but that’s fine for this environment), entered the information which was present in the current certificate such as hostnames and IP-address information and accepted all changes.

Certificate-manager tool on the vCenter Server Appliance

Once you accepted the change it is proposing it will update the certificates in the locations it is needed and stop and start all services. Piece of cake. Our certificate-manager however decided it was time to throw an error:

Once we checked that log we saw that the certificate-manager tooling couldn’t start the “vmware-eam” service, see the below log snippet which can be found in “/var/log/vmware/vmcad/certificate-manager.log”:

Sure enough we were hitting a bug in our vCenter Server Appliance. This bug prevented the EAM service from starting after a vCenter reboot. This bug basically deletes the “eam.properties” file in the “/etc/vmware-eam/” directory. This file is crucial for the service to start and know what to do. Since this file was missing in our environment, the “vmware-eam” service was broken. This VMware KB explains how to fix this. Which basically means that you have to download the attachment called “Recreate_eam.properties.sh” and run it. This script recreates the eam.properties file so that your “vmware-eam” service can start again. Please not that you can only run this when you run the EAM service on the vCenter Server you are working on. The steps to run this script are described below:

In our situation this almost fixed our issues. We were forced to break the certificate-manager procedure in the middle where it starts starting the services again after it updated the “MACHINE_SSL_CERT” in the places it has to. You can do this by just pressing CTRL+C on the right time in the procedure. To find this correct time you can open another putty session to the VMware vCenter server and using the following command:

Just press CTRL+C when the following log entries pass by:

Once you are at this point just start the services yourself with:

This should start all the services nicely. After this point we had our VMware vCenter Server Appliance working again with a new fresh “MACHINE_SSL_CERT” certificate. As a last check you can execute the following command and verify the expiration date:

There you have it. I figured it would be easy enough and fix this quickly, turned out we were facing a bug in the “vmware-eam” service. I hope this post helps when you are finding the same issues we found.

Share this if you found this interesting.

Leave a Reply

Your email address will not be published. Required fields are marked *