Recently we had some weird issues on one of our customer’s vCenter Servers. For starters, vMotion and Storage vMotion had stopped working because of time-outs, which is something I had never seen before. So we started troubleshooting the VCSA and noticed that it could no longer retrieve the installed licenses (VMware vSphere Enterprise Plus) from the production ESXi hosts.

The “Licensed Features” tab in the vSphere Client (VCSA version 6.0 GA) usually gives you a nice overview of which vSphere license is installed, but this time it was just empty. Connecting to the ESXi host directly, however, showed that the license was present and activated. The License module in the vSphere Client was timing out as well.

Once we dove into the license service log in “/var/log/vmware/cis-license/license.log”, we noticed certificate-related errors from the Security Token Service (STS), the SSO service and the web client service. That got me looking at the certificates on this vCenter Server Appliance. Below are some log snippets that may help you match your problem to the one I was having:
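To pull comparable entries out of your own license log, a search along these lines may help. It is only a sketch: the search term, the default of 20 lines and the helper name `recent_cert_log_lines` are my own assumptions, and only the log path comes from the text above.

```shell
# Log path as given in the text above; override LICENSE_LOG if yours differs.
LICENSE_LOG="${LICENSE_LOG:-/var/log/vmware/cis-license/license.log}"

# Show the most recent log lines that mention certificates (case-insensitive).
# Optional first argument: how many lines to show (default 20).
recent_cert_log_lines() {
    grep -i 'certificate' "$LICENSE_LOG" | tail -n "${1:-20}"
}
```

Run `recent_cert_log_lines` as root on the appliance, or `recent_cert_log_lines 50` for a longer excerpt.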

You can use the following CLI commands to check your certificate stores and the certificates in them:
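A minimal sketch of such a check, assuming `vecs-cli` sits in its usual VCSA 6.x location, `/usr/lib/vmware-vmafd/bin/vecs-cli` (that path and the helper name `list_cert_expiry` are assumptions on my part). It walks every VECS store and prints each entry's alias together with its expiry date.

```shell
# Assumed default location of vecs-cli on a VCSA 6.x; override VECS_CLI if yours differs.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# Print the alias and expiry date of every entry in every VECS store.
list_cert_expiry() {
    for store in $("$VECS_CLI" store list); do
        echo "[*] Store: $store"
        # --text prints the certificate in human-readable form; keep only the useful lines.
        "$VECS_CLI" entry list --store "$store" --text | grep -E 'Alias|Not After'
    done
}
```

Run `list_cert_expiry` as root on the appliance. Any "Not After" date that lies in the past points at an expired certificate.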

All certificates checked out except, guess what, the “MACHINE_SSL_CERT”. It turned out to be expired. The funny thing is that this particular vCenter Appliance shouldn’t even have been working anymore: once that certificate has expired, most of the time not all of the vCenter services will start after a reboot. In our case, somehow, they did.

So we went ahead and fired up the “certificate-manager” tool, which can be found in “/usr/lib/vmware-vmca/bin/certificate-manager”. We picked option 3 to replace the Machine SSL certificate with a VMCA certificate (signed by the appliance’s own VMCA rather than a public CA, but that’s fine for this environment), entered the details from the current certificate, such as hostnames and IP address information, and accepted all changes.

Certificate-manager tool on the vCenter Server Appliance

Once you accept the changes it proposes, the tool updates the certificate everywhere it is needed and then stops and starts all services. Piece of cake. Our certificate-manager, however, decided it was time to throw an error:

When we checked that log, we saw that certificate-manager couldn’t start the “vmware-eam” service. See the log snippet below, which comes from “/var/log/vmware/vmcad/certificate-manager.log”:

Sure enough, we were hitting a bug in our vCenter Server Appliance that prevents the EAM service from starting after a vCenter reboot. The bug deletes the “eam.properties” file in the “/etc/vmware-eam/” directory, and the service needs that file to start and know what to do. Since the file was missing in our environment, the “vmware-eam” service was broken. This VMware KB explains how to fix it: download the attachment called “Recreate_eam.properties.sh” and run it. The script recreates the eam.properties file so that your “vmware-eam” service can start again. Please note that you can only run it on the vCenter Server that actually runs the EAM service. The steps to run this script are described below:
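As a rough sketch of those steps, assuming you copied the KB attachment to `/tmp` on the appliance (that location and the helper name `recreate_eam_properties` are my assumptions; the script name and the properties path come from the text above, and the KB itself remains the authoritative procedure):

```shell
# Path from the text above; the script is expected to recreate this file.
EAM_PROPS="${EAM_PROPS:-/etc/vmware-eam/eam.properties}"

# Make the KB script executable, run it, then confirm the file exists again.
# Assumption: the attachment was copied to /tmp; pass another path as $1 if not.
recreate_eam_properties() {
    script="${1:-/tmp/Recreate_eam.properties.sh}"
    chmod +x "$script" && "$script" && ls -l "$EAM_PROPS"
}
```

Run `recreate_eam_properties` as root. If the file is listed afterwards, `service-control --start vmware-eam` should be able to bring the service up again.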

In our situation this almost fixed our issues. We still had to interrupt the certificate-manager procedure partway through, at the point where it begins restarting the services after it has updated the “MACHINE_SSL_CERT” wherever it needs to. You can do this by pressing CTRL+C at the right time in the procedure. To find that moment, open another PuTTY session to the vCenter Server and use the following command:
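A minimal sketch for watching the log live in that second session. The log path is the one mentioned earlier in this post; the helper name `follow_certmgr_log` and the 20 lines of context are my own choices.

```shell
# Log path as given earlier in the post; override CERTMGR_LOG if yours differs.
CERTMGR_LOG="${CERTMGR_LOG:-/var/log/vmware/vmcad/certificate-manager.log}"

# Show the last 20 lines, then keep printing new entries as they are written.
# Stop following with CTRL+C in this second session.
follow_certmgr_log() {
    tail -n 20 -f "$CERTMGR_LOG"
}
```

Start `follow_certmgr_log` before you kick off certificate-manager in the first session, so that you see the entries as they arrive.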

Just press CTRL+C when the following log entries pass by:

Once you are at this point just start the services yourself with:
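A sketch of that step, assuming `service-control` is on root's PATH as it normally is on the appliance. The helper name `start_all_services` is hypothetical, and the status call afterwards is an extra I added so you can see what actually came up.

```shell
# Start every vCenter service, then list which ones are running or stopped.
# Run as root on the appliance.
start_all_services() {
    service-control --start --all && service-control --status --all
}
```

Call `start_all_services` right after interrupting certificate-manager. If a service shows up as stopped in the status output, that is the one to look at next.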

This should start all the services nicely. After this, our VMware vCenter Server Appliance was working again with a fresh “MACHINE_SSL_CERT” certificate. As a last check you can execute the following command and verify the expiration date:
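A minimal sketch of that check. It assumes `vecs-cli` is in `/usr/lib/vmware-vmafd/bin/` and that the Machine SSL certificate uses the default alias `__MACHINE_CERT`; both are assumptions, as is the helper name `show_machine_ssl_dates`.

```shell
# Assumed default location of vecs-cli on a VCSA 6.x; override VECS_CLI if yours differs.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# Print the validity window (Not Before / Not After) of the Machine SSL certificate.
# Assumption: the certificate is stored under the default alias __MACHINE_CERT.
show_machine_ssl_dates() {
    "$VECS_CLI" entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT --text \
        | grep -E 'Not (Before|After)'
}
```

After a successful replacement, `show_machine_ssl_dates` should report a "Not After" date in the future.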

There you have it. I figured this would be an easy, quick fix, but it turned out we were also facing a bug in the “vmware-eam” service. I hope this post helps if you run into the same issues we did.

4 Comments

  1. Thank you very much! I hadn’t thought of CTRL+C to prevent the scripts from reverting the certificate updates when some services won’t start. Great tip! I was able to get the problem resolved before VMWare support even called me back. 🙂

    Howard
  2. I have to say VMware esx is full of bugs, even in a mature version like esx 6.7 cu3
    Autostart doesn’t work (which is very very bad because the vcenter server has to run)
    So starting the “photon-machine” manually.
    Later my vcenter server did not register with AD. The error message said (decoded) that the clocks were totally out of sync. So I checked the system logs… what? Installed today but log entries from 2013? The host did have a bad CMOS battery but I did not pay attention. However the vCenter app did survive some reboots. And yes, the app is so damned crude that it doesn’t even have its own time sync; in the network configuration there is no such thing as configuring an NTP server.
    Then I checked the BIOS of the host. Corrected the date. And then… guess. vCenter is dead. Because short-sighted people at VMware created a self-signed certificate with a short expiration time of damned TWO YEARS. Meaning the vCenter server kills itself (in labs without tight certificate management) after 2 years. My browser tells me that the certificate was valid between 7/8/2013 and 7/8/2015.

    Erhin Hensel
