Introduction

It’s been a while since I last posted a blog. We’ve been extremely busy with our day-to-day jobs, but that does not mean that we are losing focus or that we don’t have anything to write about! Today I’ve got another quick blog for you about yet another VMware Cloud Director 10.2.2 bug. Now let’s jump into this thing!

The Issue

We had been having intermittent issues on one of our VCD environments, which ultimately led us to believe that something was wrong with the resources configured on the VCD Cells. It turned out that the recommended resource sizing changed with the 10.2.2 upgrade we did earlier this year, and we had forgotten to assign the new values. Once we found this out, we decided to shut down the VCD environment and adjust the resources to their recommended sizes.

Once we did that and restarted the environment, VCD didn’t want to come back online. The postgres database and its cluster functions were thankfully all working as they should. Looking at the vmware-vcd service status, we noticed that the Cell services weren’t starting and were hanging on this step of the process:

After some digging in the log located at /opt/vmware/vcloud-director/logs/cell-runtime.log we noticed the following:

This, however, did not point us to anything useful. None of the other logs had anything we could use to troubleshoot, and in the meantime the entire VCD platform (all three cells) was down. This was during a very busy morning, so we decided to open a case with VMware.

The Resolution

Not long after creating this case we got hold of a very good engineer who had a quick look through the files we had already checked. He remembered that he had seen this issue before, but only on VCD 10.2.2.1.x and not on 10.2.2. The issue itself turned out to be that, seemingly at random, the “jms.user.system.password” property can go missing from the database in VCD 10.2.2.x. I have not been told of a reason why this happens, and it was not covered in a publicly available KB article, so I figured I would write it down for everybody. You can check if you are having this issue by doing the following:

VCD 10.2.2 database issue missing the jms.user.system.password property

This should normally return two entries. The output below is from a working environment (obviously the password is blanked out in this example):

VCD 10.2.2 database issue where the jms.user.system.password property is not missing

Now that we have established that we are hitting this issue, we can fix it. Be aware that if you are not comfortable working directly in the postgresql database, you should create a case with VMware support so that they can do this for you! The following steps need to be followed to fix the issue:

  1. Download a previous backup of the environment. Most of the time this can be found in /opt/vmware/vcloud-director/data/transfer/backups (in 10.3.x) and in /opt/vmware/vcloud-director/data/transfer/vmware-vcd-support (pre 10.3.x).
  2. Open the vmware-vcd-support-xxxx-xx-xx.xxxx.tgz file with 7-Zip.
  3. Open the export_db_xxx-xx-xx-xx-xx-xx.tar file, and the file within it, with 7-Zip.
  4. Find a file named config.dat and open this file.
  5. Look for the config key named jms.user.system.password. You should find it in this file. If you don’t, use an older backup and try again.
  6. Now copy the complete line into Notepad++ or anything similar.
  7. Use the first 4 digits as the config_id and the 4th entry as the password. Note these down and formulate the following sql query (note that these values can be different in each environment):
  8. Entering this query in the database essentially puts the jms.user.system.password property back into the database.
  9. Now that we have done this, we can verify it by re-running the check from earlier, which should now return two results.
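If 7-Zip is not at hand, digging through the backup and splitting the matching line can also be scripted. The following is only a rough Python sketch and not something VMware support gave us. It assumes the support bundle is a (possibly nested) tar archive, and that config.dat is a tab-separated text file with the config_id in the first field and the password value in the fourth, as described in the steps above. The function names find_key_lines and split_config_line are made up for this example.

```python
import io
import tarfile

KEY = "jms.user.system.password"


def find_key_lines(archive_bytes, key=KEY):
    """Return every config.dat line containing `key`, descending into nested tars."""
    hits = []
    try:
        # "r:*" lets tarfile detect plain, gzip, bzip2 or xz compressed tars
        archive = tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*")
    except (tarfile.TarError, EOFError, OSError):
        return hits  # not a tar archive, so there is nothing to search
    with archive:
        for member in archive:
            if not member.isfile():
                continue
            data = archive.extractfile(member).read()
            if member.name.endswith("config.dat"):
                text = data.decode("utf-8", errors="replace")
                hits += [line for line in text.split("\n") if key in line]
            else:
                # Assumption: any other member might itself be an archive
                hits += find_key_lines(data, key)
    return hits


def split_config_line(line):
    """Split a config.dat line into (config_id, value).

    Assumes tab-separated fields: first field = config_id, fourth = value.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least four tab-separated fields")
    return fields[0], fields[3]
```

To use it, read the .tgz bundle in binary mode, pass the bytes to find_key_lines, and feed each returned line to split_config_line. Keep in mind that this loads every archive member into memory, so it can be slow on large bundles, and if it returns nothing you will still want to try an older backup.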

At this point we can go ahead and restart the vmware-vcd service by running service vmware-vcd stop followed by service vmware-vcd start. The vmware-vcd service should now start without any issues, and the fix persists across all following reboots.

There you have it! Unfortunately this is another bug in the VCD 10.2.x branch; it is, however, fixed in the 10.3.x branch of VCD. If you happen to come across this issue, you can now just follow this guide and you will be up and running within minutes. If you do not have any recent backups on the VCD Cells, you could also use a recent postgresql backup. If you don’t have that either, you should contact VMware through a support case so that the password for the user can be regenerated.
