Introduction

The other day we were updating one of our VMware Cloud Director (VCD) environments from 10.1 to 10.2.2 (the latest release as of now). The VCD upgrade itself went perfectly fine and was finished within 30 minutes. Once this was done I started a health check on everything that uses the VCD environment, including our backup solution, Veeam.

Turns out, the VCD update to 10.2.2 actually broke my Veeam v11 environments. Yes, multiple Veeam Backup environments broke on me!

The issue

Once I started troubleshooting I noticed that the two Veeam Backup environments that broke actually had two distinct issues rather than one issue that plagued both environments. The issues can be described as:

Veeam Backup environment 1 issue

The Veeam Backup server was no longer able to connect to the VCD environment. We received the following messages in the logs. Logs for any Veeam environment can be found at %programdata%\Veeam\Backup.

Veeam v11 connection issue to VCD 10.2.2

And that was basically all the information that was present in the log file. It seemed that the Veeam environment couldn’t create a session with the VCD environment anymore. One thing to note in this configuration is that we connected directly to the VCD environment through Veeam by adding the VCD portal, so we didn’t use the Veeam Enterprise Manager to connect to the VCD environment. The issue showed up as me no longer being able to browse or run any backup jobs.

Veeam Backup environment 2 issue

The second Veeam environment was able to create a session to VCD, and I was able to browse the backup jobs. At that point I knew we had two distinct issues on our hands. On this environment the issue showed up as me not being able to successfully run any backup that was connected to the VCD environment. Digging through the logs I found the same error entries for every VM that was being processed.

I even tried different configurations in the backup job. At one point I tried to back up only the vApp, or even just the VM. But every time the job failed with the same error message. This message was also present in the Veeam UI and looks like this:

Veeam Console UI error message

Strangely, even a vCenter backup job instead of a vCloud Director backup job, which bypasses VCD completely, failed with the same messages. So even without using the VCD connection, I still couldn’t get a successful backup of my VMs.

In the end I managed to find out what was happening here. Continue reading for the fix for both environments.

The fix

Each VCD update brings new enhancements and security fixes. VCD 10.2.2 brings loads of new features such as Global Placement Policies, Tanzu enhancements, storage policy enhancements and more. However, this time it broke my Veeam environments so badly that I wasn’t able to create any backups anymore, even without using VCD itself.

Since I had two distinct issues on the environments, there are also two solutions, and two reasons why the VCD update broke things. Let’s start by having a look at issue #1.

Veeam Backup environment 1 fix

Taking a close look at the error message we received, I noticed that this probably has something to do with SSL/TLS settings. So I started off by checking the TLS configuration on the VCD Cells. You can check which TLS versions VCD is using by running cell-management-tool ssl-protocols -l on any VCD Cell (no credentials are required).

Looking at the command output, this seems to be fine. I double-checked on the local Veeam server I was having the issues with whether the TLS 1.2 protocol was enabled, and it was. So no solution there, unfortunately. Now I figured there must be something off with the cryptographic ciphers used in this VCD environment. Fortunately there is also a way to check which ciphers VCD is configured to use: run cell-management-tool ciphers -l on a VCD Cell.

That output also showed me that there is no custom configuration on the VCD Cell to additionally include or exclude any cipher suites. Looking at the VCD 10.2.2 release notes I can see that this is the default cipher suite set. Looking back at the VCD 10.2 release notes, however, I noticed that something changed in that release: VMware decided to drop a couple of cipher suites from the default set.

Normally this shouldn’t do anything. I started comparing the cipher suites against the Windows versions in use (Windows Server 2012 R2 in environment 1 and Windows Server 2016 in environment 2) and double-checked Microsoft’s documentation to see which cipher suites are supported on each guest OS. Microsoft has a great page that lists the supported cipher suites per Windows version.
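The comparison described above boils down to a set intersection: which of the suites the VCD Cell allows does the Windows client also support? Here is a minimal Python sketch of that check. The function name shared_cipher_suites is made up for illustration, and the two lists in the example are placeholders, not the actual VCD or Windows lists. Paste in the real ones from the VCD Cell output and from Microsoft's page for your Windows version.

```python
from typing import Iterable, List


def shared_cipher_suites(server: Iterable[str], client: Iterable[str]) -> List[str]:
    """Return the suites offered by both sides, keeping the server's order."""
    # Normalise case and whitespace so copy-pasted lists compare cleanly.
    client_set = {name.strip().upper() for name in client}
    return [name for name in server if name.strip().upper() in client_set]


if __name__ == "__main__":
    # Placeholder inputs for illustration only (assumption, not real data):
    # replace with the suites allowed on the VCD Cell ...
    server_suites = [
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    ]
    # ... and the suites Microsoft lists for the Windows version in question.
    client_suites = [
        "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    ]

    common = shared_cipher_suites(server_suites, client_suites)
    if common:
        print("Shared cipher suites:", ", ".join(common))
    else:
        print("No shared cipher suite, so a TLS handshake cannot succeed.")
```

Keep in mind that a suite showing up in the intersection is necessary but not always sufficient: an ECDHE_ECDSA suite, for example, is only usable when the server presents an ECDSA certificate.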

I came to the conclusion that two out of the four cipher suites configured on the VCD Cells are not supported on Windows Server 2012 R2, which is the guest OS used for this specific environment. So at this point I figured: let’s upgrade this specific Veeam server to Windows Server 2016 and see if that fixes the issue. Well guess what, it did. Once the Veeam server was upgraded and ready to go, the connection to the VCD environment worked instantly.
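If you want to see which protocol version and cipher suite an endpoint actually negotiates, a quick probe from any machine with Python can help. This is only a rough sketch under a few assumptions: the function names are made up, the host name in the usage note is a placeholder, and Python uses OpenSSL rather than the Windows Schannel stack that Veeam relies on, so a successful handshake here does not prove that a Windows Server 2012 R2 client would succeed as well. OpenSSL also reports cipher names in its own notation rather than the IANA names used in the release notes.

```python
import socket
import ssl
from typing import Optional, Tuple


def tls12_context() -> ssl.SSLContext:
    """Client context pinned to TLS 1.2; certificate verification stays on."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[Optional[str], str]:
    """Return (protocol version, cipher suite name) agreed with host:port."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with tls12_context().wrap_socket(raw, server_hostname=host) as tls:
            # cipher() returns (name, protocol, secret bits) after the handshake.
            return tls.version(), tls.cipher()[0]
```

Calling negotiated("vcd.example.com") (a hypothetical host name) returns the agreed protocol and cipher, or raises an ssl.SSLError when the handshake fails. With a self-signed certificate on the endpoint, verification fails unless you load the right CA into the context.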

Veeam Backup environment 2 fix

The second issue took me a bit longer to find anything useful on. I scrolled through dozens of Veeam Backup & Replication logs and tested a couple dozen different backup job configurations to find any difference, but all of them failed. In the end Google also couldn’t provide me with a direct lead to go on. So I opened a Veeam support case, and not too long after that I got feedback that this might be caused by a known issue in Veeam v11! I quickly searched the Veeam forums for the Veeam v11 known issues list and found it.

Behold the following issue:

#4: vCloud Director jobs fail with a VDDK error
Symptoms: vCloud Director based backup and replication jobs may fail with the “VDDK error: 3 (One of the parameters was invalid)” error.
Cause: Conflict with the SkipCertificateCheck registry value that was set while using a previous product version.
Status: Remove the registry value from the backup server (reboot is not required). Then, go to the Inventory tab, open the vCenter properties and go through the wizard (click “Next” button several times), accept the certificate and save the vCenter settings.

Well I’ll be damned. I quickly checked the environment and of course the “SkipCertificateCheck” registry value was present. After removing it (no reboot or restart of the Veeam services required) all the backup jobs worked again, even the ones that were not using VCD but backing up VMs directly through the vCenter Server.

This specific registry value is known to cause issues with Veeam v11 and VCD. The strange thing on our environment is that the Veeam environment had recently been upgraded to v11 without showing any issues, until I upgraded VCD to 10.2.2.
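If you want to check whether your own backup server is logging the symptom from the known issue entry, a small script can scan the Veeam log folder for it. The following is a minimal Python sketch, not something provided by Veeam: the function name find_in_logs is made up for illustration, and it assumes the logs are plain-text .log files under %programdata%\Veeam\Backup.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple


def find_in_logs(log_root: Path, needle: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for every .log line containing needle."""
    if not log_root.is_dir():
        return
    for path in sorted(log_root.rglob("*.log")):
        try:
            with path.open("r", encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip()
        except OSError:
            # Skip files that are locked or otherwise unreadable.
            continue


if __name__ == "__main__":
    # Log location as mentioned in the post; only expands on Windows.
    root = Path(os.path.expandvars(r"%programdata%\Veeam\Backup"))
    for path, number, line in find_in_logs(root, "VDDK error: 3"):
        print(f"{path}:{number}: {line}")
```

No output means the string was not found in any readable .log file, which can also happen when older logs have already been archived or rotated away.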

Conclusion

Starting the weekend off with a casual VCD update to the latest version ended up with me grinding through hundreds of log files and opening a Veeam support case. In the end it turned out to be two distinct issues on two Veeam Backup & Replication environments.

The first issue was probably due to the fact that VCD changed the set of ciphers used on the VCD Cells, and the new set does not seem to be compatible with Windows Server 2012 R2 (and lower). Because of this the Veeam server wasn’t able to establish a session with VCD anymore, which left me unable to do anything with the environment. This was quickly fixed once we upgraded the Veeam server to Windows Server 2016.

The second issue was the result of an unprepared upgrade to Veeam v11 a while ago. As I did not do the upgrade myself, I wasn’t aware that there was a known issue with VCD on Veeam v11 environments that still carry the “SkipCertificateCheck” registry value from previous versions. Once we removed the registry value and re-ran the VCD configuration, everything started working again.

There you have it guys! Hopefully this is something that you will read before upgrading either Veeam to v11 or VCD to 10.2.2.
