Introduction

The other day we upgraded our VMware Cloud Director (VCD) environment. The upgrade itself did not go so well for us, but I will discuss that in another blog post. This post covers a specific bug that surfaced in our environment after the upgrade. It showed up in the VCD UI, which reported that one of the configured vCenter Servers could not be reached. Dozens of tasks in the VCD UI failed with the following message:

The operation failed because vCenter Server "vcsa02" is not connected. - The operation failed because vCenter Server "vcsa02" is not connected.

Troubleshooting

The debug information of the error message contained the following:

 - could not update: [com.vmware.vcloud.common.model.inventory.TaskInventoryModel#XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX]; SQL [/* update com.vmware.vcloud.common.model.inventory.TaskInventoryModel */ update task_inv set age=?, task_cancellable=?, task_cancelled=?, completion_date=?, description=?, error_message=?, moref=?, object_moref=?, object_name=?, object_type=?, parent_moref=?, task_progress=?, result=?, result_object_type=?, start_timestamp=?, status=?, task_exception=?, vc_id=? where task_inv_id=?]; nested exception is org.hibernate.exception.DataException: could not update: [com.vmware.vcloud.common.model.inventory.TaskInventoryModel#XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX]
 - could not update: [com.vmware.vcloud.common.model.inventory.TaskInventoryModel#XXXXXXXX-XXXXXXXX-4XXXXXXXX02c-XXXXXXXX-XXXXXXXX]
 - Batch entry 42 /* Method: unknown */ /* Method: unknown */ /* update com.vmware.vcloud.common.model.inventory.TaskInventoryModel */ update task_inv set age=0, task_cancellable='FALSE', task_cancelled='FALSE', completion_date='2024-02-26 13:19:56.893+01', description='VirtualDiskManager.deleteVirtualDisk', error_message='Error caused by file ds:///vmfs/volumes/XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX/C4-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX (XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX)/C4-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX (XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX)XXXXXXXX.vmdk attached to C4-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX (XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX) ', moref='task-XXXXXXXX', object_moref='none', object_name='', object_type=NULL, parent_moref=NULL, task_progress=0, result=NULL, result_object_type=NULL, start_timestamp='2024-02-26 13:19:56.436+01', status=2, task_exception=?, vc_id='XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX-XXXXXXXX'::uuid where task_inv_id='033cc3cd-a87d-4799-97b2-fc07b864b261'::uuid was aborted: ERROR: value too long for type character varying(256)  Call getNextException to see other errors in the batch.
 - ERROR: value to...

Now I’ve done my fair share of VCD troubleshooting, but I had never seen SQL queries directly in the task debug information before. The perceptive eye will spot messages about “C4-XXXXX” disks in the vCenter Server that could not be written to the VCD database because the value was too long for the database column (more than 256 characters). These objects come from VMware Cloud Director Availability (VCDA), which we use: the “C4-XXXXX” objects are VM containers that report the storage usage of replications to VCD. Nothing in our environment had changed except for the VCD upgrade from 10.4.2.2 to 10.5.1, so something had to have changed within VCD.
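To see which columns of the table carry a length limit, the column definitions can be checked directly in the database. The following is a minimal sketch, assuming a psql session connected to the VCD PostgreSQL database. The table name task_inv comes from the failing statement above; that error_message is the 256-character column is an inference from the error text, so verify it in your own environment.

```sql
-- List every length-limited text column of task_inv,
-- using the standard information_schema.columns view.
SELECT column_name,
       data_type,
       character_maximum_length
FROM   information_schema.columns
WHERE  table_name = 'task_inv'
  AND  character_maximum_length IS NOT NULL
ORDER  BY column_name;
```

Any column listed with a maximum length of 256 is a candidate for the "value too long" error.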

After consulting VMware GSS, the issue appeared to be cosmetic and without any impact. However, not long after this we noticed that it did have an impact on our environment and that it wasn’t purely cosmetic. We noticed this because we could no longer import VMs from our environment (vCenter Server) into VCD. This gave us the following error:

The operation failed because no suitable resource was found. Out of 1 candidate hubs: 1 hubs eliminated because: Compute requirement not met: [type:HardwareVersion, value:vmx-19]. Rejected hubs: resgroup-27 - PlacementException NO_FEASIBLE_PLACEMENT_SOLUTION

This, however, is strange, since the entire pVDC and all of the underlying clusters were the same as before the VCD upgrade. Nothing had changed version-wise, and we could perform the same actions before the upgrade. Eventually we debugged the environment a bit more and found that, because of the first error, certain vCenter Server inventory tables in the database were no longer being updated and had fallen behind.

Every environment is different, so the visible errors might differ in your environment!

The reason this suddenly started happening after the VCD 10.5.1 upgrade is that VMware changed the logic for error handling in the database. Before this release the error message was truncated; in this release it is logged in full. Some errors can be rather large, depending on how the application that generates the error handles it. There are several scenarios in which error messages can exceed the 256-character limit:

  • Disk failures with a large disk path which is the case with replicating objects.
  • Segment or Logical Switch errors when they have a large name.
  • Incorrect error handling, such as stating the error multiple times or not properly truncating/summarizing errors.
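The difference between truncating and fully logging a message can be reproduced in any PostgreSQL database. This is only an illustrative sketch: task_inv_demo is a hypothetical scratch table that mimics the 256-character limit, not part of the VCD schema, and the 300-character string stands in for a long disk path. Run it outside an explicit transaction so the expected failure of the first insert does not abort the remaining statements.

```sql
-- Hypothetical scratch table; it only mimics the 256-character limit.
CREATE TEMP TABLE task_inv_demo (error_message varchar(256));

-- Full message (300 characters): rejected with
-- "ERROR: value too long for type character varying(256)".
INSERT INTO task_inv_demo VALUES (repeat('x', 300));

-- Truncated message (first 256 characters): fits, so the row is stored.
INSERT INTO task_inv_demo VALUES (left(repeat('x', 300), 256));

-- Only the truncated row exists.
SELECT length(error_message) FROM task_inv_demo;
```

The first insert fails in the same way as the batch update in the debug information, while the second one succeeds because the value was shortened before it reached the column.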

In our case some error messages regarding VCDA “C4-XXXX” disks contain very long disk paths. This can be fixed very easily by simply increasing the character limit of this specific column in the database with the following command:

alter table task_inv alter column error_message TYPE varchar(512);
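If you prefer to check the result before making it permanent, the same change can be wrapped in a transaction, since PostgreSQL supports transactional DDL. This is a sketch under the same assumptions as the command above (a psql session on the VCD database, with the table and column names taken from the error message), not an official VMware procedure.

```sql
BEGIN;

ALTER TABLE task_inv ALTER COLUMN error_message TYPE varchar(512);

-- Should now report 512 for error_message.
SELECT column_name, character_maximum_length
FROM   information_schema.columns
WHERE  table_name = 'task_inv'
  AND  column_name = 'error_message';

COMMIT;  -- or ROLLBACK; to undo the change
```

Increasing the length of a varchar column does not require PostgreSQL to rewrite the table, so the statement itself should complete quickly.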

VMware told me that this behavior would be fixed in the next VCD release, 10.5.1.1, which will again truncate the message instead of logging it in full to the database.

So there you have it: a rather annoying bug, but a very quick fix until the VCD 10.5.1.1 release!


Bryan van Eeden

Bryan is an ambitious and seasoned IT professional with almost a decade of experience in designing, building and operating complex (virtual) IT environments. In his current role he tackles customers, complex issues and design questions on a daily basis. Bryan holds several certifications such as VCIX-DCV, VCAP-DCA, VCAP-DCD, V(T)SP and vSAN and vCloud Specialist badges.
