Introduction to vSphere Tags

VMware vSphere Tags are a way to attach metadata to VMware vCenter inventory objects, making it easier to find and recognize those objects. If you, like me, already know this functionality within vSphere, you know how useful it can be. In previous versions of vSphere, VMware offered something called vSphere Custom Attributes (it’s still available today). Custom attributes were troublesome to manage and, in my own experience, not practical to use at a larger scale as a system administrator.

You can add a tag or label to almost all vSphere inventory objects. These tags can also be assigned to categories, which group tags together. Below is a quick summary of the vSphere objects that can be tagged (you can also find this within PowerCLI; execute Get-Help New-TagCategory -Detailed):

  • Cluster
  • Datacenter
  • Datastore
  • Datastore Cluster
  • Distributed PortGroup
  • Distributed Switch
  • Folder
  • Resource Pool
  • vApp
  • Virtual Portgroup
  • Virtual Machine
  • VM
  • VM Host
  • Library Item
  • Content Library

You should know that if you choose “All objects” when you initially create the category, this cannot be changed later on. Within each tag category you can choose a cardinality of one or multiple: the category either allows a maximum of one assigned tag per object, or multiple assigned tags from the same category. Be aware of this, because if you, for example, want to use this feature for customer identification, you do not want multiple customer tags attached to the same vSphere object; you want those to be mutually exclusive.
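
To make the cardinality difference concrete, below is a minimal sketch of a category where multiple tags per object do make sense. The “Application” category and its tags are purely hypothetical examples and are not used further in this post.

# Hypothetical category with Multiple cardinality: a VM can carry several application tags at once.
New-TagCategory -Name "Application" -Cardinality Multiple -Description "Applications running inside the VM." -EntityType VirtualMachine
New-Tag -Name "SQL" -Category "Application"
New-Tag -Name "IIS" -Category "Application"

# Both tags from the same category can now be assigned to the same VM.
Get-VM -Name "VMa" | New-TagAssignment -Tag "SQL"
Get-VM -Name "VMa" | New-TagAssignment -Tag "IIS"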

Now let’s go to the nitty-gritty details! If you follow the PowerCLI code examples below, you will create a vSphere Tag Category called “Customer” and a Tag called “CustomerA”. The cardinality will be single, since you don’t want multiple customer names on a vSphere object, and only vSphere VM objects are allowed to be tagged.

New-TagCategory -Name "Customer" -Cardinality Single -Description "Customer Name category." -EntityType VirtualMachine
New-Tag -Name "CustomerA" -Category "Customer"

So once you’ve done this, you can start assigning tags to virtual machines in the vSphere inventory. Let’s say we want to assign the recently created tag to all virtual machines that are currently in the resource pool called “CustomerA”.

Get-ResourcePool -Name "CustomerA" | Get-VM | New-TagAssignment -Tag "CustomerA"

Once the vSphere objects are tagged, you can sort through them with ease in PowerCLI, for example with the following commands:

Get-TagAssignment -Entity "VMa"
Get-VM | Get-TagAssignment | where{$_.Tag.Name -like "CustomerA"}
Get-VM -Tag "CustomerA"

And so on and so forth. If you are using Enhanced/Embedded Linked Mode (ELM) in your vCenter configuration, you should be aware that you need to provide the "-Server" parameter to the New-TagAssignment command. Tags get replicated between vCenters in an ELM configuration, but I’ve had issues with tag assignments when not using "-Server", because each vCenter has the tag present. In other words, if you have 3 vCenters in the ELM configuration and you are connected to all of them while executing the script, every tag is present 3 times. Providing the "-Server" parameter ensures that the correct tag on the correct vCenter is being used.
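
To illustrate, here is a minimal sketch of how that could look in an ELM configuration (the vCenter names below are hypothetical placeholders):

# Connected to all vCenters in the ELM configuration at once.
Connect-VIServer -Server "vcenter01", "vcenter02", "vcenter03"

# Scope the tag lookup and the assignment to one vCenter, otherwise the
# "CustomerA" tag resolves to a tag object on every connected vCenter.
$tag = Get-Tag -Name "CustomerA" -Server "vcenter01"
Get-ResourcePool -Name "CustomerA" -Server "vcenter01" | Get-VM | New-TagAssignment -Tag $tag -Server "vcenter01"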

The issue

So far so good, right? Well, yes it is. The Tag cmdlets are very easy to use and their performance has improved significantly since PowerCLI 11.3. But what about when you receive weird errors while running the Tag cmdlets? The issue we encountered resulted in the following behaviour:

PS C:\WINDOWS\system32> Get-VM -Tag "Customer"
Get-VM : 27-11-2019 16:06:57 Get-VM Value cannot be null.
Parameter name: collection
At line:1 char:1
+ Get-VM -Tag "Customer"
+ ~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Get-VM], VimException
    + FullyQualifiedErrorId : Core_BaseCmdlet_UnknownError,VMware.VimAutomation.ViCore.Cmdlets.Commands.GetVM
PS C:\WINDOWS\system32> Get-Tag
Get-Tag : 27-11-2019 16:07:35 Get-Tag vSphere single sign-on failed for connection'/VIServer=cs\bryaneadmin@vcenter:443/' during a previous operation. The current operation requires such single sign-on and therefore failed. Future operations which require single sign-on on this connection will fail. The underlying cause was available in the error message which initially reported the single sign-on failure.
At line:1 char:1
+ Get-Tag
+ ~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Get-Tag], CisException
    + FullyQualifiedErrorId : VMware.VimAutomation.ViCore.Impl.V1.Service.Tagging.Cis.TaggingServiceCisImpl.GetTag.Error,VMware.VimAutomation.ViCore.Cmdlets.Commands.Tagging.GetTag
PS C:\WINDOWS\system32> Get-TagAssignment
Get-TagAssignment : 27-11-2019 16:06:19 Get-TagAssignment vSphere single sign-on failed for connection'/VIServer=cs\bryaneadmin@vcenter:443/' during a previous operation. The current operation requires such single sign-on and therefore failed. Future operations which require single sign-on on this connection will fail. The underlying cause was available in the error message which initially reported the single sign-on failure.
At line:1 char:1
+ Get-TagAssignment
+ ~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Get-TagAssignment], CisException
    + FullyQualifiedErrorId : VMware.VimAutomation.ViCore.Impl.V1.Service.Tagging.Cis.TaggingServiceCisImpl.GetTagAssignment.Error,VMware.VimAutomation.ViCore.Cmdlets.Commands.Tagging.GetTagAssignment
PS C:\WINDOWS\system32> Get-Tag
Get-Tag : 27-11-2019 16:06:14 Get-Tag vSphere single sign-on failed for connection 'https://psc01:7444/sts/STSService/vsphere.local'. Future operations which require single sign-on on this connection will fail. The underlying cause was: Could not establish trust relationship for the SSL/TLS secure channel with authority 'psc01:7444'.
At line:1 char:1
+ Get-Tag
+ ~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Get-Tag], CisException
    + FullyQualifiedErrorId : VMware.VimAutomation.ViCore.Impl.V1.Service.Tagging.Cis.TaggingServiceCisImpl.GetTag.Error,VMware.VimAutomation.ViCore.Cmdlets.Commands.Tagging.GetTag

These errors prevented us from using any of the below “Tag” cmdlets:

PS H:\> Get-Command | ?{$_.Name -like "*tag*"}

CommandType     Name                                               Version    Source
-----------     ----                                               -------    ------
Cmdlet          Get-Tag                                            11.5.0.... VMware.VimAutomation.Core
Cmdlet          Get-TagAssignment                                  11.5.0.... VMware.VimAutomation.Core
Cmdlet          Get-TagCategory                                    11.5.0.... VMware.VimAutomation.Core
Cmdlet          New-Tag                                            11.5.0.... VMware.VimAutomation.Core
Cmdlet          New-TagAssignment                                  11.5.0.... VMware.VimAutomation.Core
Cmdlet          New-TagCategory                                    11.5.0.... VMware.VimAutomation.Core
Cmdlet          Remove-Tag                                         11.5.0.... VMware.VimAutomation.Core
Cmdlet          Remove-TagAssignment                               11.5.0.... VMware.VimAutomation.Core
Cmdlet          Remove-TagCategory                                 11.5.0.... VMware.VimAutomation.Core
Cmdlet          Set-Tag                                            11.5.0.... VMware.VimAutomation.Core
Cmdlet          Set-TagCategory                                    11.5.0.... VMware.VimAutomation.Core

For the sake of documentation below are some details from the environment:

  • VMware vCenter 6.5 U1e build 7515524 was in use when this issue first started occurring.
  • VMware vCenter 6.7 U2c build 14070654 was used once we upgraded the VMware vCenter Servers.
  • PowerCLI versions 6.5.0 R1 (which worked), 10.0.0, 10.1.1.8827524 and multiple 11.x releases have been used. The latest version I’ve been using is 11.5.14912921.
  • vCenter PSC versions 6.5 U1e build 7515524 and 6.7 U2c build 14070654 have been used.
  • We are using an external Platform Services Controller (PSC) with three attached VMware vCenter Servers in Enhanced Linked Mode. All vCenter Servers have the same issue.
  • The vCenter Servers have been upgraded from version 5.x (Windows) to the current VCSA version over their lifetime.

And to be clear to everybody reading this blogpost: all services, whether external or internal facing (between vCenter Server and PSC), are working without issues. The only issue we have is the one I described above.

Troubleshooting the issue

Looking at the error messages that PowerShell provides, it seems that we cannot create an SSO session with the vCenter Server. Let’s test that:

PS H:\> Connect-VIServer vcenter -Verbose
VERBOSE: Attempting to connect using SSPI 
VERBOSE: Reversely resolved 'vcenter' to 'vcenter'
VERBOSE: SSPI Kerberos: Acquired credentials for user 'VSPHERE.LOCAL\bryaneadmin'
VERBOSE: SSPI Kerberos: Successful call to InitializeSecurityContext for target 'host/vcenter'
VERBOSE: Connected successfully using SSPI

Name                           Port  User                          
----                           ----  ----                          
vcenter                        443   VSPHERE.LOCAL\bryaneadmin                

PS C:\WINDOWS\system32> $global:defaultviservers

Name                           Port  User
----                           ----  ----
vcenter                        443   VSPHERE.LOCAL\bryaneadmin

PS H:\> $global:DefaultVIServer | Select *

IsConnected   : True
Id            : /VIServer=bryaneadmin@vcenter:443/
ServiceUri    : https://vcenter/sdk
SessionSecret : "c1a06842e202030asdasd1ab2916a07b050a908"
Name          : vcenter
Port          : 443
SessionId     : "c1a06842e25334a549de9ab290c16a07b050a908"
User          : bryaneadmin
Uid           : /VIServer=bryaneadmin@vcenter:443/
Version       : 6.7.0
Build         : 14070654
ProductLine   : vpx
InstanceUuid  : ca744d1f-797b-43b4-927b-02030asdasd1
RefCount      : 1
ExtensionData : VMware.Vim.ServiceInstance

PS H:\> Get-Datacenter

Name
----
DC1
DC2

In the above code snippet we’ve successfully connected to the vCenter Server on port 443 and checked the active connections. That seems to work; port 443 is also the only port required to connect to a vCenter Server. Running a command works just fine as well, so connecting to the vCenter Server isn’t the problem. Next up: are the PowerCLI versions interoperable with the vCenter versions?

Interoperability matrix for PowerCLI and VMware vCenter

Looking at the above interoperability matrix, it seems that we are also good to go on this part; everything is supported. Next up I checked a couple of other things, including the following (a quick connectivity check is sketched after the list):

  • vCenter Server firewall rules – not present
  • Jumphost firewall rules – not present
  • Between the jumphost and the vCenter Server firewall rules – present (of course)
    • All vCenter Server 6.7 required ports were accessible.
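
For reference, a connectivity check like this can be scripted from the jumphost with something like the sketch below ("vcenter" and "psc01" are the hostnames used elsewhere in this post):

# Verify that the jumphost can reach the relevant ports.
Test-NetConnection -ComputerName "vcenter" -Port 443   # vCenter Server (HTTPS / reverse proxy)
Test-NetConnection -ComputerName "psc01" -Port 7444    # legacy SSO Lookup Service on the PSC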

VMware Support case

At this point I actually raised a support ticket with VMware. The ticket lasted for almost 6 months, after which it got closed without a solution. I discussed the messages regarding port “7444” with VMware at length, but the conclusion the VMware engineer came to was that it was caused by a version mismatch between the PSC and vCenter Server (which was true in the beginning). He also mentioned that we shouldn’t be encountering this error, because port TCP/7444 isn’t used on the front-end anymore. Starting from vCenter Server 6.0 the SSO Lookup Service moved from port 7444 to port 443 through the HTTP Reverse Proxy service, which means port 7444 is only used for internal communications; on version 6.5 it is kept for backwards compatibility, since the PSC can still support vCenter Server 5.5. The last recommendation I got from VMware was to re-deploy a new PSC and repoint our vCenter Servers. Since this is quite drastic, I decided not to do that.

Troubleshooting session #2

At this point we moved our focus away from this case, since on some stations we were able to use the Tag cmdlets and on other stations we couldn’t. Because most scripting stations were able to assign tags to virtual machines, we decided to let it rest. That is, until I noticed the same behaviour on a new scripting server I built for our environment. At that point I re-opened my own troubleshooting sessions and decided to try something else.

Since the error messages mostly concern an SSO session on port 7444, I decided to use Wireshark to see what the PowerShell session was actually doing. This resulted in the following:

Wireshark snippet for Tag cmdlet troubleshooting

In the above snippet we can clearly see that something isn’t right. PowerCLI actually tries to connect to port 7444 for the Lookup Service, which it shouldn’t, since that port isn’t used anymore. Just to double-check myself I ran the same capture against a known-good vCenter Server, and it was completely empty. So we now know that this should normally not happen. This also means that if you can detect this behaviour for your vCenter Server, you should continue reading!

At this point it was clear to me that our vCenter Server environment wasn’t completely without issues anymore. I decided to open up port 7444, reconnect, and try the Tag cmdlets once more. And guess what, it started to work, sort of. I was one step further; at this point the PowerShell window threw the following error at me:

Connect-VIServer : Connect-VIServer Error: Invalid server certificate. Use Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Prompt if you’d like to connect once or to add a permanent exception for this server.

Come again? We have a fully working signed certificate, which isn’t expired and is signed by an external/public CA. After double-checking this (with echo | openssl s_client -connect localhost:443) I figured it might be failing on the certificate presented on port 7444, since we were able to connect before we opened up that port. You can browse to the Lookup Service (e.g. https://vc01.vcloudvision.lab:7444/lookupservice/mob) to check which certificate is being used, or use echo | openssl s_client -connect localhost:7444. After doing this I noticed a certificate named “ssoserver”. This is an internal certificate, so why am I getting errors on it?
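
If you don’t have openssl available on your jumphost, a PowerShell sketch like the one below can show which certificate a given port presents (purely illustrative; it deliberately skips certificate validation so you can inspect the untrusted “ssoserver” certificate):

# Retrieve the certificate presented on a given host/port and show its subject and issuer.
function Get-RemoteCertificate {
    param([string]$HostName, [int]$Port)
    $client = New-Object System.Net.Sockets.TcpClient($HostName, $Port)
    try {
        # Accept any certificate so we can inspect it, even when it is not trusted.
        $ssl = New-Object System.Net.Security.SslStream($client.GetStream(), $false, { $true })
        $ssl.AuthenticateAsClient($HostName)
        $ssl.RemoteCertificate | Select-Object Subject, Issuer
    }
    finally {
        $client.Close()
    }
}

Get-RemoteCertificate -HostName "vcenter" -Port 443    # the public CA signed certificate
Get-RemoteCertificate -HostName "psc01" -Port 7444     # showed the internal "ssoserver" certificate in our case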

Well, since this vCenter Server environment got upgraded all the way from vCenter Server 5.5 to the current vCenter Server 6.7, it seems that the internal migration of the SSO services (specifically the SSL Trust Anchors) from back in the day (VCSA 6.0 to VCSA 6.5) didn’t go completely well.

The solution part 1

Now, if your vCenter Server environment is healthy apart from this minor issue, you can do the following to enable your colleagues to use the Tag cmdlets again:

  1. Open up port TCP/7444 from your jumphost/scripting host to your PSC, or vCenter Server if you are seeing these issues when using the Embedded PSC.
  2. Use Set-PowerCLIConfiguration -InvalidCertificateAction Ignore to ignore the “invalid” ssoserver certificate that your Lookup Service is currently presenting.

After this you will be able to use and do everything you should be able to do with PowerCLI.
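
Put together, the work-around looks something like the sketch below from the scripting host (adjust the scope of the certificate setting to your own policy):

# Ignore the untrusted "ssoserver" certificate that the legacy Lookup Service presents.
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Scope User -Confirm:$false

# Reconnect and verify that the Tag cmdlets work again.
Connect-VIServer -Server "vcenter"
Get-Tag
Get-TagAssignment -Entity "VMa"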

The solution part 2

But if you also want to fix the real underlying issue, you are going to have to do a couple of things. First, make sure you are actually experiencing the same issue as described above. You can do this by following the step-by-step guide below:

  1. Are you receiving the same error messages as the ones I mentioned above when using the PowerCLI Tag cmdlets? Continue to step 2.
  2. Are you still using port TCP/7444 (check with Wireshark)? Continue to step 3.
  3. If you browse to:
    1. https://vc01.fqdn:7444/sts/STSService?wsdl or;
    2. https://vc01.fqdn/sso-adminserver/sdk/vsphere.local (the default SSO admin URL; it can also be found with Get-AdvancedSetting -Entity $global:DefaultVIServer -Name 'config.vpxd.sso.admin.uri' | select -ExpandProperty Value or with /usr/lib/vmware-vmafd/bin/vmafd-cli get-ls-location --server-name localhost) or;
    3. https://vc01.fqdn:7444/lookupservice/sdk

And receive anything other than the currently configured, correct SSL certificate (a correct chain looks like the example below):

OK Certificate chain

You will probably see the “ssoserver” certificate instead. If that is the case, you can safely assume that the SSL Trust Anchors were not migrated properly during the upgrades in the past. The last thing you need to do to confirm this is to execute the following command on your VCSA:

/usr/lib/vmidentity/tools/scripts/lstool.py list --url https://localhost/lookupservice/sdk --no-check-cert > /tmp/ssotroubleshooting.txt

OR

If you want to do this through the GUI, use the following URL and log in with “administrator@vsphere.local”:

https://vc01.fqdn/lookupservice/mob?moid=ServiceRegistration&method=List

In the “filterCriteria” field you need to delete everything so that it displays the following and press “Invoke”:

<filterCriteria>
</filterCriteria>

Both of these methods will list the sslTrust Anchors for all services registered with the Lookup Service on the PSC/vCenter Server. You will have to search through the file, or through the GUI, for the following two endpoint types (a quick search sketch follows after the list):

Endpoint type: com.vmware.cis.cs.identity.sso
anyURI/URL: https://vc01.vcloudvision.lab/sts/STSService/vsphere.local

Endpoint type: com.vmware.cis.cs.identity.admin
anyURI/URL: https://vc01.vcloudvision.lab/sso-adminserver/sdk/vsphere.local <-- like before in the post!
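
If you copy the /tmp/ssotroubleshooting.txt export from the VCSA to your workstation (with scp, for example), a quick sketch like the one below finds the offending registrations. The exact layout of the lstool.py output can differ per version, so treat the patterns as a starting point:

# Show every line that still references the legacy port, with some context around it.
Select-String -Path ".\ssotroubleshooting.txt" -Pattern ":7444" -Context 5,1

# Narrow the search down to the two endpoint types mentioned above.
Select-String -Path ".\ssotroubleshooting.txt" -Pattern "com\.vmware\.cis\.cs\.identity\.(sso|admin)" -Context 1,5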

If either of these two endpoint types contains port TCP/7444 in the URI/URL, and the registration is still valid (as in, it is actually being used; look at the ownerId field), then the legacy SSO endpoints are no longer configured correctly. If they are no longer valid because of outdated or decommissioned external PSC servers, you can clean those entries up as follows:

To check these entries:
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost -u "administrator"

To unregister a PSC/vCenter:
cmsso-util unregister --node-pnid psc01.vcloudvision.lab --username administrator@vsphere.local --passwd password

If this doesn't clean up the entry, use the following to force it:
/usr/lib/vmware-vmdir/bin/vdcleavefed -h psc01.vcloudvision.lab -u administrator

Just remember that you will have some downtime if you clean this up, because services get restarted.

So, if you don’t have these outdated entries, what do we do now? Well, the Lookup Service has to be re-registered. This can be done in a couple of ways, which I will try to describe below:

  1. Open a case with VMware GSS. They have a script called “lsdoctor-master” which corrects this issue. This is also the safest solution. Tampering with the Lookup Service registrations might destroy your vCenter Server, so make sure you have a working backup before you try anything!
  2. While looking for solutions for my use case I also found this post on Reddit. By coincidence I looked into the comments and found that the user theVelement posted a couple of links to a script he wrote that leverages the ls_update_certs.py script on the vCenter Server and updates the endpoints with the correct sslTrust Anchors. As far as I can see he made these scripts himself; they can both be found HERE for bash or HERE for PowerShell. I ran these in my test environment and they seem to work. Remember, this is not official VMware tooling, so use it at your own risk. Using the script is very easy: just run “--help” to find all arguments and run “-cmle” to test the current entries.
Running the script to show current entries

So once you have done this, you can easily run the script to fix the entries. Just run “-fml”, enter the “administrator@vsphere.local” credentials and the SHA thumbprint that needs to be fixed, and simply press Enter. That looks something like the screenshot below:

Running the script to fix an entry
  3. You can also do this by hand, which is a lot of work. All of the steps can be found in the following VMware KB. I actually recommend trying the script from step 2 first, simply because it looks like it works and uses the same process, but gets it done in a couple of minutes. Like I said before though, the script is made by the community and is not supported; the KB is official.

So once you’ve fixed the sslTrust Anchors, you should run through a couple of checks again, like earlier in this blogpost:

  • Can you now use the Tag cmdlets without any issues (while port TCP/7444 is closed)?
  • If you run the following command, are there any entries for “com.vmware.cis.cs.identity.sso” or “com.vmware.cis.cs.identity.admin” with port 7444 in them (there should not be!)?
/usr/lib/vmidentity/tools/scripts/lstool.py list --url https://localhost/lookupservice/sdk --no-check-cert > /tmp/ssotroubleshooting.txt
  • Browse to https://vc01.vcloudvision.lab:7444/lookupservice/mob and check which certificate is being used, or use echo | openssl s_client -connect localhost:7444 to do the same. If the “ssoserver” certificate is still there, that’s not good. It should look something like the certificate below:
Correct certificate

Concluding

So, to conclude this blogpost: we started off with a vCenter Server environment with no real issues, except that the Tag cmdlets failed from scripting servers that did not have an open connection to the vCenter Server on the legacy Lookup Service port 7444. During the long troubleshooting period for this case, it turned out there was an underlying issue: the SSO sslTrust anchors/registrations had been migrated incorrectly during previous vCenter Server upgrades.

The next thing we did was provide a quick and dirty work-around to get you going again, by allowing the legacy port in the firewall and accepting the internal ssoserver certificate in your PowerShell sessions. To really fix the underlying issue with SSO, you can either follow this blogpost and try it out, and/or open a case with VMware. Like I said before, tampering with these registrations might destroy your vCenter Server, so be careful!

Thank you for reading, stay tuned for future blogposts!


Bryan van Eeden

Bryan is an ambitious and seasoned IT professional with almost a decade of experience in designing, building and operating complex (virtual) IT environments. In his current role he works with customers on complex issues and design questions on a daily basis. Bryan holds several certifications such as VCIX-DCV, VCAP-DCA, VCAP-DCD, V(T)SP and vSAN and vCloud Specialist badges.
