Introduction
Earlier this week I had the opportunity to hear about the new enhancements, announced today, for the VMware products VMware Cloud Foundation (VCF), vSphere and NSX! In this blog post I will highlight the parts that are most relevant to my day-to-day work and give my own opinion on these updates. I did something similar a long time ago, with the release of vSphere 7 back in 2020!
Today marks the first time in VMware (now Broadcom) history that all product lines in the VCF Bill of Materials (BOM) are released at the same time. It is an almost company-wide, massive release, with new versions of VCF, vSphere, vSAN, NSX, SDDC Manager and the Aria Suite. This is significant, because in the past not every product in the BOM was well aligned with the others. Now, without further delay, let's start!
VMware Cloud Foundation (VCF) 5.2
The first big item in this release is VCF Import: the option to import existing ‘brownfield’ environments into SDDC Manager, so that SDDC Manager-based automation can be used on those environments.
There are two ways to use this:
- Deploy SDDC Manager from an OVA and convert an existing cluster into a Management Domain.
- Import an existing vSphere infrastructure into an existing (converted) VMware Cloud Foundation instance as a VI workload domain.
In both cases the import is done with a command-line (CLI) tool: a self-contained Python script that holds everything you need.
Take into account that not every topology is supported:
- Supported topologies:
  - vSphere environments with Fibre Channel, NFS or vSAN storage.
- Unsupported topologies:
  - vSphere environments with NSX.
  - vSphere environments with LACP configurations.
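To make the supported and unsupported topology lists above concrete, here is a minimal Python sketch of a pre-check you could run against your own inventory before attempting an import. It only encodes the lists from this post; the `ClusterInventory` class, the storage labels and the `import_blockers` helper are hypothetical, and the sketch does not call the real VCF Import tool, which performs its own validation.

```python
from dataclasses import dataclass, field

# Storage types this post lists as supported for VCF Import.
# Anything else is flagged simply because it is not on that list.
SUPPORTED_STORAGE = {"fc", "nfs", "vsan"}


@dataclass
class ClusterInventory:
    """Hypothetical summary of an existing vSphere cluster to be imported."""
    name: str
    storage_types: set[str] = field(default_factory=set)
    nsx_present: bool = False
    lacp_configured: bool = False


def import_blockers(cluster: ClusterInventory) -> list[str]:
    """Return the reasons why a cluster falls outside the supported topologies."""
    blockers = []
    unsupported = {s.lower() for s in cluster.storage_types} - SUPPORTED_STORAGE
    if unsupported:
        blockers.append(f"unsupported storage: {', '.join(sorted(unsupported))}")
    if cluster.nsx_present:
        blockers.append("NSX is present")
    if cluster.lacp_configured:
        blockers.append("LACP is configured")
    return blockers


ok = ClusterInventory("cl-01", {"vsan", "NFS"})
bad = ClusterInventory("cl-02", {"iscsi", "vsan"}, lacp_configured=True)
print(import_blockers(ok))   # []
print(import_blockers(bad))  # ['unsupported storage: iscsi', 'LACP is configured']
```

Always check VMware's own documentation before relying on a list like this; the sketch is only as complete as the lists in this post.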
The ‘big’ thing I found is that, starting with VCF 5.2, SDDC Manager is no longer tied to the rest of the BOM when it comes to lifecycle management of the VCF environment. This makes everything a lot more flexible and makes it possible to update SDDC Manager independently as new releases become available.
Last but not least, VMware has expanded the Edge capabilities of VCF by introducing a new SKU with smaller increments (from 16 up to 256 cores per Edge), as opposed to the previous, more expensive SKU. With these new licenses you can deploy compute-only nodes or small vSAN clusters at the Edge, so you deploy only what you need in a more scalable way. Several new topologies are now available to cover most of the Edge use cases out there.
To round off the new VCF release, there are a couple of smaller (for me) updates, such as:
- Support for Microsoft Entra ID for SSO authentication alongside Okta.
- Deploying Async Patched Domains (Workload/MGMT)
- vSphere Live Patching compatibility (more on that in the next chapter)
- Offline Depot Local Patch Repository to create a local repository.
vSphere 8.0 U3
vSphere IaaS Control Plane
Let’s start this section off with a product name change: vSphere with Tanzu is now called vSphere IaaS Control Plane! So from now on, anything that mentions vSphere IaaS Control Plane relates to what used to be connected to Tanzu. On the Tanzu side, it seems that VMware is also decoupling the TKG service from vCenter Server and implementing it as a Core Supervisor Service, so that the TKG service can be released asynchronously. Just as with SDDC Manager, this allows you to upgrade it independently of the vCenter Server upgrades in your infrastructure, which gives you a bit more flexibility and lets new Kubernetes versions be delivered more quickly.
Next up is autoscaling for Kubernetes clusters! With this release you can autoscale worker nodes: scale up when more capacity is required and scale down when the cluster is underutilized. This is done with the ‘cluster-autoscaler’ package. Note that the minimum Kubernetes version required is 1.25.
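As a rough illustration of the scale-up and scale-down behaviour, here is a heavily simplified Python model of a single autoscaling decision for one worker pool. It is a sketch under stated assumptions, not the cluster-autoscaler's actual algorithm: the real component adds nodes when pods cannot be scheduled and removes nodes whose requested resources stay below `--scale-down-utilization-threshold` (upstream default 0.5) for a while, after checking that their pods fit elsewhere. The function name and parameters are hypothetical.

```python
def desired_worker_count(
    current: int,
    pending_pods: int,
    node_utilization: list[float],
    min_size: int,
    max_size: int,
    scale_down_threshold: float = 0.5,
) -> int:
    """Return the worker count after one simplified autoscaling decision.

    node_utilization holds, per worker, requested resources / allocatable.
    """
    if pending_pods > 0:
        # Unschedulable pods: add one worker, but never exceed max_size.
        return min(current + 1, max_size)
    if any(u < scale_down_threshold for u in node_utilization):
        # An underutilized worker: remove one, but never go below min_size.
        return max(current - 1, min_size)
    return current


print(desired_worker_count(3, 2, [0.9, 0.8, 0.95], min_size=2, max_size=5))  # 4
print(desired_worker_count(3, 0, [0.9, 0.2, 0.8], min_size=2, max_size=5))   # 2
print(desired_worker_count(2, 0, [0.3, 0.2], min_size=2, max_size=5))        # 2
```

The min/max bounds are what keep an autoscaled cluster predictable: scaling only ever happens inside the range you allow.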
vSAN stretched clusters are now also supported for vSphere IaaS Control Plane (vSphere with Tanzu)! This was a much-requested feature for many customers. With this new option you can use vSAN stretched clusters to host your (Supervisor) control plane VMs.
Other smaller updates to vSphere IaaS Control Plane:
- Automated Supervisor certificate rotation
  - Extremely useful, as you no longer have to do this manually! An alarm is raised when a certificate reaches the end of its validity.
- VM Service – VM Backup and Restore
  - You can now back up and restore the User Namespace (or an entire VM) through VADP integration.
- VM Service – VM Class Expanded Configuration
  - You can now configure a VM Class with all the virtual hardware that is available to regular VMs!
- Local Consumption Interface (LCI)
  - A completely new UI for administering VMs and TKG clusters. It generates the YAML for you automatically, and you can add GPUs.
Lifecycle Management
Lifecycle management is something I am particularly involved with as a daily user of extremely large vSphere environments. If new functionality can reduce the effort needed to patch our environments and make it a lot faster, I am all ears! VMware has been actively increasing its efforts here, releasing a simplified Lifecycle Manager (streamlined patch baselines per cluster), vCenter Reduced Downtime Upgrade (fast vCenter upgrades with almost no downtime) and vSphere Configuration Profiles (to hold configurations). In this release, the following has been added:
- vSphere Live Patch
vSphere Live Patch is released with vSphere ESXi 8.0 U3 today. It gives us a faster and less disruptive way of (security) patching our ESXi hosts by working differently from regular ESXi updates. The ESXi host enters a new maintenance mode called ‘Partial Maintenance Mode’. In this mode, a new ‘Mount Revision’ (an ESXi system image containing the files that need to be updated) is loaded and patched. The VMs then use the Fast-Suspend-Resume (FSR) technique to switch over to the new ‘Mount Revision’, which is completely seamless and non-disruptive. FSR is nothing new: it is also used during VM reconfigurations, such as adding memory or CPUs or changing hardware. Once this is done, the ESXi host exits ‘Partial Maintenance Mode’ and continues its normal operation. This also works for vGPU-enabled VMs!
The following VMs are not compatible with FSR:
- Fault Tolerance VMs.
- VMs using DirectPath I/O devices.
A host in ‘Partial Maintenance Mode’ can still run VMs, but it disallows migrations to and from the host, and no VMs can be created on it. Looking at this, the new state seems perfect and has no real drawback. However, you as a user/administrator cannot put an ESXi host into ‘Partial Maintenance Mode’ yourself; that is done by Lifecycle Manager. You can, however, take the host out of ‘Partial Maintenance Mode’! Sounds good to me! It’s a small start, but it could become quite useful in the future.
At the moment, Live Patch can only patch files in the VM Execution Area (the area in which the VMs run). In the future VMware wants to extend this and cover 90%+ of security-related patches. Lifecycle Manager shows which patches are eligible for vSphere Live Patch and which are not.
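To summarise the Partial Maintenance Mode rules and the FSR limitations described above, here is a small Python toy model. The classes, state names and the `"lifecycle-manager"` caller string are hypothetical stand-ins, not vSphere APIs; the code only encodes what this post describes: who may enter and exit the mode, what is blocked while a host is in it, and which VMs are not FSR-compatible.

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    CONNECTED = "connected"
    PARTIAL_MAINTENANCE = "partial maintenance"


@dataclass
class VM:
    name: str
    fault_tolerance: bool = False
    directpath_io: bool = False

    @property
    def fsr_compatible(self) -> bool:
        # Per the post: FT VMs and DirectPath I/O VMs cannot use FSR.
        return not (self.fault_tolerance or self.directpath_io)


class Host:
    """Toy model of an ESXi host and its Partial Maintenance Mode rules."""

    LCM = "lifecycle-manager"  # hypothetical identifier for Lifecycle Manager

    def __init__(self, vms: list[VM]) -> None:
        self.vms = vms
        self.state = HostState.CONNECTED

    def enter_partial_maintenance(self, caller: str) -> None:
        # Only Lifecycle Manager can put a host into this mode.
        if caller != self.LCM:
            raise PermissionError("only Lifecycle Manager can enter Partial Maintenance Mode")
        self.state = HostState.PARTIAL_MAINTENANCE

    def exit_partial_maintenance(self) -> None:
        # An administrator is allowed to take the host out of the mode.
        self.state = HostState.CONNECTED

    def allows_migrations_and_new_vms(self) -> bool:
        # Running VMs keep running, but vMotion to/from the host and
        # creating VMs on it are blocked in Partial Maintenance Mode.
        return self.state is HostState.CONNECTED

    def fsr_incompatible_vms(self) -> list[str]:
        return [vm.name for vm in self.vms if not vm.fsr_compatible]


host = Host([VM("web01"), VM("ft01", fault_tolerance=True), VM("db01", directpath_io=True)])
print(host.fsr_incompatible_vms())           # ['ft01', 'db01']
host.enter_partial_maintenance(Host.LCM)
print(host.allows_migrations_and_new_vms())  # False
host.exit_partial_maintenance()
print(host.allows_migrations_and_new_vms())  # True
```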
- Enhanced Image Customization
This release adds a bit more flexibility in how you can customize the images that Lifecycle Manager uses. You can now override or remove vendor add-ons or keep existing drivers, and you can also remove the ESXi Host Client and VMware Tools from the images entirely. I’m interested to see where you will then get and configure VMware Tools, because the centralized VMware Tools location that VMware once described is not something that is actively used anymore.
- Complete Topology Support
As discussed before, vCenter Reduced Downtime Upgrade (RDU) was released not that long ago. It works as follows: you mount an ISO to the vCenter Server and then use the workflow within vCenter Server to upgrade it. The workflow automatically creates a new VM, migrates the database and performs a failover to complete the upgrade. This significantly reduces the downtime, to less than 5 minutes. Previously only a few vCenter Server topologies were supported; support has now been extended to the following topologies:
- Self-Managed
  - A vCenter Server that runs on an ESXi host it manages itself.
- Non Self-Managed
  - A vCenter Server that runs on hosts managed by another vCenter Server.
- Enhanced Linked Mode
  - Two or more vCenter Servers in the same SSO domain.
- vCenter High Availability (HA)
  - vCenter Servers using the vCenter HA functionality.
In addition, you can now choose an automatic switchover when updating your vCenter Server with RDU. Previously, only a manual switchover was possible.
vSphere 8 U3 GPU workload enhancements
The GPU workload enhancements introduced in vSphere 8 U2 have been extended in this release to make things even better. ESXi can now truly support multiple different vGPU profiles on a single physical GPU! The GPU does need to support this, however; for NVIDIA GPUs this relies on NVIDIA's own technology. Because each vGPU profile has different configuration parameters, not every GPU supports it. This is something I have wanted for a long time.
Previously, each physical GPU could only be carved up using a single vGPU profile size, such as 1 GB, 2 GB or 4 GB. This left a lot of unused capacity on each physical GPU, where mixing profiles would have used the GPU much more efficiently.
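A small worked example shows why single-profile GPUs waste capacity. The Python sketch below assumes a hypothetical 24 GB GPU and a made-up mix of vGPU requests, and compares the number of GPUs needed when each GPU can only hold one profile size with a simple first-fit-decreasing packing in which sizes can be mixed. It ignores any vendor-specific placement rules for mixed profiles, so treat the numbers as an illustration only.

```python
import math
from collections import Counter


def _validate(requests_gb: list[int], gpu_gb: int) -> None:
    if any(size <= 0 or size > gpu_gb for size in requests_gb):
        raise ValueError("every profile must be between 1 GB and the GPU size")


def gpus_single_profile(requests_gb: list[int], gpu_gb: int) -> int:
    """GPUs needed when each physical GPU may host only one profile size."""
    _validate(requests_gb, gpu_gb)
    total = 0
    for size, count in Counter(requests_gb).items():
        per_gpu = gpu_gb // size              # vGPUs of this size per GPU
        total += math.ceil(count / per_gpu)
    return total


def gpus_mixed_profiles(requests_gb: list[int], gpu_gb: int) -> int:
    """GPUs needed when sizes can share a GPU (first-fit decreasing)."""
    _validate(requests_gb, gpu_gb)
    free: list[int] = []                      # remaining GB on each GPU in use
    for size in sorted(requests_gb, reverse=True):
        for i, remaining in enumerate(free):
            if remaining >= size:
                free[i] -= size
                break
        else:
            free.append(gpu_gb - size)        # no room anywhere: open a new GPU
    return len(free)


requests = [8, 8, 4, 4, 4, 2, 2, 1]           # hypothetical vGPU profile sizes in GB
print(gpus_single_profile(requests, 24))      # 4
print(gpus_mixed_profiles(requests, 24))      # 2
```

With this request mix, the single-profile approach needs one GPU per profile size (4 in total), while mixed placement fits the 33 GB of requests onto 2 GPUs.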
Starting with this release you can also use the per-GPU Media Engine (which assists with video codecs) in vGPU profiles, although it can only be assigned to one vGPU profile. There is also a new cluster-based GPU statistics UI that shows all GPU resources used in a cluster.
Another great addition in this area is that DRS can now move VMs with passthrough devices (currently only vGPU-enabled VMs) to a host that provides a better DRS score.
Storage enhancements
This release introduces initial support for vVols Stretched Storage Clusters (vVols SSC). This new feature is a welcome solution for metro cluster (active/active) deployments that want to use vVols as their storage layer. At the moment it is only available within a single vCenter Server, and it uses VASA 6. Pure Storage helped design this feature, but it should be usable with other vVols-enabled storage arrays.
There is now also UNMAP support for vVols, even on NVMe volumes, so you no longer have to reclaim space manually. In addition, VMware has extended its support for Microsoft WSFC on vVols backed by NVMe-oF (TCP/FC), including vNVMe storage controller support. This means that, once again, RDMs are no longer needed when running WSFC on vVols.
The last nice storage enhancement is that VMware is adding support for Fabric Performance Impact Notifications (FPIN). This feature notifies vSphere of impact or performance degradation within the storage network, so that vSphere can use the healthy paths that are not affected by link issues.
vSAN enhancements
vSAN Stretched Clusters with the Enhanced Storage Architecture (ESA) is now fully supported while using VCF 5.2. In VCF 5.1 this was not yet supported.
Another big enhancement is that vSAN MAX (disaggregated vSAN) is available in VCF 5.2. You can now add storage-only or compute nodes to VCF vSAN workload domains to expand the capabilities of your entire infrastructure. This is particularly useful when a cluster has run out of one of the two resources and you only want to add, for example, storage. Expanding your infrastructure like this can be financially beneficial.
As many of you might know, vSAN ESA uses a new snapshot architecture with a highly efficient B-tree lookup table instead of the traditional chained architecture. This release brings additional enhancements to it. Because these snapshots perform so well, several data protection use cases are now possible. A brand-new UI helps admins use Local Protection or Cloud Protection for VMs, all from within vSAN. Pretty awesome, right? You can even make the snapshots immutable, to protect yourself against ransomware attacks. A neat little extra is that you can also clone VMs from these high-performing snapshots.
The Data Protection functionality is fully embedded in the vCenter Server UI and provides policy-based protection groups, in which you can use naming conventions to automatically include VMs in certain groups, and therefore apply those groups' settings.
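Pattern-based membership is easy to picture in code. The Python sketch below assumes shell-style wildcards in VM name patterns and uses hypothetical group names, patterns and settings; it is not the vSAN data protection API, only an illustration of how a naming convention can place a VM into a group and thereby give it that group's settings.

```python
from fnmatch import fnmatchcase

# Hypothetical protection groups: VM name patterns plus the settings they apply.
PROTECTION_GROUPS = {
    "prod-sql": {"patterns": ["prd-sql-*"], "schedule": "hourly", "immutable": True},
    "dev": {"patterns": ["dev-*", "test-*"], "schedule": "daily", "immutable": False},
}


def groups_for_vm(vm_name: str) -> list[str]:
    """Return every protection group whose name pattern matches the VM (case-sensitive)."""
    return [
        group
        for group, policy in PROTECTION_GROUPS.items()
        if any(fnmatchcase(vm_name, pattern) for pattern in policy["patterns"])
    ]


print(groups_for_vm("prd-sql-01"))  # ['prod-sql']
print(groups_for_vm("test-web02"))  # ['dev']
print(groups_for_vm("web01"))       # []
```

The nice property of this approach is that a newly created VM following the naming convention is protected automatically, without anyone editing the group.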
I think one of the best parts of this new vSAN ESA feature is that snapshots are no longer tightly coupled to VM objects. Once you delete a VM, you can still recover it from its snapshot, even though it no longer exists in vCenter Server. To be fair, I would have found it quite strange if this was not the case, since this feature plays in the same space as the backup vendors out there, which obviously have ways to restore a VM once it has been removed. Currently you can have 200 snapshots per VM, and note that snapshotting automatically stops once 70% of the vSAN storage capacity is in use. Many more use cases are possible, for example a development environment in which you can easily test and clone VMs from snapshots.
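The two limits mentioned above are easy to turn into a quick capacity check. This Python sketch encodes the 200-snapshots-per-VM limit and the 70% capacity cut-off exactly as quoted in this post, treating the latter as a hard stop; the function names are hypothetical, and the real vSAN behaviour may have nuances the post does not describe.

```python
MAX_SNAPSHOTS_PER_VM = 200       # per-VM limit quoted in this post
CAPACITY_STOP_THRESHOLD = 0.70   # snapshotting stops at 70% used vSAN capacity


def snapshot_allowed(existing_snapshots: int, used_tib: float, capacity_tib: float) -> bool:
    """Return True if another snapshot of this VM would still be taken."""
    if existing_snapshots >= MAX_SNAPSHOTS_PER_VM:
        return False
    return used_tib / capacity_tib < CAPACITY_STOP_THRESHOLD


def headroom_tib(used_tib: float, capacity_tib: float) -> float:
    """Capacity that can still be consumed before snapshotting stops."""
    return max(capacity_tib * CAPACITY_STOP_THRESHOLD - used_tib, 0.0)


print(snapshot_allowed(12, used_tib=60, capacity_tib=100))   # True
print(snapshot_allowed(12, used_tib=70, capacity_tib=100))   # False
print(snapshot_allowed(200, used_tib=10, capacity_tib=100))  # False
```

For example, a 100 TiB cluster with 55 TiB in use has about 15 TiB of headroom before snapshots stop, which is worth monitoring if you rely on these snapshots for protection.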
Small other notable enhancements
And last but not least some smaller items from this release that might be significant for you:
- Dual DPU support within Lifecycle Manager: you can remediate multiple DPUs at the same time. Do take note that this is not compatible with vSphere Live Patch.
- vSphere Cluster Services enhancements:
  - The vCLS VMs are instantiated a lot faster.
  - There will only be two vCLS VMs per cluster.
  - The deployment is no longer an OVA!
  - They will not really be VMs anymore.
  - They run directly in ESXi, in memory, so no storage is required anymore.
  - The new version is called ‘Embedded vCLS’; the old version is just ‘vCLS’.
- Intel Xeon Max Series CPUs are now supported.
- PingFederate is a newly supported provider in the vSphere Identity Federation feature.
- Quickly manage TLS Ciphers through PowerCLI or API on ESXi hosts.
- Limit number of ESXi hosts sending UNMAP commands to the Storage Array with an advanced parameter.
- Reduce the time needed to inflate thin disks to eager-zeroed disks with a new API that is 10 times faster than before!
- Support for File Volumes in HCI Mesh in a single vCenter Server topology.
- vSAN File Services in an ESA cluster can now support up to 250 file shares. This is useful for customers who use vSAN ESA with cloud native applications.
- Improved intelligence to augment device health information in vSAN ESA environments. Vendor plugins and telemetry data are integrated into vSAN Health, making it more obvious what has happened at the hardware layer of a vSAN cluster.
- The cluster-level storage performance troubleshooting tooling has been enhanced to analyse performance bottlenecks on multiple VMs at the same time, so you can narrow down in which specific part of the infrastructure a bottleneck is present. This is also backwards compatible with vSAN OSA!
NSX 4.2 enhancements
Next up, NSX! NSX has been updated for VCF 5.2 so that you can use a wizard-like workflow to onboard VLAN-backed workloads to NSX virtual networks. This will significantly simplify onboarding for VCF environments.
In addition, you can now scale the NSX Local and Global Managers to a new X-Large form factor, with 24 vCPUs and 96 GB RAM per NSX Manager. This is specifically targeted at large cloud and/or enterprise environments.
Then there is something I found useful in the new NSX update: Tunnel Endpoint (TEP) performance enhancements. In NSX 4.1 and earlier, an Edge TEP was connected to a host TEP, which meant that the traffic of a segment was tied to one specific Edge TEP. In NSX 4.2 you can create a so-called TEP group, in which multiple Edge TEPs are grouped together. This means that one segment can now connect to multiple TEPs within the TEP group, which gives a lot more performance across the TEPs!
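The post does not describe how NSX spreads a segment's traffic over the TEPs in a group, so the Python sketch below assumes flow-based hashing, a common technique for this kind of load sharing. Every packet of a flow (identified by its 5-tuple) maps to the same Edge TEP, which avoids reordering, while different flows on the same segment spread across all TEPs in the group. The function name, TEP names and addresses are hypothetical.

```python
import hashlib


def pick_edge_tep(flow: tuple[str, str, int, int, str], tep_group: list[str]) -> str:
    """Map a flow (src IP, dst IP, src port, dst port, protocol) to one Edge TEP.

    The same flow always lands on the same TEP, so its packets are not
    reordered; different flows spread across every TEP in the group.
    """
    if not tep_group:
        raise ValueError("TEP group is empty")
    key = "|".join(str(part) for part in flow).encode()
    # Use a stable hash (Python's built-in hash() is randomized per process).
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return tep_group[digest % len(tep_group)]


teps = ["edge-tep-1", "edge-tep-2", "edge-tep-3", "edge-tep-4"]
flows = [(f"10.0.0.{i % 250}", "10.1.0.10", 40000 + i, 443, "tcp") for i in range(1000)]
per_tep = {tep: 0 for tep in teps}
for f in flows:
    per_tep[pick_edge_tep(f, teps)] += 1
print(per_tep)  # flow count per TEP; every TEP in the group receives a share
```

Hashing per flow rather than per packet is the usual trade-off: the balance is statistical, but no single flow is split across tunnels.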
HCX Enhancements
There have also been some enhancements to the HCX software package. I’d like to point out two that are important to me:
There is now something called HCX Assisted vMotion (HAV). Essentially this is a vMotion of VMs from a source to a destination environment, orchestrated by HCX. The benefits are that it is really fast, like a regular vMotion, and that vGPU VMs are supported! vSphere 6.0 and later, up to the current version, are supported for this!
The second large improvement is that the architecture for onboarding and migrating non-vSphere workloads has been greatly simplified. Previously you needed 4 appliances AND an on-premises vSphere environment just to be able to migrate workloads. With the new release everything is consolidated into one single appliance, and the need for on-premises vSphere is removed entirely. I can see this being a major thing for us, since we onboard customers a lot, and a simplified setup on the customer side can increase the rate at which we onboard them and generate revenue.
Aria Suite enhancements
All Aria Suite products, such as Aria Operations, Aria Operations for Logs and VCF Automation, have received an update with a number of enhancements. This release seems focused on simplifying the environments, with enhancements such as new console experiences, a unified sign-on experience and an improved UI across the board. In addition, vSAN MAX is now supported, and management packs have been redefined and optimized.
It seems that the Skyline Health diagnostics are now also integrated into the Aria Operations Diagnostics dashboard. This is a neat change: it reduces the number of portals and merges data into a central place. The dashboard also seems to include your infrastructure certificates: the certificates of all connected components (such as ESXi hosts and vCenter Servers) are pulled in, and you can view them centrally from the Aria Operations dashboard.
The last significant part on this for me is that you can now manage your VMware environment license keys from the Aria Operations software. You can use this as a central dashboard in which you enter, edit and assign license keys to ESXi hosts, vCenter Servers, Tanzu, NSX and much more. Since the move to Broadcom you are required to re-license your entire environment with new VMware by Broadcom licenses.
Not only can you manage license keys, their usage is also reported back into the tooling, so you can easily track license usage for your entire infrastructure from this portal. This has really been a wish of mine for a long time, since Usage Meter 4.x is no longer able to provide usage tracking information: since Usage Meter 4.x, all data must go through the VMware cloud systems, and you only receive the number of used licenses at the end of the month. With this new feature I can track license usage across my environment and advise customers on where they can reduce their licenses.
To be honest, I am not quite sure where Usage Meter is heading now that this is integrated into Aria Operations. We’ll see where it goes from here!
Closing notes
To close this blog post: today marks the release of a massive update across VMware's product lines. When I started writing this post, I honestly did not think it would be this big. It has been a while since we had an update this large for this many products!
My personal takeaway from this release is that I really like how VMware is simplifying its product portfolio. Many applications, such as SDDC Manager and Tanzu, are now decoupled from the applications they were previously tied to, which brings a lot of flexibility. Who wants to update their entire environment, only to find out that a new patch is already available? Now you can plan a bit further ahead and be flexible in which updates you apply and when.
I think vSphere Live Patch can become a very big thing for me. I manage thousands of ESXi hosts on a daily basis, and we are patching all the time because of (security) updates. You can automate as much as you want, but in the end evacuating an ESXi host, rebooting it and repopulating it takes time. With ‘Partial Maintenance Mode’ this takes significantly less time, which in turn reduces the time you need to patch a host, let alone 1,000 hosts.
The last vSphere enhancement we have been waiting on for a long time is support for multiple vGPU profiles on a single physical GPU. I think most people who use vGPU profiles in their environment can relate to how painful this is to manage from a capacity perspective. I also like that DRS is now able to move these VMs.
Last but not least, license management moving to VMware Aria Operations is something I did not really expect, but I find it a welcome change. Usage Meter is not really transparent anymore and does not give you insight into your license usage. With the new VMware Aria Operations release I can once more gain insight into, and track, my usage.
I hope you stayed with this blog post up until this point and enjoyed my view on the updates that were brought to us today!