Introduction

Recently I have been upgrading one of my test NSX-T labs. In this lab the scenario was an upgrade from NSX-T 3.1.2.1.0 to 3.2.3.1.0. NSX-T environments are upgraded in phases, and the first (actually second) phase is upgrading the NSX Edge Clusters and the NSX Edges inside these clusters. The upgrade of the first Edge failed after a while. This blog post describes the fix.

Troubleshooting

The error occurred when the NSX-T Upgrade Coordinator tried to do the OS switchover (step 5 in the automatic upgrade: see upgrade article). Depending on your setup, at this point the Edge will be non-functional (it is also in maintenance mode), and in our specific setup this meant that all North-South (and NAT) functionality was impaired. In a production environment this would have serious impact, so it’s good to know how to fix this quickly.

The NSX-T Upgrade Coordinator provided me with the following error message:

Edge 3.2.3.1.0.22104592/Edge/nub/VMware-NSX-edge-3.2.3.1.0.22104642.nub switch OS task failed on edge TransportNode c65ecb65-a028-4378-b1f4-80040cd9d175: clientType EDGE , target edge fabric node id c65ecb65-a028-4378-b1f4-80040cd9d175, return status switch_os execution failed with msg: An unexpected exception occurred: CommandFailedError: Command ['chroot', '/os_bak', '/opt/vmware/nsx-edge/bin/config.py', '--update-only'] returned non-zero code 1: b'lspci: Unable to load libkmod resources: error -12\nlspci: Unable to load libkmod resources: error -12\nlspci: Unable to load libkmod resources: error -12\nlspci: Unable to load libkmod resources: error -12\nlspci: Unable to load libkmod resources: error -12\nSystem has not been booted with systemd as init system (PID 1). Can\'t operate.\nFailed to connect to bus: Host is down\nERROR: Unable to get maintenance mode information\nNsxRpcClient encountered an error: [Errno 2] No such file or directory\nWARNING: Exception reading InbandMgmtInterfaceMsg from nestdb, Command \'[\'/opt/vmware/nsx-nestdb/bin/nestdb-cli\', \'--json\', \'--cmd\', \'get\', \'InbandMgmtInterfaceMsg\']\' returned non-zero exit status 1.\nERROR: NSX Edge configuration has failed. 1G hugepage support required\n/opt/vmware/nsx-edge/bin/config.py:1688: DeprecationWarning: The \'warn\' method is deprecated, use \'warning\' instead\n cfg_logger.warn("Exception reading InbandMgmtInterfaceMsg from nestdb, %s", e)\n' .

It’s not formatted nicely, but you can make out one specific message stating that “NSX Edge configuration has failed. 1G hugepage support required”. Looking this up led me to KB87244. The reason this happens seems to be that we are running this environment on Intel Xeon E5-2640 Sandy Bridge CPUs with EVC enabled in the cluster. These CPUs do support the 1GB Large Page functionality, which is required as stated in the NSX Edge VM CPU Requirements document here, but with EVC in place the feature is apparently not exposed to the VM. 1GB Large Page support is a useful feature: each TLB entry covers far more memory, which reduces TLB misses. VNF workloads in particular can benefit from this performance improvement.
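Picking the relevant line out of that wall of text by eye is tedious. As a minimal sketch (the helper name `extract_errors` is my own, not part of NSX-T), the following Python pulls every `ERROR:` line out of a pasted coordinator message. It assumes the command output is embedded as a bytes literal, as in the message above, so line breaks show up as a literal backslash followed by `n`.

```python
def extract_errors(message: str) -> list[str]:
    """Return the text of every 'ERROR: ...' line found in the message."""
    # The coordinator embeds the command output as a bytes literal, so
    # line breaks appear as the two characters backslash + n.
    lines = message.replace("\\n", "\n").splitlines()
    errors = []
    for line in lines:
        line = line.strip()
        if line.startswith("ERROR:"):
            errors.append(line[len("ERROR:"):].strip())
    return errors


# Shortened excerpt of the message above, used as sample input.
sample = (
    "b'lspci: Unable to load libkmod resources: error -12\\n"
    "Failed to connect to bus: Host is down\\n"
    "ERROR: Unable to get maintenance mode information\\n"
    "ERROR: NSX Edge configuration has failed. 1G hugepage support required\\n'"
)

for err in extract_errors(sample):
    print(err)
```

On the shortened sample this prints the maintenance mode error and the 1G hugepage error, the latter being the one that matters here.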

NSX-T Edge Upgrade failure

Solution

Luckily there is an easy fix:

  1. Shut down the NSX Edge VM.
  2. Edit Settings -> VM Options -> Advanced -> Configuration Parameters -> Edit Configuration.
  3. Add the following values with the Add Configuration Params button:
    • Name: featMask.vm.cpuid.PDPE1GB
    • Value: Val:1
  4. Press Save and power on the NSX-T Edge VM again.
NSX-T Edge VM advanced parameter for 1G Huge pages.
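For reference, an advanced configuration parameter like this ends up as a `key = "value"` line in the VM’s .vmx file. The sketch below shows what the resulting entry looks like and how it could be added to a copy of the .vmx text. The helper `set_vmx_option` and the sample VM name `edge01` are hypothetical, and treating keys as case-insensitive is an assumption on my part. The vSphere Client steps above remain the method I actually used.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set, replacing an existing entry."""
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        # Assumption: .vmx keys are compared case-insensitively here.
        if name.lower() == key.lower():
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Hypothetical, heavily shortened .vmx content.
vmx = 'displayName = "edge01"\nnumvcpus = "4"\n'
print(set_vmx_option(vmx, "featMask.vm.cpuid.PDPE1GB", "Val:1"))
```

The function is idempotent, so running it twice leaves a single `featMask.vm.cpuid.PDPE1GB = "Val:1"` line. Only touch a .vmx file while the VM is powered off.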

If you have issues starting your VM and vCenter Server gives you the following error message:

Feature '1 GB pages (PDPE1GB)' was absent, but must be present. Failed to start the virtual machine. Module FeatureCompatLate power on failed.

You can fix this by disabling EVC or raising the EVC level on the cluster. This was the missing piece for me in this environment, and it is not mentioned in the KB. If you still cannot start the VM or complete the upgrade after these two changes, I’ve seen that enabling the following setting on the NSX Edge VM also helps:

  1. Go to the NSX Edge.
  2. Edit Settings -> CPU -> Hardware virtualization -> and enable ‘Expose hardware assisted virtualization to the guest OS’.
  3. Press Save and try to start your NSX Edge VM again.

After all of this you should be able to start your NSX-T Edge VMs and retry the NSX-T Edge upgrade procedure.
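Before retrying the upgrade, you may want to confirm that the guest now sees the 1GB page feature. On Linux, the CPU flag for this is `pdpe1gb` in `/proc/cpuinfo`. A minimal sketch, assuming you have a root shell with Python on the Edge (the function name `has_1g_hugepages` is my own):

```python
def has_1g_hugepages(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line in cpuinfo_text lists pdpe1gb."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line format: "flags<tabs>: flag1 flag2 ..."
            _, _, flag_list = line.partition(":")
            return "pdpe1gb" in flag_list.split()
    return False


# Shortened sample lines; on the Edge itself you would pass in
# open("/proc/cpuinfo").read() instead.
with_flag = "processor\t: 0\nflags\t\t: fpu pse pdpe1gb rdtscp lm\n"
without_flag = "processor\t: 0\nflags\t\t: fpu pse rdtscp lm\n"

print(has_1g_hugepages(with_flag))
print(has_1g_hugepages(without_flag))
```

If the flag is still missing after the changes above, the EVC level on the cluster is the first thing I would check again.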


Bryan van Eeden

Bryan is an ambitious and seasoned IT professional with almost a decade of experience in designing, building and operating complex (virtual) IT environments. In his current role he tackles customers, complex issues and design questions on a daily basis. Bryan holds several certifications such as VCIX-DCV, VCAP-DCA, VCAP-DCD, V(T)SP and vSAN and vCloud Specialist badges.
