Introduction
Recently I have been upgrading one of my test environments from NSX 3.1.2 to 3.2.1 and have come across a couple of issues. This time, one Tier 1 Gateway in my environment is not connected to both Edge Nodes in an Edge Cluster. In this blog post, we will look at how to fix that.
Troubleshooting
First, let's have a look at the issue below. As mentioned, one Tier 1 Gateway is connected to only one of the two Edge Nodes in the Edge Cluster. This Tier 1 Gateway was created by VMware Cloud Director (VCD), not directly in the NSX back end. The other two Tier 1 Gateways were also created by VCD and connected to an Edge Cluster by VCD.

As we can see below, both Tier 1 Gateways are connected to the same Edge Cluster called “Edge-Cluster-T1”. Going to System -> Fabric -> Nodes -> Edge Clusters, we can see that two Edge Transport Nodes are members of this Edge Cluster.

For completeness: “Edge-Cluster-T1” had only one Edge Node when both of these Tier 1 Gateways were created. I added the 4th Edge Node (the 2nd in this Edge Cluster) later on. If we go back to the Edge Transport Nodes tab, we can click on each of the aforementioned Edge Nodes and see which Logical Routers (Tier 1 Gateways) are connected under the Related tab. Below you can see the contents for Edge Node 3:

And for Edge Node 4:

As you can see, the Logical Router called “BvE-VCD-NSX-T-Tier-1Gateway” is missing from the 4th Edge Node. Going back to the Network tab, selecting the Tier 1 Gateway and clicking on Edges, we can see the difference between the two:
If we log in to the CLI of Edge Node 4, we can also see that it is missing:
edge-tn-04> get logical-routers
Fri Sep 08 2023 UTC 13:11:10.536
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       18/5000
3633938f-bec3-424a-8150-40a9e35d7555   3      3073   DR-BvE-VCD-NSX-T-Tier-1Gateway2   DISTRIBUTED_ROUTER_TIER1    4       0/50000
57278857-33b6-48f5-ae6d-2a819bb12886   4      1028   SR-BvE-VCD-NSX-T-Tier-1Gateway2   SERVICE_ROUTER_TIER1        5       2/50000
ab03f791-5754-4e9e-8bfc-19afaf2119f6   6      5121   SR-BvE-VCD-NSX-T-Tier1Gateway3-   SERVICE_ROUTER_TIER1        5       2/50000
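This check can also be scripted. Below is a minimal Python sketch, assuming you have copied the text of `get logical-routers` from an Edge Node (for example from an SSH session); it is not an NSX tool. It ignores everything except rows that start with a UUID, treats a row with one token fewer as having an empty Name column (the TUNNEL row), and reports whether the `DR-` and `SR-` entries for a given gateway are present. The helper names `parse_logical_routers` and `missing_components` are hypothetical, and the 31-character name limit is only inferred from the truncated `SR-BvE-VCD-NSX-T-Tier1Gateway3-` entry above.

```python
import re

# Data rows of "get logical-routers" start with a UUID.
UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")
# Assumption: judging by the output above, the CLI cuts names at 31 characters.
NAME_WIDTH = 31


def parse_logical_routers(output: str) -> list[dict]:
    """Turn the text of `get logical-routers` into a list of row dicts (hypothetical helper)."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or not UUID_RE.match(tokens[0]):
            continue  # skip the prompt, timestamp and header lines
        # The TUNNEL row has an empty Name column, so it has one token fewer.
        name = tokens[3] if len(tokens) == 7 else ""
        rows.append({
            "uuid": tokens[0],
            "vrf": int(tokens[1]),
            "lr_id": int(tokens[2]),
            "name": name,
            "type": tokens[-3],  # Type is always third from the end
        })
    return rows


def missing_components(output: str, gateway: str) -> list[str]:
    """Return the gateway's DR/SR entries that are absent on this Edge Node (hypothetical helper)."""
    present = {row["name"] for row in parse_logical_routers(output)}
    missing = []
    for prefix in ("DR-", "SR-"):
        expected = (prefix + gateway)[:NAME_WIDTH]
        if expected not in present:
            missing.append(prefix + gateway)
    return missing
```

Fed the edge-tn-04 output above with the gateway name `BvE-VCD-NSX-T-Tier-1Gateway`, this reports both the DR and the SR as missing, while `BvE-VCD-NSX-T-Tier-1Gateway2` comes back clean. Matching on the full name rather than a prefix avoids mistaking `...Tier-1Gateway2` for `...Tier-1Gateway`.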
Now the question is: how do we get the Standby node created, and not only the Active node?
Solution
It seems that something is stuck in the NSX environment and prevents the creation of the Standby node. Thankfully, the fix is very easy: move the Tier 1 Gateway to another Edge Cluster, after which it should have both an Active and a Standby node. Once this is done, you can move it back to the original cluster. Take note that this can have production impact depending on your configuration and environment; in my environment, one ping was lost while switching Edge Clusters. In the end, this looked like the following in my environment:
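The same move can in principle be done through the NSX Policy API instead of the UI. The sketch below only builds a `PATCH` request that sets `edge_cluster_path` on the Tier 1 Gateway's locale services; it does not send anything. The Manager hostname, gateway ID, locale services ID, Edge Cluster ID, credentials and the helper name `build_edge_cluster_move` are all hypothetical placeholders. I am also assuming the default site and enforcement point in the Edge Cluster path, and that the locale services ID can be looked up with a `GET` on `/policy/api/v1/infra/tier-1s/<tier-1-id>/locale-services` (it is often `default`, but a VCD-created gateway may use a different ID).

```python
import base64
import json
import urllib.request


def build_edge_cluster_move(manager: str, tier1_id: str, locale_services_id: str,
                            edge_cluster_id: str, user: str,
                            password: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that points a Tier 1 Gateway at an Edge Cluster.

    Hypothetical helper; all IDs are placeholders you would look up first.
    """
    url = (f"https://{manager}/policy/api/v1/infra/tier-1s/{tier1_id}"
           f"/locale-services/{locale_services_id}")
    body = {
        # Edge Clusters are referenced by their Policy path, not their display name.
        # Assumption: default site and default enforcement point.
        "edge_cluster_path": ("/infra/sites/default/enforcement-points/default"
                              f"/edge-clusters/{edge_cluster_id}"),
    }
    # HTTP basic authentication header built from the supplied credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
```

Building the request once with a temporary Edge Cluster's ID and, after the Standby node appears, again with the original cluster's ID mirrors the move-and-move-back described above. Each request could then be sent with `urllib.request.urlopen(req, context=ssl_context)`, where `ssl_context` trusts the NSX Manager certificate.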
edge-tn-04> get logical-routers
Fri Sep 08 2023 UTC 13:16:44.091
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       20/5000
3633938f-bec3-424a-8150-40a9e35d7555   3      3073   DR-BvE-VCD-NSX-T-Tier-1Gateway2   DISTRIBUTED_ROUTER_TIER1    4       0/50000
57278857-33b6-48f5-ae6d-2a819bb12886   4      1028   SR-BvE-VCD-NSX-T-Tier-1Gateway2   SERVICE_ROUTER_TIER1        5       2/50000
ab03f791-5754-4e9e-8bfc-19afaf2119f6   6      5121   SR-BvE-VCD-NSX-T-Tier1Gateway3-   SERVICE_ROUTER_TIER1        5       2/50000
322136ec-fcaf-4737-8c0c-ce2b949fa0c7   7      2050   SR-BvE-VCD-NSX-T-Tier-1Gateway    SERVICE_ROUTER_TIER1        5       2/50000
e60d5ce5-378e-4dfe-9b5b-8b35ab1d81c7   8      2049   DR-BvE-VCD-NSX-T-Tier-1Gateway    DISTRIBUTED_ROUTER_TIER1    5       0/50000
As you can see, the DR and SR for “BvE-VCD-NSX-T-Tier-1Gateway” have now been created on this Edge Node. If we go back to the Tier 1 Gateway, we can see that both the Active and Standby nodes have been created successfully.

I hope this helped.