Introduction

Recently I have been upgrading one of my test environments from NSX 3.1.2 to 3.2.1 and have come across a couple of issues. This time, one Tier 1 Gateway in my environment is not connected to both Edge Nodes in its Edge Cluster. In this blog post we will look at how to fix that.

Troubleshooting

First, let's look at the issue. As mentioned, one Tier 1 Gateway is connected to only one of the two Edge Nodes in the Edge Cluster. This Tier 1 Gateway was created by VMware Cloud Director (VCD) and not directly in the NSX back end. The other two Tier 1 Gateways were also created by VCD and connected to an Edge Cluster by VCD.

NSX-T Tier 1 Gateway Edge Cluster configuration.

As we can see, both Tier 1 Gateways are connected to the same Edge Cluster called “Edge-Cluster-T1”. Going to System -> Fabric -> Nodes -> Edge Clusters, we can see that there are two Edge Transport Nodes in this Edge Cluster.

NSX Edge Cluster Edge Nodes

For completeness: “Edge-Cluster-T1” had one Edge Node when both of these Tier 1 Gateways were created. I added the fourth Edge Node (the second in this Edge Cluster) later on. Back on the Edge Transport Nodes tab, we can click on each of the aforementioned Edge Nodes and see which Logical Routers (Tier 1 Gateways) are connected under the Related tab. Below you can see the contents for Edge Node 3:

Edge Node 3 related Logical Routers.

And for Edge Node 4:

Edge Node 4 related Logical Routers.

As you can see, the Logical Router called “BvE-VCD-NSX-T-Tier-1Gateway” is missing from the 4th Edge Node. Going back to the Network tab, selecting the Tier 1 Gateway and clicking on the Edges we can see the difference between the two:

If we log in to the CLI of Edge Node 4, we can also see that it is missing:

edge-tn-04> get logical-routers
Fri Sep 08 2023 UTC 13:11:10.536
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors      
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       18/5000        
3633938f-bec3-424a-8150-40a9e35d7555   3      3073   DR-BvE-VCD-NSX-T-Tier-1Gateway2   DISTRIBUTED_ROUTER_TIER1    4       0/50000        
57278857-33b6-48f5-ae6d-2a819bb12886   4      1028   SR-BvE-VCD-NSX-T-Tier-1Gateway2   SERVICE_ROUTER_TIER1        5       2/50000              
ab03f791-5754-4e9e-8bfc-19afaf2119f6   6      5121   SR-BvE-VCD-NSX-T-Tier1Gateway3-   SERVICE_ROUTER_TIER1        5       2/50000
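
With more than a handful of gateways, checking this output by eye gets tedious. Below is a minimal Python sketch that parses saved "get logical-routers" output and reports whether the SR and DR of a given Tier 1 Gateway are present. The function names (parse_logical_routers, missing_components) are my own and not part of any NSX tooling, and the "SR-"/"DR-" prefix matching is an assumption based purely on the naming seen in the output above.

```python
import re
from typing import List, NamedTuple

# A data row starts with a UUID; timestamp and header lines do not.
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


class LogicalRouter(NamedTuple):
    uuid: str
    vrf: int
    lr_id: int
    name: str
    type: str


def parse_logical_routers(output: str) -> List[LogicalRouter]:
    """Parse the table printed by 'get logical-routers' on an Edge Node."""
    routers = []
    for line in output.splitlines():
        fields = line.split()
        if not fields or not UUID_RE.match(fields[0]):
            continue  # skip blank lines, the timestamp and the headers
        if len(fields) == 6:
            fields.insert(3, "")  # the TUNNEL row has an empty Name column
        if len(fields) != 7:
            continue  # unexpected layout, ignore rather than guess
        uuid, vrf, lr_id, name, lr_type = fields[:5]
        routers.append(LogicalRouter(uuid, int(vrf), int(lr_id), name, lr_type))
    return routers


def missing_components(routers: List[LogicalRouter], gateway_name: str) -> List[str]:
    """Return the expected SR-/DR- names that are absent (assumed naming)."""
    present = {r.name for r in routers}
    expected = ["SR-" + gateway_name, "DR-" + gateway_name]
    return [name for name in expected if name not in present]


# The output of Edge Node 4 shown above, before the fix.
SAMPLE_BEFORE = """\
edge-tn-04> get logical-routers
Fri Sep 08 2023 UTC 13:11:10.536
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       18/5000
3633938f-bec3-424a-8150-40a9e35d7555   3      3073   DR-BvE-VCD-NSX-T-Tier-1Gateway2   DISTRIBUTED_ROUTER_TIER1    4       0/50000
57278857-33b6-48f5-ae6d-2a819bb12886   4      1028   SR-BvE-VCD-NSX-T-Tier-1Gateway2   SERVICE_ROUTER_TIER1        5       2/50000
ab03f791-5754-4e9e-8bfc-19afaf2119f6   6      5121   SR-BvE-VCD-NSX-T-Tier1Gateway3-   SERVICE_ROUTER_TIER1        5       2/50000
"""

print(missing_components(parse_logical_routers(SAMPLE_BEFORE), "BvE-VCD-NSX-T-Tier-1Gateway"))
```

For the output above, this reports both the SR and the DR of “BvE-VCD-NSX-T-Tier-1Gateway” as missing. Keep in mind that the CLI appears to cut off long names in the Name column, so exact matching is a simplification and may need to be loosened for gateways with longer names.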

Now the question is: how do we get the Standby node working, and not only the Active node?

Solution

Something appears to be stuck in the NSX environment that blocks the creation of the Standby node. Thankfully, the fix is easy: move the Tier 1 Gateway to another Edge Cluster, after which it should have both an Active and a Standby node. Once this is done, you can move it back to the original cluster. Note that this can have production impact depending on your configuration and environment. In my environment I lost a single ping while switching Edge Clusters. In the end this looked like the following in my environment:

edge-tn-04> get logical-routers
Fri Sep 08 2023 UTC 13:16:44.091
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors      
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       20/5000               
3633938f-bec3-424a-8150-40a9e35d7555   3      3073   DR-BvE-VCD-NSX-T-Tier-1Gateway2   DISTRIBUTED_ROUTER_TIER1    4       0/50000        
57278857-33b6-48f5-ae6d-2a819bb12886   4      1028   SR-BvE-VCD-NSX-T-Tier-1Gateway2   SERVICE_ROUTER_TIER1        5       2/50000               
ab03f791-5754-4e9e-8bfc-19afaf2119f6   6      5121   SR-BvE-VCD-NSX-T-Tier1Gateway3-   SERVICE_ROUTER_TIER1        5       2/50000        
322136ec-fcaf-4737-8c0c-ce2b949fa0c7   7      2050   SR-BvE-VCD-NSX-T-Tier-1Gateway    SERVICE_ROUTER_TIER1        5       2/50000        
e60d5ce5-378e-4dfe-9b5b-8b35ab1d81c7   8      2049   DR-BvE-VCD-NSX-T-Tier-1Gateway    DISTRIBUTED_ROUTER_TIER1    5       0/50000

As you can see, the DR and SR have been created on this specific Edge Node. If we go back to the Tier 1 Gateway, we can see that both the Active and Standby nodes have been created successfully.
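
If you would rather script the move to another Edge Cluster and back than click through the UI, the two steps could be sketched roughly as below. This is not what I did in this post; it is a hedged sketch that assumes the NSX Policy API is used and that the Tier 1 Gateway's Edge Cluster is set through the edge_cluster_path field of its locale services. The manager hostname, the IDs and the function names (build_move_request, plan_bounce) are placeholders I made up. The code only builds the two requests and does not send anything. For a VCD-managed gateway it may be more appropriate to make the change from VCD, so treat this as an illustration only.

```python
import json
import urllib.request
from typing import List


def build_move_request(manager: str, tier1_id: str, locale_service_id: str,
                       edge_cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that points a Tier 1 at an Edge Cluster.

    Assumption: the default site and enforcement point are in use, so the
    Edge Cluster path has the form shown below.
    """
    url = (f"https://{manager}/policy/api/v1/infra/tier-1s/"
           f"{tier1_id}/locale-services/{locale_service_id}")
    body = {
        "edge_cluster_path": "/infra/sites/default/enforcement-points/"
                             f"default/edge-clusters/{edge_cluster_id}"
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def plan_bounce(manager: str, tier1_id: str, locale_service_id: str,
                temporary_cluster_id: str,
                original_cluster_id: str) -> List[urllib.request.Request]:
    """Return the two requests: move away first, then move back."""
    return [
        build_move_request(manager, tier1_id, locale_service_id, temporary_cluster_id),
        build_move_request(manager, tier1_id, locale_service_id, original_cluster_id),
    ]


# Hypothetical values, purely for illustration.
for request in plan_bounce("nsx.example.test", "my-tier1-id", "my-locale-service-id",
                           "temporary-edge-cluster-id", "original-edge-cluster-id"):
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Authentication, certificate handling and actually sending the requests are deliberately left out. Between the two steps you would want to confirm, for example with get logical-routers on the Edge Nodes, that both the Active and Standby nodes exist before moving the gateway back.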

NSX Tier 1 Gateway 2 Edge Nodes

I hope this helped.


Bryan van Eeden

Bryan is an ambitious and seasoned IT professional with almost a decade of experience in designing, building and operating complex (virtual) IT environments. In his current role he tackles customers, complex issues and design questions on a daily basis. Bryan holds several certifications such as VCIX-DCV, VCAP-DCA, VCAP-DCD, V(T)SP and vSAN and vCloud Specialist badges.
