NAT Detection

The Cisco SD-WAN solution is designed to run over any kind of WAN transport available to the WAN edge devices, including public networks such as broadband, 4G/5G, LTE, business Internet, and so on. This implies that the overlay fabric must be able to form through all flavors of Network Address Translation (NAT) that these public networks use. In practice, any Cisco SD-WAN device may be unknowingly sitting behind one or more NAT devices. To discover the public IP addresses and ports allocated by NAT, Cisco SD-WAN devices use the Session Traversal Utilities for NAT (STUN) protocol defined in RFC 5389.

Figure 1. Session Traversal Utilities for NAT (STUN) Operations

STUN is a client-server protocol built on a request/response transaction: a client sends a request to a server, and the server returns a response. As the request (called a STUN Binding Request) passes through a NAT, the NAT modifies the source IP address and port of the packet. Therefore, the STUN server receives the request with the public IP address and port created by the NAT closest to the server. The STUN server then copies this public address and port into an XOR-MAPPED-ADDRESS attribute in the STUN Binding Response and sends it back to the client. On the way back through the NAT, the public address and port in the IP header are translated back to the private ones, but the copy carried in the body of the STUN response remains untouched. In this way, the client learns the IP address and port allocated to it by the outermost NAT with respect to the STUN server.
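To make the exchange more concrete, here is a minimal Python sketch of the message format defined in RFC 5389: it builds a Binding Request and decodes an IPv4 XOR-MAPPED-ADDRESS attribute from a Binding Response. It only illustrates the generic protocol and is not a description of how Cisco SD-WAN devices implement it. The function names are made up for this example, and `fake_binding_response` is a hypothetical stand-in for a server so that the sketch can run without a network.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442        # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request():
    """Return (packet, transaction_id) for a Binding Request without attributes."""
    transaction_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_xor_mapped_address(response, transaction_id):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(response) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if response[8:20] != transaction_id:
        return None
    offset, end = 20, 20 + msg_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # The port is XORed with the top 16 bits of the cookie,
            # the IPv4 address with the whole cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = xaddr ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attributes are padded to a 4-byte boundary.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


def fake_binding_response(transaction_id, ip, port):
    """Hypothetical helper: the response a server would send after seeing ip:port."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


request, tid = build_binding_request()
response = fake_binding_response(tid, "60.1.1.1", 12346)
print(parse_xor_mapped_address(response, tid))
```

Because the address is XORed with the magic cookie rather than sent in clear, a NAT that rewrites raw IP addresses found in packet payloads is less likely to alter it, which is why the copy in the body survives the return trip.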

As shown in Figure 1, all Cisco SD-WAN devices have an embedded STUN client, and the vBond orchestrator acts as a STUN server. When the initial control communication to vBond takes place, the SD-WAN device performs the STUN operations and discovers its public IP address and port. Once determined, this information is advertised as part of the TLOC routes to the vSmart controllers, which re-advertise it to all other SD-WAN devices.

NAT Types

In a typical production SD-WAN deployment, many remote sites connect over many different Internet connections to a centralized data center or a regional hub. In most regions of the world, Internet providers use some type of private-to-public address translation due to the shortage of public IPv4 addresses. Let's look at the NAT classifications used by the STUN protocol and how they affect whether sites can form connections and communicate directly with each other.

Full-Cone NAT

A Full-Cone NAT is one where all packets from the same internal IP address are mapped to the same NAT IP address. This type of address translation is also known as One-to-One.

Additionally, any external host can send packets to the internal host, without the internal host having sent anything first, by sending them to the mapped NAT IP address.

Figure 2. Full-Cone NAT

Restricted-Cone NAT

A Restricted-Cone NAT is also known as Address-Restricted-Cone. As with a Full-Cone, all packets from the same internal IP address are mapped to the same NAT IP address. The difference is that an external host can send packets to the internal host only if the internal host has previously sent a packet to the IP address of that external host. Note that once the NAT mapping state is created, the external host can communicate back to the internal host from any port.

Figure 3. Restricted-Cone NAT

Port-Restricted-Cone NAT

A Port-Restricted-Cone NAT is similar to the Restricted-Cone, but the restriction also includes port numbers. An external host can send packets back to the internal host only if the internal host has previously sent a packet to that host's IP address on that exact port number. In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT).

Figure 4. Port-Restricted-Cone NAT

Symmetric

Symmetric NAT is the most restrictive of the four types. It is a translation technique where all requests from the same internal IP address and port to a specific destination IP address and port are mapped to a unique NAT IP address and NAT port. Furthermore, only the external destination that received a packet can send packets back to the internal host. In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT) with port-randomization.

Figure 5. Symmetric-NAT
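
The four behaviors are easier to compare side by side in code. The following Python sketch models a single NAT device with two rules: how it picks the public port for outbound traffic (mapping) and which inbound packets it lets through (filtering). It is a simplified toy model under stated assumptions, not a description of any real NAT implementation: the `Nat` class, its method names, the port counter starting at 50000, and the documentation addresses are all made up for illustration, and mappings are keyed on the internal IP address and port.

```python
import itertools

NAT_TYPES = ("full-cone", "restricted-cone", "port-restricted-cone", "symmetric")


class Nat:
    """Toy model of one NAT device (hypothetical, for illustration only)."""

    def __init__(self, nat_type, public_ip="203.0.113.1"):
        if nat_type not in NAT_TYPES:
            raise ValueError(f"unknown NAT type: {nat_type}")
        self.nat_type = nat_type
        self.public_ip = public_ip
        self._next_port = itertools.count(50000)
        self._mappings = {}    # mapping key -> public port
        self._owner = {}       # public port -> internal (ip, port)
        self._contacted = {}   # public port -> set of external (ip, port)

    def outbound(self, src, dst):
        """Translate an outbound packet and return its public (ip, port)."""
        # Symmetric NAT creates one mapping per destination;
        # the cone types reuse one mapping for every destination.
        key = (src, dst) if self.nat_type == "symmetric" else src
        if key not in self._mappings:
            port = next(self._next_port)
            self._mappings[key] = port
            self._owner[port] = src
            self._contacted[port] = set()
        port = self._mappings[key]
        self._contacted[port].add(dst)
        return (self.public_ip, port)

    def inbound(self, ext_src, public_port):
        """Return the internal (ip, port) if the packet is allowed in, else None."""
        if public_port not in self._owner:
            return None
        contacted = self._contacted[public_port]
        if self.nat_type == "full-cone":
            allowed = True
        elif self.nat_type == "restricted-cone":
            allowed = any(ip == ext_src[0] for ip, _ in contacted)
        else:  # port-restricted-cone and symmetric: exact ip and port
            allowed = ext_src in contacted
        return self._owner[public_port] if allowed else None


inside = ("192.168.1.2", 12346)
peer = ("198.51.100.10", 12346)
stranger = ("198.51.100.99", 12346)

for nat_type in NAT_TYPES:
    nat = Nat(nat_type)
    _, port = nat.outbound(inside, peer)
    print(nat_type,
          "| reply from peer:", nat.inbound(peer, port) is not None,
          "| peer, other port:", nat.inbound((peer[0], 9999), port) is not None,
          "| stranger:", nat.inbound(stranger, port) is not None)
```

In this model the symmetric type differs from the port-restricted cone only in the mapping rule: because the public port changes per destination, the port a device learns through STUN is not the port a different peer would have to reach.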

Best-Practices

Although Cisco SD-WAN supports several types of Network Address Translation, to create a full-mesh overlay fabric it is recommended that at least one side of each WAN Edge tunnel be reachable by a connection initiated from the other WAN Edge. This means that at least one side of the tunnel should have a public IP address or sit behind a Full-Cone (one-to-one) NAT. It is also strongly recommended to configure full-cone, or one-to-one, address translation at the data centers and regional hub sites so that, regardless of which NAT type is running at the remote sites (restricted-cone, port-restricted-cone, or symmetric), they can send traffic to the hubs without issues.

NAT combinations between WAN Edge routers

vEdge-1                  vEdge-2                  IPsec tunnel can form   GRE tunnel can form
------------------------------------------------------------------------------------------
No-NAT (Public IP)       No-NAT (Public IP)       YES                     YES
No-NAT (Public IP)       Symmetric                YES                     NO
Full-Cone (One-to-One)   Full-Cone (One-to-One)   YES                     YES
Full-Cone (One-to-One)   Restricted-Cone          YES                     NO
Full-Cone (One-to-One)   Symmetric                YES                     NO
Restricted-Cone          Restricted-Cone          YES                     NO
Symmetric                Restricted-Cone          NO                      NO
Symmetric                Symmetric                NO                      NO

Symmetric address translation configured at the transport attached to one vEdge requires a full-cone or a public IP on the other vEdge to establish a direct IPsec tunnel between them. Sites that cannot connect directly should be set up in a hub-and-spoke topology so they can reach each other through a regional hub site or data center.

IMPORTANT: For overlay tunnels configured to use GRE encapsulation instead of IPsec, only public IP addressing or one-to-one address translation is supported. Any type of Network Address Translation with port overloading is not supported, because GRE packets lack an L4 header and therefore carry no port numbers for the NAT device to translate.
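
When planning a topology, the combination table above can be turned into a small lookup. The Python sketch below copies the table rows as they are and adds nothing to them. The constant names and the `can_form_tunnel` function are hypothetical, and the sketch assumes that the order of the two routers does not matter. Pairs that the table does not list return `None` rather than a guess.

```python
PUBLIC = "public"
FULL_CONE = "full-cone"
RESTRICTED = "restricted-cone"
SYMMETRIC = "symmetric"

# (IPsec tunnel can form, GRE tunnel can form), copied row by row from the table.
# frozenset keys make the lookup independent of router order (an assumption).
TUNNEL_TABLE = {
    frozenset([PUBLIC]): (True, True),
    frozenset([PUBLIC, SYMMETRIC]): (True, False),
    frozenset([FULL_CONE]): (True, True),
    frozenset([FULL_CONE, RESTRICTED]): (True, False),
    frozenset([FULL_CONE, SYMMETRIC]): (True, False),
    frozenset([RESTRICTED]): (True, False),
    frozenset([SYMMETRIC, RESTRICTED]): (False, False),
    frozenset([SYMMETRIC]): (False, False),
}


def can_form_tunnel(nat_a, nat_b, encap="ipsec"):
    """Return True/False according to the table, or None if the pair is not listed."""
    if encap not in ("ipsec", "gre"):
        raise ValueError(f"unsupported encapsulation: {encap}")
    entry = TUNNEL_TABLE.get(frozenset([nat_a, nat_b]))
    if entry is None:
        return None
    ipsec_ok, gre_ok = entry
    return ipsec_ok if encap == "ipsec" else gre_ok


def needs_hub(nat_a, nat_b):
    """True when the table says no direct IPsec tunnel can form between the sites."""
    return can_form_tunnel(nat_a, nat_b, "ipsec") is False


print(can_form_tunnel(SYMMETRIC, FULL_CONE))          # direct IPsec tunnel
print(can_form_tunnel(SYMMETRIC, FULL_CONE, "gre"))   # GRE over the same pair
print(needs_hub(SYMMETRIC, SYMMETRIC))                # must go through a hub
```

A result of `True` from `needs_hub` corresponds to the sites that should reach each other through a regional hub or data center.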

TLOC Routes

Once every WAN edge router discovers its private-public translated address and port, it advertises them to the vSmart controller via OMP using the OMP TLOC routes. The vSmart controller then re-advertises this information across the overlay fabric.
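
Conceptually, each TLOC route carries two address pairs: the private one configured on the interface and the public one learned through STUN. The Python sketch below models just that part of a TLOC. The `TlocAddress` class and its field names are hypothetical, the addresses are illustrative, and the choice of the public pair as the tunnel destination is a simplification of what a remote peer would normally do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TlocAddress:
    """Hypothetical, reduced view of the addressing carried in a TLOC route."""
    system_ip: str
    color: str
    encap: str
    private_ip: str
    private_port: int
    public_ip: str
    public_port: int

    def behind_nat(self):
        # If STUN reported the same pair that is configured locally,
        # no translation happened on the path to the STUN server.
        return (self.private_ip, self.private_port) != (self.public_ip, self.public_port)

    def tunnel_destination(self):
        # Simplification: remote peers aim at the translated (public) pair.
        return (self.public_ip, self.public_port)


tloc = TlocAddress(
    system_ip="60.60.60.60", color="lte", encap="ipsec",
    private_ip="192.168.1.2", private_port=12346,
    public_ip="60.1.1.1", public_port=12346,
)
print(tloc.behind_nat(), tloc.tunnel_destination())
```

Note that a device can be behind a NAT even when the port is unchanged, as in this example, so both the address and the port have to be compared.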

Figure 6. An example of two WAN edge routers connected through Port-Restricted-Cone

Lastly, let's see an example of two WAN edge devices connected through a Port-Restricted-Cone NAT. As you can verify in the combination table, they are able to form an IPsec tunnel between themselves, but if we change the encapsulation type to GRE, the data plane tunnel does not come up. Let's quickly verify that.

These are the TLOCs of both routers when the encapsulation is set to ipsec.

---------------------------------------------------
tloc entries for 60.60.60.60
                 lte
                 ipsec
---------------------------------------------------
            RECEIVED FROM:                   
peer            1.1.0.3
status          C,I,R
loss-reason     not set
lost-to-peer    not set
lost-to-path-id not set
    Attributes:
     attribute-type    installed
     encap-key         not set
     encap-proto       0
     encap-spi         256
     encap-auth        sha1-hmac,ah-sha1-hmac
     encap-encrypt     aes256
     public-ip         60.1.1.1
     public-port       12346
     private-ip        192.168.1.2
     private-port      12346
     public-ip         ::
     public-port       0
     private-ip        ::
     private-port      0
     bfd-status        up
     domain-id         not set
     site-id           60
     overlay-id        not set
     preference        0
     tag               not set
     stale             not set
     weight            1
     version           2
    gen-id             0x80000010
     carrier           default
     restrict          0
     groups            ( 0 )
     border             not set
     unknown-attr-len  not set
---------------------------------------------------
tloc entries for 70.70.70.70
                 lte
                 ipsec
---------------------------------------------------
            RECEIVED FROM:                   
peer            1.1.0.3
status          C,I,R
loss-reason     not set
lost-to-peer    not set
lost-to-path-id not set
    Attributes:
     attribute-type    installed
     encap-key         not set
     encap-proto       0
     encap-spi         256
     encap-auth        sha1-hmac,ah-sha1-hmac
     encap-encrypt     aes256
     public-ip         70.1.1.1
     public-port       12426
     private-ip        172.16.1.2
     private-port      12426
     public-ip         ::
     public-port       0
     private-ip        ::
     private-port      0
     bfd-status        up
     domain-id         not set
     site-id           70
     overlay-id        not set
     preference        0
     tag               not set
     stale             not set
     weight            1
     version           2
    gen-id             0x80000014
     carrier           default
     restrict          1
     groups            ( 0 )
     border             not set
     unknown-attr-len  not set

You can see that bfd-status is up for both TLOCs, which means that the tunnel is up and running and data plane traffic can flow in both directions. Now let's change the encapsulation type to GRE and see what happens.

vEdge-3(config)# vpn 0
vEdge-3(config-vpn-0)#  interface ge0/0
vEdge-3(config-interface-ge0/0)# tunnel-interface
vEdge-3(config-tunnel-interface)#    encapsulation ?
Possible completions:
  gre  ipsec
vEdge-3(config-tunnel-interface)# encapsulation gre   
vEdge-3(config-tunnel-interface)# commit 
Commit complete.

vEdge-4(config)# vpn 0
vEdge-4(config-vpn-0)#  interface ge0/0
vEdge-4(config-interface-ge0/0)#   tunnel-interface
vEdge-4(config-tunnel-interface)#    encapsulation gre
vEdge-4(config-tunnel-interface)# commit 
Commit complete.

Now, if we check the status of the BFD sessions, we can see that the session over the GRE tunnel is down.

vEdge-4# show bfd sessions 
                               SOURCE REMOTE              DST PUBLIC                   
SYSTEM IP     SITE ID  STATE   COLOR  COLOR  SOURCE IP    IP          ENCAP  UPTIME    
------------------------------------------------------------------------------------
50.50.50.50   50       up      mpls   mpls   10.70.1.1    10.50.1.1   ipsec  0:01:51
60.60.60.60   60       down    lte    lte    172.16.1.2   60.1.1.1    gre    NA