The Cisco SD-WAN solution is designed to run over any WAN transport available to the WAN edge devices, including public networks such as broadband, 4G/5G, LTE, business Internet, and so on. This implies that the overlay fabric must be able to form across every flavor of Network Address Translation (NAT) that these public networks use. In practice, any Cisco SD-WAN device may be sitting behind one or more NAT devices without knowing it. To discover the public IP addresses and ports allocated by NAT, Cisco SD-WAN devices use the Session Traversal Utilities for NAT (STUN) protocol defined in RFC 5389.
STUN is a client-server protocol built on a request/response transaction: a client sends a request to a server, and the server returns a response. As the request (called a STUN Binding Request) passes through a NAT, the NAT modifies the source IP address and port of the packet. The STUN server therefore receives the request with the public IP address and port created by the NAT closest to the server. The server copies this public address into an XOR-MAPPED-ADDRESS attribute in the STUN Binding Response and sends it back to the client. On the way back through the NAT, the public address and port in the IP header are translated back to the private ones, but the copy of the public address in the body of the STUN response remains untouched. In this way, the client learns the IP address and port allocated to it by the outermost NAT with respect to the STUN server.
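The exchange described above can be sketched in a few lines of Python. This is a minimal illustration of the generic RFC 5389 message layout only, not of how Cisco SD-WAN devices implement it. The helper names (`build_binding_request`, `encode_xor_mapped_address`, `parse_xor_mapped_address`) and the address `203.0.113.7` are made up for this example, and the server side is simulated locally so that nothing is sent on the network.

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """Return a 20-byte STUN Binding Request with no attributes."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def encode_xor_mapped_address(ip: str, port: int) -> bytes:
    """Encode an IPv4 XOR-MAPPED-ADDRESS attribute, as a server would."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)  # 0x01 = IPv4 family
    return struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value


def parse_xor_mapped_address(message: bytes):
    """Return (ip, port) from the first IPv4 XOR-MAPPED-ADDRESS, or None."""
    _msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if cookie != MAGIC_COOKIE:
        raise ValueError("not an RFC 5389 STUN message")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)
            return str(ip), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None


# Simulated round trip: the "server" saw the request arrive from 203.0.113.7:12346.
tid = os.urandom(12)
request = build_binding_request(tid)
attr = encode_xor_mapped_address("203.0.113.7", 12346)
response = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + tid + attr
print(parse_xor_mapped_address(response))
```

The address is XOR-ed with the magic cookie so that a NAT that rewrites matching addresses inside payloads will not recognize it, which is why the copy in the body survives the return trip unchanged.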
As shown in Figure 1, every Cisco SD-WAN device has an embedded STUN client, and the vBond orchestrator acts as the STUN server. When the initial control communication to vBond takes place, the SD-WAN device performs the STUN operations and discovers its public IP address and port. Once determined, this information is advertised as part of the TLOC routes to the vSmart controllers, which re-advertise it to all other SD-WAN devices.
In a typical production SD-WAN deployment, many remote sites connect over many different Internet connections to a centralized data center or a regional hub. In most regions of the world, Internet providers use some form of private-to-public address translation because of the shortage of public IPv4 addresses. Let's look at the NAT classifications used by the STUN protocol and how they determine whether sites can form connections and communicate directly with each other.
A Full-Cone NAT is one where all packets from the same internal IP address are mapped to the same NAT IP address. This type of address translation is also known as one-to-one.
Additionally, any external host can send packets to the internal host by sending them to the mapped NAT IP address.
A Restricted-Cone NAT, also known as an Address-Restricted-Cone NAT, likewise maps all packets from the same internal IP address to the same NAT IP address. The difference from a Full-Cone is that an external host can send packets to the internal host only if the internal host has previously sent a packet to the IP address of that external host. Note that once the NAT mapping state is created, the external host can communicate back to the internal host from any source port.
A Port-Restricted-Cone NAT is similar to the Restricted-Cone, but the restriction also includes port numbers. An external destination can send packets back to the internal host only if the internal host has previously sent a packet to that destination's IP address and that exact port number. In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT).
Symmetric NAT is the most restrictive of all the types. It is a network translation technique where all requests from the same internal IP address and port to a specific destination IP address and port are mapped to a unique NAT IP address and NAT port; a packet from the same internal address and port to a different destination gets a different mapping. Furthermore, only the external destination that received a packet can send packets back to the internal host. In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT) with port-randomization.
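The four behaviors can be compared side by side with a small toy model. The sketch below is an illustration under simplifying assumptions, not a description of any real NAT implementation: the `Nat` class, its method names, and all addresses are hypothetical, mappings are created dynamically on the first outbound packet (a static one-to-one translation would have them preconfigured), and internal hosts are identified by an (IP, port) pair.

```python
from dataclasses import dataclass, field

FULL_CONE = "full-cone"
RESTRICTED_CONE = "restricted-cone"
PORT_RESTRICTED_CONE = "port-restricted-cone"
SYMMETRIC = "symmetric"


@dataclass
class Nat:
    nat_type: str
    public_ip: str = "198.51.100.1"                 # example address
    next_port: int = 40000
    mappings: dict = field(default_factory=dict)    # mapping key -> public port
    contacted: dict = field(default_factory=dict)   # public port -> {(dst_ip, dst_port)}

    def outbound(self, src, dst):
        """Translate an outbound packet and return the public (ip, port) used."""
        # Symmetric NAT allocates one mapping per destination; cone NATs reuse
        # a single mapping for every destination.
        key = (src, dst) if self.nat_type == SYMMETRIC else src
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        self.contacted.setdefault(port, set()).add(dst)
        return (self.public_ip, port)

    def inbound_allowed(self, public_port, remote):
        """Would a packet from `remote` to `public_port` be forwarded inside?"""
        peers = self.contacted.get(public_port)
        if peers is None:
            return False                             # no mapping for this port
        if self.nat_type == FULL_CONE:
            return True                              # anyone may use the mapping
        if self.nat_type == RESTRICTED_CONE:
            return remote[0] in {ip for ip, _ in peers}   # IP must match
        return remote in peers                       # IP and port must match


edge = ("192.168.1.2", 12346)      # internal WAN edge
vbond = ("203.0.113.10", 12346)    # STUN server
peer = ("203.0.113.20", 12346)     # remote WAN edge

for nat_type in (FULL_CONE, RESTRICTED_CONE, PORT_RESTRICTED_CONE, SYMMETRIC):
    nat = Nat(nat_type)
    _, learned_port = nat.outbound(edge, vbond)      # port discovered via STUN
    print(nat_type, "peer can reach STUN-learned port:",
          nat.inbound_allowed(learned_port, peer))
```

In this model only the full-cone case lets the remote peer reach the STUN-learned port unsolicited. With symmetric NAT the situation is worse still, because a packet sent to the peer would use a different public port than the one learned through the STUN server.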
Although Cisco SD-WAN supports several types of Network Address Translation, building a full-mesh overlay fabric works best when at least one side of each WAN Edge tunnel can accept a connection initiated by the other WAN Edge. This means that at least one side of the tunnel should have a public IP address or sit behind a Full-Cone (one-to-one) NAT. It is also strongly recommended to configure full-cone, or one-to-one, address translation at the data centers or regional hub sites so that, regardless of which NAT type is running at the remote sites (restricted-cone, port-restricted-cone, or symmetric), they can send traffic to the hubs without issues.
| vEdge-1 | vEdge-2 | IPsec tunnel can form | GRE tunnel can form |
|---|---|---|---|
| No-NAT (Public IP) | No-NAT (Public IP) | YES | YES |
| No-NAT (Public IP) | Symmetric | YES | NO |
| Full Cone (One-to-one) | Full Cone (One-to-one) | YES | YES |
| Full Cone (One-to-one) | Restricted-Cone | YES | NO |
| Full Cone (One-to-one) | Symmetric | YES | NO |
When the transport attached to one vEdge sits behind symmetric NAT, the other vEdge needs a public IP address or a full-cone NAT for a direct IPsec tunnel to form between them. Sites that cannot connect directly should be set up in a hub-and-spoke topology so they can reach each other through a regional hub site or data center.
IMPORTANT: For overlay tunnels configured to use GRE encapsulation instead of IPsec, only public IP addressing or one-to-one address translation is supported. Any type of Network Address Translation with port overloading is not supported, because GRE packets lack an L4 header and therefore carry no port numbers to translate.
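The combination table above can be captured as a simple lookup. The sketch below encodes only the five rows listed in the table; the function name `can_form_tunnel` and the string constants are invented for this example, and any combination that the table does not list returns `None` rather than a guess.

```python
PUBLIC = "public"
FULL_CONE = "full-cone"
RESTRICTED_CONE = "restricted-cone"
SYMMETRIC = "symmetric"

# {NAT type on each side} -> (IPsec can form, GRE can form), copied from the
# combination table. frozenset makes the lookup order-independent.
TUNNEL_TABLE = {
    frozenset([PUBLIC]): (True, True),
    frozenset([PUBLIC, SYMMETRIC]): (True, False),
    frozenset([FULL_CONE]): (True, True),
    frozenset([FULL_CONE, RESTRICTED_CONE]): (True, False),
    frozenset([FULL_CONE, SYMMETRIC]): (True, False),
}


def can_form_tunnel(side_a, side_b, encap):
    """Return True/False from the table, or None if the pair is not listed."""
    if encap not in ("ipsec", "gre"):
        raise ValueError("encap must be 'ipsec' or 'gre'")
    row = TUNNEL_TABLE.get(frozenset([side_a, side_b]))
    if row is None:
        return None
    ipsec_ok, gre_ok = row
    return ipsec_ok if encap == "ipsec" else gre_ok


print(can_form_tunnel(SYMMETRIC, PUBLIC, "ipsec"))
print(can_form_tunnel(SYMMETRIC, PUBLIC, "gre"))
print(can_form_tunnel(SYMMETRIC, SYMMETRIC, "ipsec"))
```

Returning `None` for unlisted pairs keeps the helper honest about what the table actually states, which is useful when deciding which sites need to fall back to a hub-and-spoke path.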
Once every WAN edge router discovers its private-public translated address and port, it advertises them to the vSmart controller via OMP using the OMP TLOC routes. The vSmart controller then re-advertises this information across the overlay fabric.
Lastly, let's look at an example of two WAN edge devices connected through a Port-Restricted-Cone NAT. As you can verify in the combination table, they are able to form an IPsec tunnel between themselves, but if we change the encapsulation type to GRE, the data plane tunnel does not come up. Let's quickly verify that.
These are the two TLOCs when the encapsulation is set to IPsec.
---------------------------------------------------
tloc entries for 220.127.116.11
                 lte
                 ipsec
---------------------------------------------------
            RECEIVED FROM:
peer            18.104.22.168
status          C,I,R
loss-reason     not set
lost-to-peer    not set
lost-to-path-id not set
    Attributes:
     attribute-type    installed
     encap-key         not set
     encap-proto       0
     encap-spi         256
     encap-auth        sha1-hmac,ah-sha1-hmac
     encap-encrypt     aes256
     public-ip         22.214.171.124
     public-port       12346
     private-ip        192.168.1.2
     private-port      12346
     public-ip         ::
     public-port       0
     private-ip        ::
     private-port      0
     bfd-status        up
     domain-id         not set
     site-id           60
     overlay-id        not set
     preference        0
     tag               not set
     stale             not set
     weight            1
     version           2
     gen-id            0x80000010
     carrier           default
     restrict          0
     groups            ( 0 )
     border            not set
     unknown-attr-len  not set

---------------------------------------------------
tloc entries for 126.96.36.199
                 lte
                 ipsec
---------------------------------------------------
            RECEIVED FROM:
peer            188.8.131.52
status          C,I,R
loss-reason     not set
lost-to-peer    not set
lost-to-path-id not set
    Attributes:
     attribute-type    installed
     encap-key         not set
     encap-proto       0
     encap-spi         256
     encap-auth        sha1-hmac,ah-sha1-hmac
     encap-encrypt     aes256
     public-ip         184.108.40.206
     public-port       12426
     private-ip        172.16.1.2
     private-port      12426
     public-ip         ::
     public-port       0
     private-ip        ::
     private-port      0
     bfd-status        up
     domain-id         not set
     site-id           70
     overlay-id        not set
     preference        0
     tag               not set
     stale             not set
     weight            1
     version           2
     gen-id            0x80000014
     carrier           default
     restrict          1
     groups            ( 0 )
     border            not set
     unknown-attr-len  not set
Both entries show bfd-status up, which means that the tunnel is up and data plane traffic can flow in both directions. Now let's change the encapsulation type to GRE and see what happens.
vEdge-3(config)# vpn 0
vEdge-3(config-vpn-0)# interface ge0/0
vEdge-3(config-interface-ge0/0)# tunnel-interface
vEdge-3(config-tunnel-interface)# encapsulation ?
Possible completions:
  gre  ipsec
vEdge-3(config-tunnel-interface)# encapsulation gre
vEdge-3(config-tunnel-interface)# commit
Commit complete.

vEdge-4(config)# vpn 0
vEdge-4(config-vpn-0)# interface ge0/0
vEdge-4(config-interface-ge0/0)# tunnel-interface
vEdge-4(config-tunnel-interface)# encapsulation gre
vEdge-4(config-tunnel-interface)# commit
Commit complete.
Now if we check the status of the BFD sessions we can clearly see that the GRE tunnel is down.
vEdge-4# show bfd sessions
                                SOURCE  REMOTE              DST PUBLIC
SYSTEM IP       SITE ID  STATE  COLOR   COLOR   SOURCE IP   IP              ENCAP  UPTIME
------------------------------------------------------------------------------------
220.127.116.11  50       up     mpls    mpls    10.70.1.1   10.50.1.1       ipsec  0:01:51
18.104.22.168   60       down   lte     lte     172.16.1.2  22.214.171.124  gre    NA
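When many tunnels are involved, the same check can be scripted. The following sketch parses session rows like the two shown above and reports those that are not up. It assumes the nine-column layout printed in this article (output on a real device may include additional columns), and the function and column names are made up for this example.

```python
import ipaddress

# Column order assumed from the trimmed output shown in this article.
COLUMNS = ["system_ip", "site_id", "state", "source_color", "remote_color",
           "source_ip", "dst_public_ip", "encap", "uptime"]

SAMPLE = """\
220.127.116.11  50  up    mpls  mpls  10.70.1.1   10.50.1.1       ipsec  0:01:51
18.104.22.168   60  down  lte   lte   172.16.1.2  22.214.171.124  gre    NA
"""


def parse_bfd_sessions(text):
    """Return one dict per session row, skipping headers and separators."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        try:
            ipaddress.ip_address(fields[0])   # session rows start with a system IP
        except ValueError:
            continue
        sessions.append(dict(zip(COLUMNS, fields)))
    return sessions


def down_sessions(text):
    """Return only the sessions whose state is not 'up'."""
    return [s for s in parse_bfd_sessions(text) if s["state"] != "up"]


for session in down_sessions(SAMPLE):
    print(session["system_ip"], session["source_color"],
          session["encap"], session["state"])
```

With the sample rows, only the GRE session over the lte color is reported, which matches the manual check.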