In this lesson, we will explore a common problem that occurs when a dual-homed, dual-transport site advertises routes with different origin metrics or origin types. The site is typically a data center or a regional hub with a site-local routing protocol such as BGP or OSPF running in the environment. We will see how the Overlay Management Protocol (OMP) best-path selection can cause a complete loss of reachability between sites when a link fails, even though backup paths exist.

What is OMP Send-Backup-Paths?

By default, when a Cisco vSmart controller receives multiple OMP routes to the same destination, it runs them through the OMP best-path algorithm and selects the best ones for that destination subnet. The vSmart controller then advertises only the best routes to the rest of the overlay fabric. This behavior is familiar to network engineers: BGP route reflectors work in much the same way when receiving routes and re-advertising them to route-reflector clients.

The Cisco SD-WAN solution provides a configuration option that tells the vSmart controller to advertise the first set of non-best routes to vEdge routers. The configuration is as simple as adding one configuration line on the controller, as shown in the output below.

vSmart# config
Entering configuration mode terminal
vSmart(config)# omp send-backup-paths 
vSmart(config-omp)# commit
Commit complete.
vSmart#

The key point here is what "the first set of non-best routes" means. Suppose there are two equal-cost best routes to a destination; call them "routes-X". Suppose there are another two equal-cost non-best routes to the same destination; call them "routes-Y". By default, vSmart advertises only routes-X. When we enable the omp send-backup-paths option, the controller advertises routes-Y alongside routes-X. However, any routes that are worse than routes-Y are still not sent out by the vSmart controller, because only the first set of non-best routes is advertised!
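The routes-X / routes-Y rule can be sketched in a few lines of Python. This is only an illustrative model, not the vSmart implementation: the function name `routes_to_advertise` and the numeric `rank` (lower is better, equal ranks are equal-cost) are assumptions standing in for the result of the full OMP best-path algorithm.

```python
from itertools import groupby


def routes_to_advertise(routes, send_backup_paths=False):
    """Return the route names a controller would advertise for one prefix.

    `routes` is a list of (name, rank) tuples. The rank is a hypothetical
    stand-in for the OMP best-path result: lower is better, equal is equal-cost.
    """
    ordered = sorted(routes, key=lambda r: r[1])
    # Group equal-cost routes into sets, best set first.
    sets = [list(group) for _, group in groupby(ordered, key=lambda r: r[1])]
    # Best set only by default; best set plus the first non-best set otherwise.
    keep = 2 if send_backup_paths else 1
    return [name for group in sets[:keep] for name, _ in group]


# routes-X (rank 1), routes-Y (rank 2), and one even worse route Z1 (rank 3).
routes = [("X1", 1), ("X2", 1), ("Y1", 2), ("Y2", 2), ("Z1", 3)]

print(routes_to_advertise(routes))                          # ['X1', 'X2']
print(routes_to_advertise(routes, send_backup_paths=True))  # ['X1', 'X2', 'Y1', 'Y2']
```

In both cases Z1 stays on the controller, which is the behavior described above: only the first set of non-best routes is ever added.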

When is OMP Send-Backup-Paths used?

Let's now see a real-world example of when we would like to enable the vSmart controller to send non-best paths.

The initial state

We have three WAN edge routers - vEdges 1, 2, and 3. Routers 1 and 2 are located in a data center with site-id 1. Router 3 is located at a remote branch with site-id 3. The initial state of this lab example is illustrated in figure 1 below. 

Figure 1. Normal Scenario

The routers have the following transport attachments:

  • vEdge-1 has got two TLOCs - T11 marked with the mpls color and T12 marked with the biz-internet color.
  • vEdge-2 has also got two TLOCs - T21 marked with the mpls color and T22 marked with the biz-internet color.
  • vEdge-3 has only got one TLOC - T31 marked with the mpls color.

OSPF runs in the data center between vEdges 1 and 2 and the local network devices. The site-local router in the datacenter advertises subnet 10.1.1.0/24 to vEdges 1 and 2 in VPN10:

  • vEdge-1 receives 10.1.1.0/24 via OSPF with metric 50 and redistributes it into OMP with origin-metric 50.
  • vEdge-2 receives 10.1.1.0/24 via OSPF with metric 90 and redistributes it into OMP with origin-metric 90.

As illustrated in figure 1, the vSmart controller receives four OMP routes for 10.1.1.0/24. It runs the OMP best-path algorithm and selects the ones via vEdge-1 as the best routes because their origin-metric (50) is lower than that of the routes via vEdge-2 (90).

vSmart# show omp routes vpn 10 10.1.1.0/24 | t
Code:
C   -> chosen
I   -> installed
Red -> redistributed
Rej -> rejected
L   -> looped
R   -> resolved
S   -> stale
Ext -> extranet
Inv -> invalid
Stg -> staged
IA  -> On-demand inactive
U   -> TLOC unresolved

                 PATH                      ATTRIBUTE                                                       
FROM PEER        ID     LABEL    STATUS    TYPE       TLOC IP          COLOR            ENCAP  PREFERENCE  
-----------------------------------------------------------------------------------------------------------
1.1.1.1          66     1009     C,R       installed  1.1.1.1          mpls             ipsec  -           
1.1.1.1          68     1009     C,R       installed  1.1.1.1          biz-internet     ipsec  -           
1.1.1.2          66     1016     R         installed  1.1.1.2          mpls             ipsec  -           
1.1.1.2          68     1016     R         installed  1.1.1.2          biz-internet     ipsec  -           

Subsequently, the vSmart controller advertises to vEdge-3 only the best routes - 10.1.1.0/24 via T11 and 10.1.1.0/24 via T12. Therefore, vEdge-3 does not even know that subnet 10.1.1.0/24 is also reachable via vEdge-2!

Additionally, vEdge-3 marks route 10.1.1.0/24 via T12 as Invalid because it does not have an overlay tunnel to vEdge-1's biz-internet TLOC.

vEdge-3# sh omp route vpn 10 10.1.1.0/24 | t
Code:
C   -> chosen
I   -> installed
Red -> redistributed
Rej -> rejected
L   -> looped
R   -> resolved
S   -> stale
Ext -> extranet
Inv -> invalid
Stg -> staged
IA  -> On-demand inactive
U   -> TLOC unresolved

                 PATH                      ATTRIBUTE                                                       
FROM PEER        ID     LABEL    STATUS    TYPE       TLOC IP          COLOR            ENCAP  PREFERENCE  
-----------------------------------------------------------------------------------------------------------
1.1.1.30         87     1009     C,I,R     installed  1.1.1.1          mpls             ipsec  -           
1.1.1.30         88     1009     Inv,U     installed  1.1.1.1          biz-internet     ipsec  -           

In the end, vEdge-3 has only one valid path to subnet 10.1.1.0/24, through the mpls interface of vEdge-1, even though the subnet can also be reached via vEdge-2's mpls interface!

vEdge-3# sh ip route vpn 10 10.1.1.0/24 | t

     ADDRESS               PATH            PROTOCOL          NEXTHOP  NEXTHOP                         NEXTHOP          
VPN  FAMILY   PREFIX       ID    PROTOCOL  SUB TYPE  METRIC  IFNAME   ADDR     TLOC IP  COLOR  ENCAP  VPN      STATUS  
-----------------------------------------------------------------------------------------------------------------------
10   ipv4     10.1.1.0/24  0     omp       -         0       -        -        1.1.1.1  mpls   ipsec  -        F,S     

Even so, in the normal state of the environment there is IP reachability between the remote site and the data center network.

vEdge-3# ping vpn 10 10.1.1.1
Ping in VPN 10
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=254 time=46.7 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=254 time=61.9 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=254 time=41.7 ms
^C
--- 10.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 41.730/50.150/61.944/8.593 ms

The problem

Let's see what happens when we introduce a failure by shutting down the mpls TLOC on vEdge-1, as illustrated in figure 2 below.

Figure 2. Failure Scenario - TLOC T11 is down

vEdge-1 no longer advertises the OMP route 10.1.1.0/24 via T11 because the mpls interface is down. Subsequently, the vSmart controller has only one route to 10.1.1.0/24 via vEdge-1. However, the OMP best-path algorithm still chooses this route as best, because it has a lower origin-metric (50) than the ones via vEdge-2 (90). In the end, vEdge-3 receives only the OMP route to 10.1.1.0/24 via T12 (the biz-internet TLOC of vEdge-1). However, vEdge-3 does not have an overlay tunnel to any remote biz-internet TLOC. Therefore, the route 10.1.1.0/24 via the biz-internet TLOC of vEdge-1 is marked as Invalid and Unresolved.

vEdge-3# show omp route vpn 10 10.1.1.0/24 | t   
Code:
C   -> chosen
I   -> installed
Red -> redistributed
Rej -> rejected
L   -> looped
R   -> resolved
S   -> stale
Ext -> extranet
Inv -> invalid
Stg -> staged
IA  -> On-demand inactive
U   -> TLOC unresolved

                 PATH                      ATTRIBUTE                                                       
FROM PEER        ID     LABEL    STATUS    TYPE       TLOC IP          COLOR            ENCAP  PREFERENCE  
-----------------------------------------------------------------------------------------------------------
1.1.1.30         94     1009     Inv,U     installed  1.1.1.1          biz-internet     ipsec  -           
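The resolution step on vEdge-3 can be modeled roughly as follows. This is a simplified sketch, not the router's actual logic: it assumes a received route is usable only when the router has an overlay tunnel to the exact TLOC the route points at, and the function name `resolve` and the status strings are hypothetical.

```python
def resolve(received_routes, tunnels_up):
    """Label each received route by whether its TLOC is reachable.

    received_routes: list of (tloc_ip, color) pairs learned from the controller.
    tunnels_up: set of (tloc_ip, color) pairs this router has a tunnel to.
    The status strings are illustrative, not real CLI flags.
    """
    return {
        route: "resolved" if route in tunnels_up else "invalid"
        for route in received_routes
    }


# vEdge-3 has only an mpls TLOC, so it can build tunnels to T11 and T21 only.
# With T11 down, the single remaining tunnel is to T21 on vEdge-2.
tunnels_up = {("1.1.1.2", "mpls")}

# By default the controller still sends only the route via vEdge-1 (T12).
received = [("1.1.1.1", "biz-internet")]

status = resolve(received, tunnels_up)
usable = [route for route, state in status.items() if state == "resolved"]

print(status)  # {('1.1.1.1', 'biz-internet'): 'invalid'}
print(usable)  # []
```

The tunnel to vEdge-2 is up, but no received route points at it, so the list of usable routes is empty.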

Although vEdge-3 has an overlay tunnel to vEdge-2 with a BFD session in the UP state, it doesn't know that subnet 10.1.1.0/24 can be reached via vEdge-2! If we ping 10.1.1.1, we will see that vEdge-3 has completely lost connectivity to the data center's network.

vEdge-3# ping vpn 10 10.1.1.1
Ping in VPN 10
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
From 127.1.0.2 icmp_seq=1 Destination Net Unreachable
From 127.1.0.2 icmp_seq=2 Destination Net Unreachable
From 127.1.0.2 icmp_seq=3 Destination Net Unreachable
^C
--- 10.1.1.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss

The solution

One of the most effective solutions to this problem is to tell the vSmart controller to also advertise the first set of non-best routes. In our example, these would be the routes via vEdge-2.

Figure 3. Solution Scenario

Configuring the feature is as simple as applying one configuration line, omp send-backup-paths, on the vSmart controller, as shown in the output below:

vSmart# conf t
Entering configuration mode terminal
vSmart(config)# omp ?                 
Possible completions:
  controller-send-path-limit   Limit number of paths sent to vSmart controller
                               for a prefix
  discard-rejected             Enable/Disable storage of information rejected
                               by policy
  graceful-restart             Enable/Disable graceful restart
  send-backup-paths            Enable/Disable transmission of backup paths
  send-path-limit              Maximum number of paths sent for a prefix
  shutdown                     Enable/disable OMP
  timers                       Set timers
  <cr>                         
vSmart(config)# omp send-backup-paths 
vSmart(config-omp)# commit and-quit 
Commit complete.

Once we configure the send-backup-paths option, the controller advertises the routes via vEdge-2 alongside the best route via vEdge-1. If we now check the OMP routing table on vEdge-3, we will see that it has a valid route to 10.1.1.0/24 through the mpls TLOC of vEdge-2.

vEdge-3# show omp route vpn 10 10.1.1.0/24 | t
Code:
C   -> chosen
I   -> installed
Red -> redistributed
Rej -> rejected
L   -> looped
R   -> resolved
S   -> stale
Ext -> extranet
Inv -> invalid
Stg -> staged
IA  -> On-demand inactive
U   -> TLOC unresolved

                 PATH                      ATTRIBUTE                                                       
FROM PEER        ID     LABEL    STATUS    TYPE       TLOC IP          COLOR            ENCAP  PREFERENCE  
-----------------------------------------------------------------------------------------------------------
1.1.1.30         94     1009     Inv,U     installed  1.1.1.1          biz-internet     ipsec  -           
1.1.1.30         106    1016     C,I,R     installed  1.1.1.2          mpls             ipsec  -           
1.1.1.30         107    1016     Inv,U     installed  1.1.1.2          biz-internet     ipsec  -           
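To see why the single option changes the outcome, the failure state can be replayed end to end in a small Python model. This is a sketch under stated assumptions: only the origin-metric comparison of the best-path algorithm is modeled, send-path-limit is ignored, and the names `Route`, `controller_advertise`, and `usable_on_edge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tloc_ip: str
    color: str
    origin_metric: int


def controller_advertise(routes, send_backup_paths):
    """Keep the lowest-metric set, plus the next set if backup paths are on."""
    metrics = sorted({r.origin_metric for r in routes})
    keep = set(metrics[:2] if send_backup_paths else metrics[:1])
    return [r for r in routes if r.origin_metric in keep]


def usable_on_edge(received, tunnels_up):
    """A route is usable only if the edge has a tunnel to the route's TLOC."""
    return [r for r in received if (r.tloc_ip, r.color) in tunnels_up]


# Failure state: T11 is down, so the controller holds three routes.
at_controller = [
    Route("1.1.1.1", "biz-internet", 50),  # T12
    Route("1.1.1.2", "mpls", 90),          # T21
    Route("1.1.1.2", "biz-internet", 90),  # T22
]
# vEdge-3 has a single mpls TLOC; its only live tunnel is to T21.
vedge3_tunnels = {("1.1.1.2", "mpls")}

for flag in (False, True):
    sent = controller_advertise(at_controller, flag)
    ok = usable_on_edge(sent, vedge3_tunnels)
    print(flag, len(sent), [(r.tloc_ip, r.color) for r in ok])
# False 1 []
# True 3 [('1.1.1.2', 'mpls')]
```

With the option enabled, the model hands vEdge-3 three routes, of which only the one via the mpls TLOC of vEdge-2 is usable, matching the table above.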

Now there is IP reachability between the remote site and the data center.

vEdge-3# ping vpn 10 10.1.1.1 
Ping in VPN 10
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=254 time=57.6 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=254 time=48.9 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=254 time=47.1 ms
^C
--- 10.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 47.182/51.229/57.600/4.563 ms

The ultimate test is to check the actual data path using a traceroute from vEdge-3 to the data center network. You can see that the traffic enters the DC via vEdge-2.

vEdge-3# traceroute vpn 10 10.1.1.1
Traceroute  10.1.1.1 in VPN 10
traceroute to 10.1.1.1 (10.1.1.1), 30 hops max, 60 byte packets
 1  10.10.1.5 (10.10.1.5)  31.16 ms  41.46 ms  41.46 ms --> vEdge-2
 2  10.10.1.6 (10.10.1.6)  42.91 ms * *  --> The site-local router

Key Takeaways

  • vSmart only advertises the best routes to a destination according to the OMP best-path algorithm.
  • The default behavior of vSmart hides reachability information from remote peers, similarly to a BGP route-reflector.
  • In scenarios where there isn't full IP reachability between TLOCs, and there are remote sites with limited overlay, this might create blackholes in failure scenarios.
  • The OMP send-backup-paths option tells the vSmart controller to advertise the first set of non-best routes alongside the best ones.
  • Notice that the overall number of advertised routes is subject to the OMP send-path-limit parameter.