Let’s start this book with the simplest question one can ask about wide-area networking: why do organizations build and maintain WANs? The answer is straightforward, yet it gives us great insight into why we need a software-defined WAN.

Figure 1.1. The function of the WAN.

Organizations build WANs to connect users at remote locations to business applications in a secure and reliable way. In the most generic form, the users are employees, customers, and devices at remote branches that need access to business applications and the Internet, as shown in the diagram above. Hence, in simple terms, a WAN design is shaped by the users and the applications they need to access.

Traditional WAN

For many years, users were fixed at remote locations, and applications were hosted in data centers. For example, employees worked from offices, and customers shopped from physical stores. On the other hand, applications were hosted on-prem in the organization’s data center or regional hub. The wide-area network perimeter was clearly defined, and its role was simply to provide reliable connectivity between branches and the data center, as shown in Figure 1.2. This is what we call traditional WAN architecture.

Figure 1.2. Traditional WAN architecture in retail.

Let’s use retail as a classic example of a traditional WAN. One of the most business-critical applications in retail is the Point of Sale (POS) system. In a traditional retail company, the POS is hosted on-prem in the data center.

Users (employees and customers) at remote stores access the business apps and the POS system primarily via the MPLS circuit to the DC in an active-standby manner. The redundant Internet link is usually kept as a standby backup and is used only if the MPLS circuit fails. Internet access is also centralized, so users in the stores go to the Internet through the data center’s Internet gateway.
The typical store network is very static and hardware-centric. Opening a new store takes weeks of planning and preparation, including ordering WAN circuits, getting them up and running, and deploying manual box-to-box configurations. 

The network perimeter is clearly outlined from a security standpoint, and the boundaries are well-defined. Everything inside the corporate network is treated as trusted, while the Internet and the DMZ are treated as untrusted. The security stack (Firewalls, IPS, URLF, AMP, Proxies, etc.) is deployed in the data center. All traffic from the store is backhauled to the data center, inspected by these tools, and only then allowed out to the Internet and back, as shown in the diagram above. 

Although not very efficient, this model worked pretty well for a long time. It worked because most applications, databases, and critical data were hosted in the data center, and the network perimeter was well-defined and very static. Most traffic was headed to one main destination: the data center, which also served as the path to the Internet.

And then came Public Cloud

At some point in the mid-2010s, organizations started to realize the benefits of cloud computing and started moving serious workloads to the public cloud. They wanted faster delivery, lower upfront costs, and the ability to scale up or down on demand. 
Applications began moving out of the data centers and going into the public cloud at an ever-increasing rate. Some applications are even built and run directly in the cloud nowadays (called cloud-native or born-in-the-cloud apps). 

Figure 1.3 Applications moved out of the data center.

This shift broke the traditional WAN assumption. The classic wide-area network (WAN) was designed around one main idea: users in remote locations need to reach applications in the data center. Even when services were hosted in public clouds or on the Internet, the traffic from remote locations still went to the data center first. It was then routed to the public cloud and back, as illustrated in Figure 1.3.

Around the same time, another industry-changing model emerged called Software as a Service (SaaS). Organizations of all shapes and sizes started moving away from on-premises deployments and consuming business applications as a service over the Internet. Examples of Internet-native apps are Microsoft Office 365, Google Workspace, Salesforce, ServiceNow, and many others. 

Due to SaaS and Public Cloud, most traffic at remote branches changed direction and was now destined for the Internet instead of the data center. This completely changed the wide-area perimeter and security scope. Network and security teams started realizing that the traditional WAN architecture was not well equipped to provide reliable and secure branch-to-the-Internet and branch-to-any-cloud connectivity. Legacy network technologies were not designed with cloud-based applications in mind. They cannot offer the scalability, flexibility, and security that organizations need today.

The main challenge that organizations face with the cloud and SaaS is that traditional security is centralized at the data center. The traditional WAN design does not apply security at remote branches. All user traffic is meant to be sent and inspected at the data center’s security stack and then routed out to the Internet and back, as shown in the diagram below.

Figure 1.4 Backhauling Internet-bound traffic to the data center.

However, backhauling Internet traffic to the data center has real downsides. Traffic takes a longer path than it needs to, which creates the following inefficiencies:

  • Higher latency and unreliable connectivity due to the geographically distributed location of branches and the data center.
  • Larger fault domains and data center dependency (possibly a single point of failure).
  • Bandwidth bottleneck at the data center’s WAN links.
  • Performance bottleneck at the data center’s security stack.
  • Inability to use DNS and geo-location services at branches.
  • The bottom line: poor application experience, which translates to unproductive employees, unhappy customers, and lost revenue.
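The latency cost of backhauling can be illustrated with simple arithmetic. The following is a minimal sketch, not a measurement: every round-trip time below is a hypothetical number chosen for illustration, and the function names are assumptions introduced here.

```python
# Illustrative arithmetic only: all round-trip times (RTTs) are hypothetical.

def backhauled_rtt_ms(branch_to_dc_ms: float,
                      inspection_ms: float,
                      dc_to_app_ms: float) -> float:
    """RTT when Internet-bound traffic is hauled through the data center.

    The path is branch -> data center -> security stack -> Internet app.
    """
    return branch_to_dc_ms + inspection_ms + dc_to_app_ms


def backhaul_penalty_ms(backhauled_ms: float, direct_ms: float) -> float:
    """Extra delay added by backhauling compared with a direct Internet exit."""
    return backhauled_ms - direct_ms


if __name__ == "__main__":
    # Hypothetical branch: 40 ms to the data center, 5 ms of inspection,
    # 30 ms from the data center's Internet gateway to the SaaS provider.
    via_dc = backhauled_rtt_ms(40, 5, 30)
    # Hypothetical direct path from the branch's own Internet circuit.
    direct = 25
    print(f"Backhauled RTT: {via_dc} ms")
    print(f"Direct RTT:     {direct} ms")
    print(f"Penalty:        {backhaul_penalty_ms(via_dc, direct)} ms")
```

With these assumed numbers, the backhauled path takes three times as long as the direct one. The real ratio depends entirely on where the branch, the data center, and the application are located.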

In the end, users feel that applications are slow or unreliable, which hurts productivity and customer quality of experience.

But why not let branches connect directly to the Internet?

One logical solution to the “backhauling to the data center” problem is letting branches send traffic directly to the Internet through their site-local Internet circuits. The network team can easily allow local Internet exit at branches by simply configuring a default route pointing to the site-local ISP and enabling NAT. This can significantly improve the latency and the quality of experience of Internet-native apps because the traffic no longer has to be backhauled to the data center. However, bypassing the security stack and opening the branches to the entirety of the Internet exposes the organization to critical cybersecurity risks such as:

  • Users accessing unauthorized web and storage locations.
  • Exfiltration of sensitive customer or corporate data.
  • Phishing and spear-phishing attacks.
  • Viruses, spyware, trojans.
  • Ransomware.
  • The bottom line is poor customer experience, lower brand reputation, and possible legal liability.

Therefore, if an organization enables local Internet exit at branches, it must deploy security devices (firewalls, IPS, proxies, etc.) at every site, as shown in the diagram below.

Figure 1.5 Implementing security at remote branches.

However, with the hardware-centric traditional WAN architecture, this would mean installing one or multiple hardware security appliances at each remote branch.  You can imagine that this approach would be expensive, slow, and unscalable. For organizations with thousands of remote locations, this is simply impractical. 

The challenges do not stop with security. We all know that Internet circuits do not have guaranteed quality, and network paths via the Internet can experience performance degradation anytime. But if organizations connect to business-critical applications via the Internet - how do network teams ensure that applications’ Quality of Experience (AppQoE) is good enough? 

In the end, we network engineers are responsible for guaranteeing the network performance of all applications our organizations rely on, whether we own the network paths or not. However, with public cloud and SaaS, we have become responsible for our internal networks, our ISPs, as well as the networks of cloud and SaaS providers and the ISPs they rely on. 

Still, the business expects the applications to work properly. They don’t care whether there is packet loss on the Internet or in the cloud provider’s upstream network.

We obviously need visibility into the network paths via the Internet to SaaS and Cloud providers. And we need networks that can reroute around performance degradations outside our own networks. However, this is tough to achieve using legacy routing and switching technologies.
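To make "rerouting around performance degradation" concrete, here is a minimal sketch of performance-aware path selection. It is an illustration under stated assumptions and not how any particular SD-WAN product implements it: the `PathStats` and `SlaClass` types, the thresholds, and the tie-breaking order (loss, then latency, then jitter) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PathStats:
    """Measured quality of one WAN path (hypothetical probe results)."""
    name: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaClass:
    """Worst acceptable values for an application class (assumed thresholds)."""
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def meets_sla(path: PathStats, sla: SlaClass) -> bool:
    """A path is compliant only if every metric is within its threshold."""
    return (path.loss_pct <= sla.max_loss_pct
            and path.latency_ms <= sla.max_latency_ms
            and path.jitter_ms <= sla.max_jitter_ms)


def pick_path(paths: List[PathStats], sla: SlaClass) -> Optional[PathStats]:
    """Prefer SLA-compliant paths; if none comply, fall back to best effort."""
    compliant = [p for p in paths if meets_sla(p, sla)]
    candidates = compliant or paths
    if not candidates:
        return None
    # Assumed tie-break order: lowest loss, then latency, then jitter.
    return min(candidates, key=lambda p: (p.loss_pct, p.latency_ms, p.jitter_ms))


if __name__ == "__main__":
    voice = SlaClass(max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)
    paths = [
        PathStats("mpls", loss_pct=5.0, latency_ms=200.0, jitter_ms=40.0),
        PathStats("internet", loss_pct=0.2, latency_ms=35.0, jitter_ms=8.0),
    ]
    chosen = pick_path(paths, voice)
    print(chosen.name if chosen else "no path available")
```

The point of the sketch is the decision input. A static active-standby design only asks whether a link is up, whereas this selection asks whether the path currently performs well enough for the application.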

And then came Remote Work

On top of SaaS and Public Cloud, another technology trend imposes new challenges on the WAN. Working from anywhere became the standard almost overnight. Online meetings, remote conferences, and even working from the local cafe are common practices these days. The adoption of broadband cellular technologies, such as 4G/5G, provides high-bandwidth, low-latency network access in places that were previously impractical for work. Employees now use various personal devices to connect to the network and access business-critical applications and potentially sensitive corporate data. Remote working has completely changed the wide-area network scope and security requirements.

Security teams already faced a colossal challenge in securing the network perimeter after public cloud adoption. However, with remote workers accessing applications from everywhere, providing complete end-to-end security and quality of experience becomes impossible with the existing network technologies. Each employee working from home or a cafe in the mall now requires secure access to business-critical applications and sensitive corporate data hosted on-prem or in the cloud.

With traditional WAN architecture, remote workers access the corporate network via SSL VPN (for example, Cisco AnyConnect) to the data center. However, with many applications now hosted off-prem and consumed as a service, backhauling the remote worker’s traffic to the DC and then to the Internet creates many inefficiencies, such as poor application experience and unreliable connectivity. Although a technique called “split-tunneling” allows the remote VPN user to access the Internet locally and simultaneously access the corporate network through the VPN, this exposes the organization to critical security vulnerabilities.
The bottom line is that Public Cloud and SaaS have pushed applications beyond the boundaries of the traditional WAN. Remote workers have pushed users well beyond the traditional network perimeter. Network and security teams now face a huge challenge protecting the corporate WAN with users on any device, anywhere, and applications in any public cloud.

All this led to Network Sprawl

For many years, networks have been manually deployed and operated. In a traditional WAN environment, to achieve the desired operational state, each network device is individually configured by network administrators via CLI in a box-to-box hardware-centric way, as illustrated in Figure 1.6.

Figure 1.6 Decentralized operational model used in traditional networks.

Think of the old days of personal computers. Installing new hardware or software required users to configure the individual elements of the PC. For example, you bought a new sound card for your computer. You installed the card and then went to the manufacturer's website and downloaded the drivers for your operating system. Then you installed the drivers and resolved any incompatibility issues. And then, eventually, you could begin listening to music.

We were using old personal computers as a collection of individual components.

Now compare this to the process nowadays. You just plug in a new hardware component, and the operating system sets up everything for you. You are not managing separate parts anymore. You just signify the intent to listen to music, and the operating system configures all the underlying components necessary to play it.
You are using the computer as a system, not as a group of individual components, as shown in Figure 1.7.

Figure 1.7 Personal computer as-a-system.

Public cloud providers use the same operational model. When you want to spin up a new router and a new subnet in the cloud, you just declare what you want, and the cloud provider configures all necessary underlying components behind the scenes.
So why can’t we do the same with wide-area networks? Why can’t the WAN be managed as a single system, instead of a collection of individual devices, as shown in the diagram below?

Figure 1.8 Network operated as-a-system.

This shift in how we operate the wide-area network is probably the most fundamental idea and selling point behind the entire Cisco Catalyst SD-WAN solution. 
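The difference between box-by-box configuration and operating the network as a system can be sketched in a few lines. This is a toy illustration under loud assumptions: the `INTENT` structure, the site and segment names, and the rendered lines are invented pseudo-configuration, not Cisco Catalyst SD-WAN syntax or any vendor's API.

```python
from typing import Dict, List

# Hypothetical intent: declared once for the whole WAN instead of per device.
INTENT = {
    "sites": ["store-001", "store-002", "dc-east"],
    "segments": {"pos": 10, "guest-wifi": 20},
}


def render_device_config(site: str, segments: Dict[str, int]) -> List[str]:
    """Turn the shared intent into pseudo-config lines for one device."""
    lines = [f"hostname {site}"]
    # Sort by segment ID so every device gets the same deterministic order.
    for name, segment_id in sorted(segments.items(), key=lambda kv: kv[1]):
        lines.append(f"segment {segment_id} name {name}")
    return lines


def render_all(intent: dict) -> Dict[str, List[str]]:
    """A central controller would derive every device's config from one intent."""
    return {
        site: render_device_config(site, intent["segments"])
        for site in intent["sites"]
    }


if __name__ == "__main__":
    for site, config in render_all(INTENT).items():
        print(f"--- {site} ---")
        print("\n".join(config))
```

Adding a new segment or a new store then means changing one declaration, and the per-device details follow from it. That is the same shift the PC analogy describes.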

From a purely technical perspective, Cisco SD-WAN does not invent anything fundamentally new. The core functions it delivers, such as dynamic tunneling, encryption, failover, path selection, QoS, segmentation, and even ZTP, have existed for years. You could build almost the same WAN with a mix of existing technologies, such as DMVPN, FlexVPN, mGRE+NHRP, LISP, or classic routing protocols like BGP, and an MPLS underlay. Many organizations still run networks like this, and they work.
But each of these technologies solves only a single piece of the puzzle. LISP helps with mapping and segmentation. FlexVPN handles tunneling and encryption. NHRP supports next-hop discovery. BGP does path control, etc. But none of these tools is a complete WAN system by itself.

The problem is not the capability. It is the operational model.

None of these tools and features gives you a complete WAN solution. They are like LEGO blocks that you assemble to achieve the desired WAN architecture, and then you jump from box to box and configure everything. This demands a network team that knows every detail of the entire stack and can piece together multiple technologies and protocols. It also puts the responsibility in-house if something doesn’t interoperate as expected.

Figure 1.9. Old operational model.

Nowadays, Chief Technology Officers (CTOs) and network owners want a complete system with clear integration and visibility. They want one vendor that takes end-to-end responsibility for the whole solution. They want a solution that hides the complexity of connecting different protocols and making them work together. They simply want red, yellow, green health indicators, SLA graphs, and performance metrics that explain the situation at a glance.

We all have seen that the mix of network and security technologies is constantly growing. The scale, complexity, and speed at which network and security departments operate continually increase. 

Vendors realize that customers just stop buying their solutions because it has become extremely complex to design, deploy, operate, and phase out individual components of the network and security stacks. That’s one of the reasons excellent technologies that were ahead of their time, like FlexVPN, never reached mainstream adoption. 

On top of that, choosing and buying have become incredibly complex. There are thousands of individual products, part numbers, ordering guides, licensing schemes, support offerings, etc. 

Organizations nowadays want a unified solution that provides networking and security and is bought and managed as a system and not as a collection of individual devices.

The need for a next-generation WAN

All these next-generation trends in the IT industry have one thing in common - they require next-generation networks. The traditional WAN architecture no longer works well in a digital world where applications are out of the data center, and users consuming those applications use a diverse set of mobile devices and access sensitive data from everywhere. Each next-generation technology trend, such as cloud, automation, and remote work, imposes new requirements on the network infrastructure.

Gone are the days when the network was considered just a bunch of data pipes that provide plain connectivity between sites. Organizations now realize that the network is an infrastructure element that is key to the overall growth and security strategies of the organization.

Key Takeaways

  • The key takeaway of this chapter is that the operational model is the most important selling point behind the Cisco Catalyst SD-WAN solution. Modern network teams do not want to stitch together and manage a WAN, box by box and protocol by protocol. They want one unified solution that is purchased and operated as a complete, end-to-end system.
  • Security requirements related to public cloud and remote workers are the second most common reason why organizations choose to go for a software-defined WAN solution. The network perimeter has expanded well beyond branches and data centers. Applications are now hosted in the cloud or consumed as a service. Users access sensitive corporate data from any location using any device at any time. Network and security teams simply cannot provide consistent end-to-end security using traditional routing and switching technologies. They need new network capabilities that the software-defined WAN architecture offers.
  • The third most common reason to go for SD-WAN is to lower the WAN's TCO (total cost of ownership) by using cheap Internet links instead of expensive MPLS circuits. As they say, the Internet is the new WAN. This is becoming even more relevant with LEO satellite Internet.
  • Another common reason is the need for a flexible and automated means of adopting the multi-cloud. Network teams face the tough challenge of providing reliable branch-to-any-cloud and branch-to-the-Internet connectivity and having visibility into the cloud providers’ infrastructure.
  • Providing consistent application quality of experience (AppQoE) across on-prem, cloud, and SaaS applications and the complexity associated with the in-house management of a large number of network and security devices are the next most common drivers toward software-defined WAN.
  • Organizations now more than ever realize that the next-generation technology trends require a next-generation network to support them. There is no other way.