Technologies used to virtualize cloud networks are evolving quickly. It is often non-trivial to sort out the different technologies and understand the merits and demerits of specific approaches.
This write-up 1) explains the network virtualization techniques used in legacy virtualized environments, 2) discusses the difference between legacy and cloud datacenters, and 3) compares the virtualization techniques used in cloud datacenters.
Figure 1: Legacy Datacenter Network
Figure 1 shows a legacy datacenter that runs server virtualization.
Typically the network is segmented into multiple L2 domains. Each L2 domain comprises a set of ToR (Top of Rack) switches that connect to a pair of aggregation switches. The virtualized servers are connected to the ToR switches. There are other variants, such as “End-of-Row” switches, but fundamentally it comes down to a small and well-defined span of L2 domain. These designs create a tree topology in which each node imposes serious oversubscription. Network performance is managed by adding links and ports to each aggregation switch through the process of “capacity management”. In most cases, the network looks different in different parts of the datacenter, even though all parts are based on the same architecture.
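To make the oversubscription point concrete, here is a minimal sketch that computes the ratio of server-facing capacity to uplink capacity at one switch. The port counts and speeds are hypothetical illustrations, not figures from any particular datacenter, and multiplying the per-tier ratios is only a worst-case approximation.

```python
def oversubscription_ratio(downlink_ports: int, downlink_gbps: float,
                           uplink_ports: int, uplink_gbps: float) -> float:
    """Return server-facing capacity divided by uplink capacity for one switch."""
    downstream = downlink_ports * downlink_gbps
    upstream = uplink_ports * uplink_gbps
    if upstream <= 0:
        raise ValueError("a switch needs at least one uplink")
    return downstream / upstream


# Hypothetical ToR switch: 48 x 10G server ports, 4 x 40G uplinks.
tor = oversubscription_ratio(48, 10, 4, 40)    # 480 / 160 = 3.0

# Hypothetical aggregation switch: 32 x 40G downlinks, 8 x 40G uplinks.
agg = oversubscription_ratio(32, 40, 8, 40)    # 1280 / 320 = 4.0

# Worst case for traffic that crosses both tiers of the tree.
end_to_end = tor * agg                          # 12.0
print(f"ToR {tor}:1, aggregation {agg}:1, end to end {end_to_end}:1")
```

Adding uplinks during "capacity management" lowers the ratio at that one switch only, which is one reason different parts of the same datacenter end up looking different.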
Figure 2: Network Virtualization in legacy Datacenters
Figure 2 shows how network virtualization is implemented in such a legacy network. Typically, an Ethernet trunk (carrying all or selected VLANs) is extended from the ToR switches to each virtualized server. The virtual machines instantiated within the server connect to one or more VLANs. The span of each VLAN is limited by the size of the L2 domain, as shown in the diagram.
In most cases only a single routing space is supported on the L2/L3 aggregation switches. Any inter-VLAN traffic gets routed at the aggregation switch. Packet filters (ACLs, access control lists) are also typically applied on the aggregation switch on each VLAN interface port.
Figure 3: Multi-tenancy in legacy datacenter
Operators who want to use a similar framework for scenarios that support multiple tenants with overlapping address spaces have used either VRF-lite (Virtual Routing and Forwarding without MPLS) or full VRF with MPLS to create a separate routing space for each tenant at the aggregation layer.
In many cases this approach also entails separate pairs of physical firewall and load-balancer appliances dedicated to each tenant. As shown in Figure 3, the core layer would need to be enabled with MPLS (along with a control plane protocol such as LDP) to span a tenant across multiple L2 domains.
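The idea of a separate routing space per tenant can be sketched as one routing table per VRF, each doing its own longest-prefix match. This is a simplified illustration only: the `Vrf` class, the tenant names, and the next-hop names are hypothetical, and a real switch does this in hardware.

```python
import ipaddress


class Vrf:
    """One isolated routing table; prefixes may overlap with other VRFs."""

    def __init__(self, name):
        self.name = name
        self.routes = []  # list of (network, next_hop) tuples

    def add_route(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, dest):
        """Longest-prefix match within this VRF only; None if no route."""
        addr = ipaddress.ip_address(dest)
        matches = [(net, nh) for net, nh in self.routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0].prefixlen)[1]


# Two hypothetical tenants that both use 10.0.0.0/24.
tenant_a = Vrf("tenant-a")
tenant_a.add_route("10.0.0.0/24", "fw-a")
tenant_a.add_route("0.0.0.0/0", "core-a")

tenant_b = Vrf("tenant-b")
tenant_b.add_route("10.0.0.0/24", "fw-b")

print(tenant_a.lookup("10.0.0.5"))  # fw-a
print(tenant_b.lookup("10.0.0.5"))  # fw-b
print(tenant_b.lookup("8.8.8.8"))   # None: tenant-b has no default route
```

The same destination address resolves differently depending on which VRF the packet arrived in, which is what lets tenants reuse address space.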
Figure 4: Basic Network Virtualization in Cloud Datacenters
A large number of modern cloud datacenters are built using L3 (routing) down to the ToR switch. This is primarily done because L3 routing protocols like OSPF or BGP can easily support densely intermeshed topologies (e.g. Clos) and help utilize the symmetrical IP fabric by distributing flows over multiple equal-cost paths. An IP network is also chosen for its ubiquity, since it can span multiple datacenters.
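Distributing flows over equal-cost paths is commonly done by hashing each flow's 5-tuple, so that all packets of one flow follow the same path while different flows spread out. The sketch below illustrates that idea only; the `pick_path` function, the spine names, and the use of SHA-256 are assumptions for illustration, since real switches use vendor-specific hardware hash functions.

```python
import hashlib


def pick_path(flow, paths):
    """Map a flow 5-tuple to one of several equal-cost paths, deterministically."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


# Hypothetical fabric with four equal-cost uplinks from a ToR switch.
paths = ["spine-1", "spine-2", "spine-3", "spine-4"]

# 5-tuple: source IP, destination IP, IP protocol, source port, destination port.
flow = ("10.0.1.5", "10.0.2.9", 6, 49152, 443)

# Every packet of the same flow hashes to the same path, so ordering is kept.
print(pick_path(flow, paths) == pick_path(flow, paths))  # True
```

Because the choice depends only on header fields, no per-flow state is needed in the switch.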
In such datacenters, the VLAN construct is largely inapplicable due to the lack of L2 domains. Hence, there has been some adoption of encapsulation mechanisms like VXLAN (or NVGRE) that encapsulate each Ethernet frame in an IP packet and transport it over the IP network. In Figure 4, each colored dotted line represents one such L2 network. The soft switches (virtual switches) that run on each virtualized server are typically L2 only. Any traffic that is leaving the L2 LAN is sent to a software router that runs as a virtual machine (VM). In some cases the virtual switch has been given the capability to switch traffic between LANs, but the gateway is typically still a software router that runs as a VM.
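As an illustration of the encapsulation step, the sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348: a flags byte with the "VNI valid" bit set, a 24-bit VXLAN Network Identifier (VNI), and reserved bits. It deliberately stops at the VXLAN header; the outer UDP (destination port 4789), IP, and Ethernet headers that a real virtual switch would add are omitted, and the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination port (not used below)
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits).
    # Word 1: VNI (24 bits) + reserved (8 bits).
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes):
    """Return (vni, inner_frame) from a VXLAN payload."""
    if len(payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, payload[8:]


packet = vxlan_encap(5001, b"inner-ethernet-frame")
print(packet[:8].hex())        # 0800000000138900
print(vxlan_decap(packet))     # (5001, b'inner-ethernet-frame')
```

The 24-bit VNI allows roughly 16 million segments, compared with the 4094 usable IDs of a 12-bit VLAN tag.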
In such setups, each tenant potentially gets a pair of software routers that act as gateways. They get additional routers based on capacity needs. The routers implement policies like packet filtering, NAT (network address translation) etc.
Figure 5: Multi-tenant Cloud Datacenters with advanced server based capabilities
However, technologies are available that help avoid the complexity of operating multiple software routers.
Figure 5 shows how each software switch inside the virtualized servers can be made to perform switching, routing and in-line packet en(de)capsulation. Thus the kernel modules residing within each virtualized server act as a multi-VRF router that performs NAT, packet filtering, external access and so on directly. Such deployments are not subject to the operational issues that come naturally with multiple instances of software routers.
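The forwarding decision inside such a server can be pictured as a per-virtual-network table whose entries point either at a local VM interface or at a tunnel to another server. The sketch below is a toy model under that assumption; the `VSwitch` and `NextHop` names, the label field, and the exact-match host routes are hypothetical and do not describe any specific product's data path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NextHop:
    kind: str        # "local" or "tunnel"
    target: str      # local interface name, or IP of the remote server
    label: int = 0   # hypothetical identifier carried in the encapsulation


class VSwitch:
    """Toy model of a server-resident switch holding one table per virtual network."""

    def __init__(self):
        self.tables = {}  # {virtual network: {destination IP: NextHop}}

    def add(self, network, dest, next_hop):
        self.tables.setdefault(network, {})[dest] = next_hop

    def forward(self, network, dest):
        next_hop = self.tables.get(network, {}).get(dest)
        if next_hop is None:
            return "drop"
        if next_hop.kind == "local":
            return f"deliver to {next_hop.target}"
        return f"encapsulate to {next_hop.target} with label {next_hop.label}"


vswitch = VSwitch()
vswitch.add("tenant-a:web", "10.0.0.5", NextHop("local", "tap-vm1"))
vswitch.add("tenant-a:web", "10.0.0.6", NextHop("tunnel", "192.0.2.20", 17))
vswitch.add("tenant-b:web", "10.0.0.5", NextHop("tunnel", "192.0.2.30", 42))

print(vswitch.forward("tenant-a:web", "10.0.0.5"))  # deliver to tap-vm1
print(vswitch.forward("tenant-b:web", "10.0.0.5"))  # encapsulate to 192.0.2.30 with label 42
print(vswitch.forward("tenant-b:web", "10.0.0.9"))  # drop
```

Since every server makes this decision locally, no traffic has to detour through a gateway VM just to be routed.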
Figure 6: Virtual Services within multi-tenant cloud datacenters
Deployments that utilize the technologies shown in Figure 5 can easily abstract the virtual machines within cloud datacenters into multiple tenants. Each of these tenants can have multiple virtual networks, with security policies between the networks. These tenants can also impose security groups on a per-instance basis or insert virtualized service instances (e.g. virtual firewall, virtual DDoS mitigation) as needed between the virtual networks.
Typically these datacenters need some standard routers that support IP en(de)capsulation and MBGP (multi-protocol BGP) to act as gateways to external public networks. The high forwarding capacity of the ASICs within the routers can thus be leveraged in this approach.
The other advantage of this approach is the easy integration with the service provider L3VPN infrastructure through the same set of gateway routers.
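A policy between two virtual networks can be thought of as an ordered rule list with a default deny, where an allowing rule may also name service instances the traffic must pass through. The sketch below is a hypothetical model of that idea; the `PolicyRule` fields, the network names, and the first-match semantics are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    src_network: str
    dst_network: str
    protocol: str
    dst_port: int
    action: str                 # "allow" or "deny"
    service_chain: tuple = ()   # hypothetical service instances to traverse


def evaluate(rules, src_network, dst_network, protocol, dst_port):
    """Return (action, service_chain) of the first matching rule; deny by default."""
    for rule in rules:
        if (rule.src_network == src_network
                and rule.dst_network == dst_network
                and rule.protocol == protocol
                and rule.dst_port == dst_port):
            return rule.action, rule.service_chain
    return "deny", ()


# Hypothetical tenant policy between a "web" and a "db" virtual network.
rules = [
    PolicyRule("web", "db", "tcp", 5432, "allow", ("virtual-firewall",)),
    PolicyRule("web", "db", "tcp", 22, "deny"),
]

print(evaluate(rules, "web", "db", "tcp", 5432))  # ('allow', ('virtual-firewall',))
print(evaluate(rules, "web", "db", "tcp", 22))    # ('deny', ())
print(evaluate(rules, "db", "web", "tcp", 5432))  # ('deny', ())
```

Rules here are directional, so allowing web-to-db traffic does not implicitly allow new connections from db to web.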
Figure 7: Multi-tenant multi-datacenter cloud network
Figure 7 shows how the network virtualization technology can be easily extended to build a multi-tenant cloud infrastructure over multiple datacenters. Since the underlying fabric that typically interconnects the different datacenters is IP, cloud virtualization technologies that use IP encapsulation easily span multiple datacenters.
All the datacenters can use a common router gateway to access external networks or the service provider L3VPN infrastructure.
Serious cloud service providers and large-scale enterprise networks are considering cloud network virtualization techniques that don’t put artificial constraints on the scaling of the networks based on an archaic L2-L3 demarcation. They are looking for technologies and solutions that provide multi-datacenter and multi-tenant L2-L3 support in a scaled-out way that can be heavily operationalized. Moreover, the technologies need to be not only open source but also based on open standards in which the protocol interactions are clearly spelled out.
Interestingly, the technology now available with OpenContrail (shown in Figures 5, 6, and 7) meets many of these requirements. Time will tell which vendors, providers, and enterprises adopt strategic technologies that lead to their overall success.