1 Basic Concepts
In this paper we discuss the concept of load balancing in a service chain which has multiple virtual machines for scale-out.
Figure 1 introduces the concepts of the length of a service chain and the width of a service chain.
The length of the service chain determines the functionality; if we add more services to the service chain it becomes longer.
The width of the service chain determines the capacity; if we add more capacity to the service chain it becomes wider.
The width of the service chain is not necessarily uniform. In the example below we may have three firewalls (2 Gbps each) but only two caches (3 Gbps each) to implement a 6 Gbps service chain.
Figure 1: Width and Length of a Service Chain
When there are multiple parallel instances of the same virtual service for scale-out reasons, there must be a load balancing function to spread the traffic across those multiple instances. That load balancing function can be implemented in three locations:
- On the physical router on which the service chain is anchored.
- In the OpenContrail vRouter in the hypervisor.
- In a virtual load balancer which runs in a virtual machine.
2 Load Balancing on the Physical Anchor Router
In many Network Function Virtualization (NFV) use cases, service chains are anchored on (i.e. connected to) a physical router. The anchor router is typically an edge router which sits between the access network to the customer and the core network. The anchor router is responsible for steering the customer traffic flows into the right service chain. This steering function needs to be subscriber aware and application aware. Subscriber awareness means that different subscribers are assigned to different service chains, depending on which services they subscribed to. Application awareness means that different types of applications (e.g. voice versus video streaming) are assigned to different service chains.
Juniper routers such as the MX and the Service Control Gateway (SCG) can be the anchor for a service chain. They provide subscriber-awareness through integration with a policy server such as a RADIUS server or a PCRF server. They provide application-awareness using a built-in Deep Packet Inspection (DPI) function.
If the first service in the service chain is scaled out, the physical anchor router needs to provide a load balancing function as shown in Figure 2 below. In this example the length of the service chain is 1 to keep things simple. We will consider longer service chains in the next section.
Figure 2: Load Balancing on the Physical Anchor Router
We could use simple Equal Cost Multi Path (ECMP) to spread the traffic over the multiple parallel paths. However, plain ECMP introduces two problems: lack of symmetry and lack of flow stickiness. We use the Traffic Load Balancer (TLB) feature on the physical router to minimize (but not eliminate) both of these problems.
The first problem is symmetry. For every forward flow, the reverse flow must follow the same path. This is required because the services in the service chain are usually stateful and need to see both directions of the flow.
Normal ECMP using the 5-tuple hash does not provide symmetry. This is because for a typical hash function hash(A,B) ≠ hash(B,A).
Symmetry is typically achieved by using a special “symmetric hash function” which has the property that hash(A,B) = hash(B,A). An example of a simple symmetric hash function is to hash on the source IP only in the forward direction and on the destination IP only in the reverse direction (this hash function has the additional benefit of keeping all flows for a given customer together).
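As an illustration, a symmetric hash can be built from any ordinary hash function by canonically ordering the two endpoints before hashing. The sketch below is hypothetical Python (the addresses and the choice of CRC-32 are our own, not anything a router implements):

```python
import zlib

def symmetric_hash(src_ip: str, dst_ip: str) -> int:
    # Sorting the endpoints first guarantees hash(A, B) == hash(B, A),
    # so the forward and reverse directions of a flow map to the same path.
    lo, hi = sorted((src_ip, dst_ip))
    return zlib.crc32(f"{lo}|{hi}".encode())

paths = 3
forward = symmetric_hash("10.0.0.1", "192.0.2.7") % paths
reverse = symmetric_hash("192.0.2.7", "10.0.0.1") % paths
assert forward == reverse  # both directions pick the same path
```

The same idea works with any hash function and any canonicalization (sorting, XOR of the addresses, etc.), as long as swapping source and destination leaves the input to the hash unchanged.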
Symmetric hashing only works for closed service chains, i.e. for service chains which start and end on the same anchor router as in Figure 2 above. Symmetric hashing does not work for open service chains, i.e. for service chains which start and end on different routers.
To understand why symmetric hashing does not work on open service chains, we first need to understand the problem of polarization and the concept of seeds, which are illustrated in Figure 3 below. If we use exactly the same hash function at every router, then every flow which goes left at the first router will also go left at the second router. As a result, some paths in the network don’t receive any traffic. This problem is called polarization. To avoid polarization, each router computes the hash not only over the fields in the header of the packet P, but also over a seed value S (sometimes called a salt). The seed value is different at each router; it could be the router ID, for example. This removes the polarization from the network: all paths are used.
Figure 3: The Problem of Polarization and its Solution using Seeds
Now we can understand why symmetric hashing does not work with open service chains. Since the router at the start and at the end of the service chains use different seed values, they will map the same flow onto different paths in the service chain, even if the hash function is symmetric.
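The effect of the seed can be sketched as follows (hypothetical Python; the flow tuple, the CRC-32 mixing, and the use of the router ID as seed are illustrative assumptions):

```python
import zlib

def path_choice(flow_5tuple: tuple, seed: int, n_paths: int) -> int:
    # Mixing a per-router seed into the hash de-correlates the choices
    # made at successive routers, which avoids polarization. Two routers
    # with different seeds may send the same flow down different paths,
    # which is exactly why seeded hashing breaks symmetry on open chains.
    data = repr((seed,) + flow_5tuple).encode()
    return zlib.crc32(data) % n_paths

flow = ("10.0.0.1", "192.0.2.7", 6, 40000, 443)
# One entry per router (seed = router ID); with a shared seed all
# routers would make the identical choice for this flow.
choices = {router_id: path_choice(flow, router_id, 2) for router_id in range(8)}
```

The same seeded hash, evaluated at the two ends of an open service chain, generally yields different path choices even when the underlying hash function is symmetric, which is the failure described above.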
We can achieve symmetry for open service chains using a flow table. We describe this mechanism below when we discuss load balancing in the OpenContrail vRouter.
The second problem is flow stickiness.
Once a flow has been assigned to a particular path, we want the flow to remain assigned to that path for the entire duration of the flow. If we move the flow to a different path in the middle of the flow, the flow will typically die. This is because the service on the new path will only start seeing the flow in the middle of the flow and won’t know what to do with it — most stateful services need to see the entire flow.
Normal ECMP does not provide flow stickiness. This is because ECMP is implemented as a modulo-N operation where N is the number of members in the load balancing group. If the number of members in the load balancing group changes from N to N+1 as a result of a scale-out event, all hashes will now be computed modulo N+1 instead of modulo N, and as a result most flows will be moved to a different path.
Let’s take a concrete example. Suppose there are 5 members in the load balancing group and the hash for flow F is 13. The flow will be assigned to path number (13 modulo 5) = 3. Now, let’s say that a scale-out event happens while flow F is in progress and the number of load balancing group members increases to 6. Flow F will now be assigned to path number (13 modulo 6) = 1. Thus, flow F moves from path 3 to path 1 and will die.
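The arithmetic of this example can be checked directly (a trivial Python sketch of plain modulo-N ECMP):

```python
def ecmp_member(flow_hash: int, group_size: int) -> int:
    # Plain ECMP: the path index is simply the hash modulo the group size.
    return flow_hash % group_size

flow_hash = 13
assert ecmp_member(flow_hash, 5) == 3  # before scale-out: path 3
assert ecmp_member(flow_hash, 6) == 1  # after scale-out: path 1, flow moved
```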
TLB solves this problem by using consistent hashing. See Wikipedia (http://en.wikipedia.org/wiki/Consistent_hashing) for the theory on consistent hashing.
The short summary is that consistent hashing minimizes the number of flows that are moved when a scale-out or a scale-in event happens. For example, if the number of paths increases from 5 to 6, then the maximum number of flows which are moved is 1/6 of the flows — the minimum number needed to re-distribute all the flows equally over 6 members. In contrast, with ECMP approximately 4/5 of the flows will be moved.
Thus, even with consistent hashing, some flows still get moved, which is why we said that TLB minimizes (but does not completely solve) the problem of stickiness.
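A minimal ring-based consistent hash illustrates the difference. This is a generic textbook construction in Python, not TLB’s actual implementation; the member names, flow keys, and virtual-node count are made up:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Any stable hash works; MD5 is used here only for its spread.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal hash ring; each member owns several points ("virtual
    nodes") on the ring for better balance."""
    def __init__(self, members, vnodes=64):
        self.ring = sorted((_h(f"{m}#{i}"), m)
                           for m in members for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def member_for(self, flow: str) -> str:
        # A flow maps to the first ring point at or after its hash.
        idx = bisect.bisect(self.keys, _h(flow)) % len(self.ring)
        return self.ring[idx][1]

before = ConsistentHashRing([f"path{i}" for i in range(5)])
after = ConsistentHashRing([f"path{i}" for i in range(6)])
flows = [f"flow{i}" for i in range(1000)]
moved = sum(before.member_for(f) != after.member_for(f) for f in flows)
# Only the flows falling into the arcs claimed by the new member move,
# roughly 1/6 of them, versus ~4/5 with plain modulo ECMP.
```

Because the existing members keep their ring positions when a member is added, only the flows landing in the new member’s arcs are reassigned, which is the minimization property described above.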
Note that flow moves can be completely eliminated with a flow table — this is what OpenContrail does (see below). Normal routers cannot do this because they do forwarding packet-by-packet without a flow table, but Juniper routers with flow-awareness such as the SCG and the SRX can do it.
3 Load Balancing in the OpenContrail vRouter
Figure 4 shows load balancing in the OpenContrail vRouter.
Figure 4: Load Balancing in the OpenContrail vRouter
We need to do load balancing to spread the traffic from service Si equally across all instances of service Si+1 in the next step of the service chain.
In Figure 4 above, the firewall service to which the arrow points needs to spread the traffic equally across all three instances of the cache service which is the next service in the service chain.
Load balancing on the vRouter is based on the same mechanism as BGP multi-path. If there are multiple downstream instances of the next service in the service chain, then the vRouter will receive multiple XMPP routes towards the final destination. All the XMPP routes have the same destination prefix, but they have different Route Distinguishers (RDs) to keep them distinct, and they have different next-hops and MPLS labels to identify the different downstream service instances. This is illustrated in Figure 5 below.
Figure 5: “BGP Multi-Path” for Load Balancing on the vRouter
The vRouter needs to solve the same two problems which were described in the previous section, namely symmetry and stickiness. In this case we discuss the stickiness problem first.
The OpenContrail vRouter uses flow tables to solve the flow stickiness problem. The OpenContrail vRouter has a flow table which contains one entry for each active flow.
When the first packet of a flow arrives at the vRouter, there is no entry in the flow table yet. At this point, the vRouter performs an ECMP hash (using the “BGP multi-path” mechanism described above) to choose the downstream load balancing group member.
The vRouter then creates an entry in the flow table. The next-hop for the flow table entry contains the chosen downstream load balancing group member. Subsequent packets for the same flow do not trigger any further ECMP hashing; they simply use the load balancing group member which was stored in the flow table.
As a result, the flow will never move, even when the number of load balancing group members changes as a result of scale-out or scale-in.
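The hash-once-then-cache behaviour can be sketched as follows (hypothetical Python; the member names, flow tuple, and CRC-32 hash are illustrative, not vRouter code):

```python
import zlib

class StickyLoadBalancer:
    """Sketch of the vRouter behaviour: the first packet of a flow is
    hashed to pick a member; the choice is cached in a flow table so
    later packets never re-hash, even if the group changes size."""
    def __init__(self, members):
        self.members = list(members)
        self.flow_table = {}  # 5-tuple -> chosen member

    def forward(self, five_tuple):
        if five_tuple not in self.flow_table:
            # First packet of the flow: ECMP hash picks a member.
            h = zlib.crc32(repr(five_tuple).encode())
            self.flow_table[five_tuple] = self.members[h % len(self.members)]
        # All later packets reuse the cached decision.
        return self.flow_table[five_tuple]

lb = StickyLoadBalancer(["fw1", "fw2"])
flow = ("10.0.0.1", "192.0.2.7", 6, 40000, 443)
first = lb.forward(flow)
lb.members.append("fw3")          # scale-out event
assert lb.forward(flow) == first  # the existing flow does not move
```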
When the OpenContrail vRouter receives the first packet for a forward flow, it makes an initial ECMP decision and records that decision in the flow table to achieve stickiness as described above.
At the same time, the OpenContrail vRouter also creates an entry for the reverse flow to achieve symmetry. This is done as follows:
- The OpenContrail vRouter does a lookup for the source IP address of the payload (i.e. the inner IP header) in the forwarding table of the routing-instance. This results in a set of one or more next-hops (more than one if there is ECMP). All of these reverse next-hops will be overlay tunnels to the previous service in the service-chain.
- The OpenContrail vRouter then observes over which overlay tunnel the traffic was actually received (i.e. the outer IP header).
- If the tunnel over which the traffic actually arrived is a member of the ECMP set computed in the first step, then the OpenContrail vRouter also creates a reverse flow entry (in addition to the forward flow entry).
- If the traffic starts arriving over a different tunnel, the OpenContrail vRouter updates the reverse flow entry, as long as it continues to meet the criteria of being a member of the reverse ECMP set.
This process is conceptually similar to the Reverse Path Forwarding (RPF) check which is performed in multicast forwarding and in unicast RPF (uRPF).
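The steps above can be condensed into a small sketch (hypothetical Python; the tunnel names, flow keys, and helper function are our own illustration, not vRouter internals):

```python
def install_reverse_flow(flow_table, reverse_key, arrival_tunnel, reverse_ecmp_set):
    """Install (or update) the reverse flow entry only if the tunnel the
    traffic actually arrived on is a member of the ECMP set obtained from
    the route lookup, i.e. an RPF-style check."""
    if arrival_tunnel in reverse_ecmp_set:
        flow_table[reverse_key] = arrival_tunnel
        return True
    return False

flow_table = {}
# Reverse ECMP set from looking up the inner source IP in the routing-instance.
reverse_ecmp_set = {"tunnel-to-fw1", "tunnel-to-fw2", "tunnel-to-fw3"}
reverse_key = ("192.0.2.7", "10.0.0.1")

# Traffic arrives over a tunnel in the set: reverse entry is installed.
assert install_reverse_flow(flow_table, reverse_key, "tunnel-to-fw2", reverse_ecmp_set)
# Traffic from a tunnel outside the set fails the check and is not installed.
assert not install_reverse_flow(flow_table, reverse_key, "tunnel-x", reverse_ecmp_set)
```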
A feature which has been requested but not yet implemented is the ability to “bleed” a service instance before it is taken out of service.
What this means is that the customer wants to be able to remove a service instance from the load balancing group without shutting down the virtual machine just yet.
No new flows may be assigned to the service instance, but all existing flows which already go through the service instance must continue to do so.
Eventually, all the existing flows will go away. When the last flow is gone (or when a time-out occurs) an event must be generated to remove the virtual machine.
This feature is not yet supported. It is easy to achieve in the data plane with the OpenContrail flow tables, but the work-flow has not yet been implemented in the control plane and management plane.
“Real” load balancer features
The OpenContrail vRouter is not a general purpose load balancer. “Real” load balancers have all sorts of advanced features such as liveness checks, load monitoring, application-aware load balancing etc. The OpenContrail vRouter will, over time, implement some of these features, for example the ability to check the liveness of virtual machines.
4 Orchestration and Dynamic Scale-Out
When you create a service instance in OpenContrail, you can specify the number of virtual machines for that service instance. In that sense, OpenContrail supports statically scaled-out services.
It is also possible to use the OpenContrail API to dynamically change the number of virtual machines for a service instance. In that sense, OpenContrail provides an API to implement dynamically scaled-out services.
However, monitoring the load of a service and scaling out the service when certain Key Performance Indicators (KPIs) are exceeded is not the job of the OpenContrail vRouter. This function (monitoring, dynamic scale-out, dynamic scale-in, and failure handling) is typically performed by a so-called orchestration system or orchestrator for short.
An orchestration system manages the life cycle of complex applications or complex Virtual Network Functions (VNFs) which consist of multiple virtual machines working together. The typical functions of an orchestrator include:
- Some sort of template language to describe the resources in such a complex application: virtual machines, virtual storage, virtual networks, virtual load balancers, virtual databases, etc.
- A mechanism to monitor the liveness of a virtual machine and to recover from failures by spinning up a new virtual machine.
- A mechanism to monitor the load on a virtual machine and to perform scale-out (or scale-in) when Key Performance Indicator (KPI) thresholds are exceeded. Often, there is an agent in the virtual machine to allow these KPIs to be application-aware (e.g. HTTP request latency for an Apache web server).
All the major public clouds offer orchestration and monitoring as a service. For example, Amazon Web Services (AWS) offers CloudFormation and CloudWatch. OpenStack has Heat and Ceilometer for orchestration and monitoring. OpenContrail is being integrated with third-party orchestration systems such as IBM Smart Cloud Orchestrator (SCO), Amdocs Network Function Virtualization Orchestrator (NFVO), and Scarl.
5 Load Balancing in a Virtual Machine
In section 3 we mentioned that the OpenContrail vRouter has several load balancing features, but cannot be considered a “real” load balancer.
There are several companies that offer real virtual load balancers. There are also some open source products such as HAProxy.
It is possible to run a load balancer in a virtual machine and include it in the service chain as shown in Figure 6 below.
Figure 6: Load Balancing in a Virtual Machine
The advantage of this method is that you can use a best-of-breed “real” load balancer.
The disadvantage of this method is that the load balancer itself may become a bottleneck. In that case you have to create multiple instances of the load balancer, which introduces a chicken-and-egg problem: how do you spread the traffic over the multiple instances of the load balancer?
In practice, this is less of a problem than you might think, because these load balancers (e.g. HAProxy) are very light-weight and a single load balancer virtual machine can handle many back-end virtual machines in the load balancing group. Also, for cloud-based services, Global Load Balancing (GLB) using the Domain Name Service (DNS) is used to spread the traffic across multiple load balancers (this technique does not apply to service chains).
6 Load Balancing in the Underlay
Everything we have discussed so far relates to load balancing in the overlay. Load balancing in the underlay is a related but separate topic.
Figure 7 below shows a scenario where there are multiple equal-cost paths between the anchor point and the first service in the service chain, or between one service instance and the next service instance in the service chain.
Figure 7: Load Balancing in the Underlay
The underlay uses normal multi-pathing techniques for load balancing. For layer-3 underlays, ECMP is used. For layer-2 underlays, various techniques are used, including Multi-Chassis Link Aggregation (MC-LAG) and Virtual Chassis (VC). Other techniques include various overlay flavors such as the Locator/ID Separation Protocol (LISP), Transparent Interconnection of Lots of Links (TRILL), Provider Backbone Bridging (PBB), and proprietary protocols such as Cisco FabricPath.
Overlay networking introduces some complications, caused by the fact that all packets are encapsulated in an overlay tunnel. There are multiple such overlay tunnel encapsulations, including MPLS-over-GRE, MPLS-over-UDP, VXLAN, NVGRE, STT, etc. Some of these encapsulations are friendly towards multi-pathing in the underlay and others are not.
One example of an overlay encapsulation which is not friendly towards multi-pathing in the underlay is MPLS-over-GRE which is shown in Figure 8 below.
Figure 8: GRE Overlay Encapsulation: Not Friendly towards Multi-Pathing in the Underlay
The problem with GRE encapsulation is that all the encapsulated packets have the same 5-tuple in the outer header. This means that all traffic for all virtual machines between a given pair of physical servers will be hashed to the same path. This could be avoided by using different GRE keys for each flow (“putting entropy in the GRE key”) but not many underlay switches support hashing on the GRE key.
This problem can be avoided by using a UDP-based encapsulation such as MPLS-over-UDP (shown in Figure 9 below) or VXLAN.
Figure 9: UDP-based Overlay Encapsulation: Friendly towards Multi-Pathing in the Underlay
UDP-based encapsulations are friendlier towards multi-pathing in the underlay because they “put entropy in the UDP source port”: the sending end-point places a hash of the headers of the encapsulated packet (or frame) in the UDP source port field. As a result, different overlay flows have different UDP source ports in the underlay. Since the underlay typically hashes on the complete 5-tuple, this results in efficient multi-pathing.
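The entropy mechanism can be sketched as follows (hypothetical Python; the CRC-32 hash, the ephemeral port range, and the example flows are illustrative assumptions, not the encoding any particular implementation uses):

```python
import zlib

def entropy_source_port(inner_5tuple) -> int:
    # Derive the outer UDP source port from a hash of the inner headers,
    # keeping it in the ephemeral range 49152-65535. Distinct inner flows
    # then get distinct outer 5-tuples, so the underlay's normal 5-tuple
    # hash spreads them across paths.
    h = zlib.crc32(repr(inner_5tuple).encode())
    return 49152 + (h % 16384)

# Two flows between the same pair of virtual machines:
flow_a = ("10.0.0.1", "10.0.0.2", 6, 40000, 80)
flow_b = ("10.0.0.1", "10.0.0.2", 6, 40001, 80)
port_a = entropy_source_port(flow_a)
port_b = entropy_source_port(flow_b)
```

With GRE, both flows would present identical outer headers to the underlay; with the source-port trick, each inner flow carries its own outer source port and can be hashed independently.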
A similar mechanism, using MPLS entropy labels, can be used with LSP transport tunnels, but at this stage it is not yet common to run MPLS in the switch fabric underlay: most switch fabrics are Ethernet-based or IP-based, not yet MPLS-based.
OpenContrail uses MPLS-over-UDP by default for all vRouter to vRouter traffic. OpenContrail uses VXLAN by default for all traffic to gateway switches. Both of these encapsulations have good support for multipath in the underlay.
OpenContrail can use MPLS-over-GRE for traffic to gateway routers which only support that encapsulation. This provides interoperability with existing routers.
OpenContrail uses capability negotiation techniques to discover which encapsulations each end-point of a tunnel supports. OpenContrail will automatically pick an encapsulation which is supported by both end-points. If there are multiple choices, OpenContrail will prefer an encapsulation which has good support for multi-pathing. There is no requirement that all tunnels in a given virtual network use the same encapsulation.