
Egress Selection

February 26, 2014

A very creative use of BGP L3VPN technology is building multiple routing table topologies so that a service provider can select the egress point for different types of traffic. Over time, several network architects have explained to me when and how to apply this technique.

One example that comes to mind is a design used by a European service provider with several gaming companies as customers. While any reasonable ISP core network is usually uncongested, peering points often experience congestion. Part of it is the difficulty of getting capacity, part is the fact that smaller outfits run with very little capital to spare and fail to upgrade their links on time, and a decent amount of it does seem to be intentional: ISPs both cooperate and compete, and they often put pressure on each other by deliberately delaying upgrades of peering capacity.

Gaming is all about latency. Web browsing and file transfers, which make up the bulk of the traffic, are reasonably latency tolerant. Gaming applications, however, are very latency sensitive, and gamers are willing to spend money online. Thus, if you carry gaming traffic in your network, it is desirable to steer it through the shortest path and, when necessary, steer bulk traffic across longer paths in order to avoid congesting links.

For instance, if one peers with network A at points P1, P2, and P3, and P1 is getting congested, it makes sense to steer bulk traffic that would otherwise exit at P1 toward P2 and P3. The most scalable way to accomplish this is to have an MPLS core and steer the traffic at ingress. This requires having multiple routing tables at the ingress router and using filter-based forwarding (FBF) to select which types of traffic are assigned to each routing table, as sketched below.
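
As a rough illustration, here is what the classification step might look like in Junos-style configuration; the prefix list, interface, and the “bulk” routing-instance name are hypothetical placeholders, and the actual match conditions depend on how bulk traffic is identified:

    # FBF filter: send bulk traffic to the alternate routing table, everything else to inet.0
    set firewall family inet filter classify-bulk term bulk from destination-prefix-list bulk-destinations
    set firewall family inet filter classify-bulk term bulk then routing-instance bulk
    set firewall family inet filter classify-bulk term rest then accept
    # Apply the filter on the ingress (customer-facing) interface
    set interfaces xe-0/0/0 unit 0 family inet filter input classify-bulk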

BGP L3VPN is the ideal technology for this problem. Using L3VPN, it is possible to export the routes received from each peer before they are subject to route selection in the main routing table. At the ingress point, multiple topologies can then be constructed from this original routing information.
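
In practice this means the per-peer VRF routes are distributed as VPN routes over the existing iBGP sessions. A minimal sketch, assuming Junos-style configuration and hypothetical addresses, simply adds the inet-vpn family:

    # Carry the per-peer VRF routes as L3VPN routes on the existing iBGP mesh
    # (addresses are placeholders)
    set protocols bgp group internal type internal
    set protocols bgp group internal local-address 10.255.0.2
    set protocols bgp group internal family inet unicast
    set protocols bgp group internal family inet-vpn unicast
    set protocols bgp group internal neighbor 10.255.0.1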

As an example, assume that the ISP has two potentially congested peers, CP1 and CP2. On the peering gateways that face these peers, one would place the peering sessions in a VRF (specific to the peer) and leak the BGP routes into inet.0 using a rib-group. Note that it is possible to filter this routing information from being installed in the forwarding table in order to conserve forwarding memory: the next hop advertised by BGP L3VPN is an MPLS next hop by default, so the IP forwarding information is not required for egress-bound traffic.
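
On the gateway facing CP1, a minimal sketch of this arrangement (Junos-style, with hypothetical interface, AS numbers, and route target) might look as follows:

    # Rib-group that copies the CP1 VRF routes into the main table
    set routing-options rib-groups cp1-to-inet0 import-rib [ CP1.inet.0 inet.0 ]

    # Per-peer VRF holding the eBGP session toward congested peer CP1
    set routing-instances CP1 instance-type vrf
    set routing-instances CP1 interface xe-1/0/0.0
    set routing-instances CP1 route-distinguisher 64500:101
    set routing-instances CP1 vrf-target target:64500:101
    set routing-instances CP1 protocols bgp group cp1 type external
    set routing-instances CP1 protocols bgp group cp1 peer-as 64510
    set routing-instances CP1 protocols bgp group cp1 neighbor 203.0.113.2
    set routing-instances CP1 protocols bgp group cp1 family inet unicast rib-group cp1-to-inet0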

In gateways that act as ingress points for traffic (which may overlap with the ones that peer with the congested peers), one would configure a bulk VRF that imports the routes from the congested peers. By manipulating the local-preference attribute, it is possible to make the congested egress points less preferred, overriding the IGP-based decision. An FBF rule can then direct bulk traffic to this routing instance, with a default route of “next-table inet.0” in order to fall back to standard route selection for peers that are not subject to special treatment.
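
A minimal sketch of the bulk instance, again in Junos-style syntax with hypothetical route targets (and a hypothetical community, 64500:911, assumed to be attached to routes learned at the congested exit point), could look like this; the FBF filter shown earlier would point at this instance:

    # Communities: routes from the congested exit and the route targets of the CP VRFs
    set policy-options community congested-exit members 64500:911
    set policy-options community cp-targets members [ target:64500:101 target:64500:102 ]

    # vrf-import: lower local-preference for routes learned at the congested exit,
    # accept the remaining CP routes unchanged, reject everything else
    set policy-options policy-statement bulk-import term congested from community congested-exit
    set policy-options policy-statement bulk-import term congested then local-preference 80
    set policy-options policy-statement bulk-import term congested then accept
    set policy-options policy-statement bulk-import term cp-routes from community cp-targets
    set policy-options policy-statement bulk-import term cp-routes then accept
    set policy-options policy-statement bulk-import term other then reject
    set policy-options policy-statement bulk-reject then reject

    # Bulk VRF: the alternate topology used by FBF-classified traffic
    set routing-instances bulk instance-type vrf
    set routing-instances bulk route-distinguisher 64500:200
    set routing-instances bulk vrf-import bulk-import
    set routing-instances bulk vrf-export bulk-reject

    # Fall back to normal route selection for everything else
    set routing-instances bulk routing-options static route 0.0.0.0/0 next-table inet.0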

Over time I’ve seen a few variants of this design. Some ISPs allow certain customers to perform upstream selection: their traffic always traverses the upstream they believe provides better service, even if a preferred route is available from another peer.

One interesting application of OpenContrail is that it allows a network operator to apply the same technique to an application, rather than to a customer circuit. With OpenContrail it is possible to place an application (running either on bare metal or in a virtual machine) into a VRF. This can be, for instance, the front end responsible for generating video traffic or gaming updates.

While OpenContrail does not have the same policy language capabilities available in JunOS, it is open source, so the control plane code can be customized by anyone to perform a specialized path selection decision that suits a particular application.

An OpenContrail deployment resembles an L3VPN PE/gateway router. Typically two servers (for redundancy) run the control plane, which can interoperate directly with L3VPN-capable routers at the network edge. These control plane servers can typically control up to 2,000 compute servers running applications (virtualized or not). Encapsulation can be MPLS over GRE end to end, or MPLS over GRE within the data center followed by MPLS over MPLS on the WAN by using a pair of data-center gateways.
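
As a rough sketch, the gateway-router side of such a deployment might look like this in Junos; the addresses, AS number, and compute subnet are placeholders:

    # BGP session from the gateway router to an OpenContrail control-plane node
    set routing-options autonomous-system 64512
    set protocols bgp group contrail type internal
    set protocols bgp group contrail local-address 10.10.10.1
    set protocols bgp group contrail family inet-vpn unicast
    set protocols bgp group contrail neighbor 10.10.10.5

    # Dynamic GRE tunnels toward the compute servers, used for MPLS-over-GRE forwarding
    set routing-options dynamic-tunnels contrail source-address 10.10.10.1
    set routing-options dynamic-tunnels contrail gre
    set routing-options dynamic-tunnels contrail destination-networks 10.20.0.0/16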

By building on BGP L3VPNs, OpenContrail reuses not only technology that has been “battle tested” in large deployments, but also a lot of network design tricks that have been discovered over time.