
Contrail Alerts

October 24, 2015 · Analytics, Uncategorized

OpenContrail networking provides network virtualization to data center applications using a layered, horizontally scalable software system. We have the abstractions in place to present the operational state of this system (see the blog post "Operational State in the OpenContrail System: UVE – User Visible Entities through Analytics API"). The system is architected to be as simple as possible to operate for the functionality it delivers. Contrail Alerts are an important part of this: in addition to providing detailed operational state in an easy-to-navigate way, we also need to clearly highlight unusual conditions that may require more urgent attention and action from the administrator.

We provide Alerts on a per-UVE basis. Contrail Analytics raises (or clears) instances of these alerts, called alarms, using Python-coded “rules” that examine the contents of the UVE and the object’s configuration. Some rules are built in; others can be added as Python plugins registered through setuptools entry points.

See Contrail Alerts features in a demo video: https://www.youtube.com/watch?v=fUgP2KtAy9A

Contrail Analytics APIs for Alerts

There is an API to get the list of supported alerts as follows:

[Figure: API call listing the supported alerts]

Let's look at an example.

We have a system configured with two BGP peers as gateways, but the gateways themselves have not yet been configured with this system’s control node, so the BGP sessions to them cannot come up. Based on the state of these peers, we raise an alarm against the control-node UVE.

We can look at the alarms being reported on this system via the following API:

GET http://<analytics-ip>:<rest-api-port>/analytics/alarms

[Figure: alarms reported by GET /analytics/alarms]
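A minimal sketch of calling this API from Python with only the standard library. It assumes the Analytics API is reachable over plain HTTP without authentication and uses 8081 as the default REST API port; both are assumptions to adjust for your deployment. Because the response layout is only shown in the screenshot, the sketch simply returns the decoded JSON rather than parsing specific fields.

```python
import json
import urllib.request

# Assumption: 8081 is the Analytics REST API port; change it to match your setup.
DEFAULT_REST_API_PORT = 8081


def alarms_url(analytics_ip, rest_api_port=DEFAULT_REST_API_PORT):
    """Build the URL of the alarms listing API."""
    return "http://%s:%d/analytics/alarms" % (analytics_ip, rest_api_port)


def get_alarms(analytics_ip, rest_api_port=DEFAULT_REST_API_PORT, timeout=10):
    """Fetch all current alarms and return the decoded JSON body."""
    url = alarms_url(analytics_ip, rest_api_port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_alarms("192.0.2.10")` would return the alarms reported by an Analytics Node at that (hypothetical) address.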

The API reports the type of each alarm, its severity, a description listing the reason it exists, whether it has been acknowledged yet, and a timestamp. We provide an API for acknowledging alarms as follows:

POST http://<analytics-ip>:<rest-api-port>/analytics/alarms/acknowledge

Body: {"table": , "name": , "type": , "token": }
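The acknowledge call can be sketched the same way. The function and parameter names here are illustrative, not part of any Contrail client library; the table, name, type and token values are presumably taken from the alarm as reported by the GET API, and the JSON content type is an assumption. The request is built separately from the send so it can be inspected without contacting a server.

```python
import json
import urllib.request


def build_ack_request(analytics_ip, rest_api_port, table, name, alarm_type, token):
    """Build (but do not send) the POST that acknowledges one alarm."""
    url = "http://%s:%d/analytics/alarms/acknowledge" % (analytics_ip, rest_api_port)
    # Body keys follow the documented format; the values identify the alarm.
    body = {"table": table, "name": name, "type": alarm_type, "token": token}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def acknowledge_alarm(analytics_ip, rest_api_port, table, name, alarm_type, token,
                      timeout=10):
    """Send the acknowledgement and return the HTTP status code."""
    req = build_ack_request(analytics_ip, rest_api_port, table, name, alarm_type, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```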

When the alarm condition is resolved, the alarm is deleted automatically, whether or not it has been acknowledged.

The alarm is also shown along with the rest of the UVE when the UVE GET API is used:

GET http://<analytics-ip>:<rest-api-port>/analytics/uves/control-node/

In addition to these GET APIs, a streaming interface is available for both UVEs and alarms. That interface is described in detail in the Contrail Analytics Streaming API blog post.

Alarm Processing and Plugins

New alerts can be added to Contrail Analytics by installing Python plugins on the Analytics Nodes. Alarm processing is distributed among all functioning Analytics Nodes using consistent hashing on the UVE key, so any node may end up evaluating any given UVE. The Python plugin for an alert must therefore be installed on every Analytics Node.
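
The post does not show Contrail's own hashing code, so the following is only a generic illustration of the idea: a consistent-hash ring (MD5 points, with virtual replicas per node) that assigns UVE keys to Analytics Nodes. The node names, the UVE key format and the replica count are hypothetical. The useful property is in `remove_node`: when a node fails, only the UVE keys it owned move to other nodes, which is also why every node needs every alarm plugin.

```python
import bisect
import hashlib


class HashRing(object):
    """Consistent-hash ring that assigns UVE keys to Analytics Nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to spread load evenly
        self._ring = []           # sorted list of (point, node) tuples
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash("%s#%d" % (node, i)), node))

    def remove_node(self, node):
        # Only keys owned by this node move; every other key keeps its owner.
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, uve_key):
        """Return the owner of uve_key: the first ring point at or after its hash."""
        if not self._ring:
            raise LookupError("no Analytics Nodes in the ring")
        points = [point for point, _ in self._ring]
        idx = bisect.bisect_left(points, self._hash(uve_key)) % len(points)
        return self._ring[idx][1]


# Hypothetical node names and UVE key.
ring = HashRing(["analytics-1", "analytics-2", "analytics-3"])
owner = ring.node_for("ObjectBgpRouter:cn-example")
```
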
Let us look at the plugin for the alert used in the example above.

The plugin module is located at:
controller/src/opserver/plugins/alarm_bgp_connectivity/

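We install this plugin with the following setuptools script, which registers it under the contrail.analytics.alarms entry-point group (from alarm_bgp_connectivity/setup.py):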

#
# Copyright (c) 2013 Juniper Networks, Inc. All rights reserved.
#

from setuptools import setup, find_packages

setup(
    name='alarm_bgp_connectivity',
    version='0.1dev',
    packages=find_packages(),
    entry_points = {
        'contrail.analytics.alarms': [
            'ObjectBgpRouter = alarm_bgp_connectivity.main:BgpConnectivity',
        ],
    },
    zip_safe=False,
    long_description="BGPConnectivity alarm"
)

The entry-point name “ObjectBgpRouter” identifies the UVE type the alarm applies to; it represents the control-node UVE. See UVE_MAP in controller/src/analytics/viz.sandesh for this mapping.
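The post does not show how Contrail Analytics discovers plugins registered in the contrail.analytics.alarms group. As an illustration only, and not the Analytics Node's actual loader, the standard library's `importlib.metadata` can enumerate such a group. Because several alarm plugins may register under the same UVE-type name, the sketch maps each entry-point name to a list of classes; `load_alarm_plugins` is a hypothetical helper name.

```python
from importlib.metadata import entry_points

ALARM_GROUP = "contrail.analytics.alarms"  # group name taken from setup.py above


def load_alarm_plugins(group=ALARM_GROUP):
    """Map each entry-point name (the UVE type, e.g. "ObjectBgpRouter")
    to the list of alarm classes registered for it."""
    try:
        eps = entry_points(group=group)        # Python 3.10+
    except TypeError:
        eps = entry_points().get(group, ())    # Python 3.8 and 3.9
    plugins = {}
    for ep in eps:
        # ep.load() imports the module and returns the class, e.g. BgpConnectivity.
        plugins.setdefault(ep.name, []).append(ep.load())
    return plugins


if __name__ == "__main__":
    for uve_type, classes in load_alarm_plugins().items():
        print(uve_type, [cls.__name__ for cls in classes])
```

With the alarm_bgp_connectivity package installed, the result would contain an "ObjectBgpRouter" entry whose list includes BgpConnectivity.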

The implementation is as follows (from alarm_bgp_connectivity/main.py):

from opserver.plugins.alarm_base import AlarmBase

class BgpConnectivity(AlarmBase):
    """Not enough BGP peers are up in BgpRouterState.num_up_bgp_peer"""

    def __call__(self, uve_key, uve_data):
        # Each entry is (failed rule, observed value); an empty list means
        # the alarm condition was not detected for this UVE.
        err_list = []
        if "BgpRouterState" not in uve_data:
            return self.__class__.__name__, AlarmBase.SYS_WARN, err_list

        ust = uve_data["BgpRouterState"]

        l, r = ("num_up_bgp_peer", "num_bgp_peer")
        cm = True  # True only if both counters are present and comparable
        if l not in ust:
            err_list.append(("BgpRouterState.%s != None" % l, "None"))
            cm = False
        if r not in ust:
            err_list.append(("BgpRouterState.%s != None" % r, "None"))
            cm = False
        if cm:
            # Flag the UVE when the up-peer count differs from the configured count
            if ust[l] != ust[r]:
                err_list.append(("BgpRouterState.%s != BgpRouterState.%s"
                        % (l, r), "%s != %s" % (str(ust[l]), str(ust[r]))))

        return self.__class__.__name__, AlarmBase.SYS_WARN, err_list

This plugin code is called any time a control-node UVE changes. It examines the contents of the UVE and decides whether an alarm should be raised. In this case, it compares the “BgpRouterState.num_bgp_peer” attribute of the UVE with the “BgpRouterState.num_up_bgp_peer” attribute and adds an entry to err_list when they differ, that is, when not every configured BGP peer is up. A missing attribute is also reported as an entry.
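To see how the rule behaves on the example system outside an Analytics Node, here is a self-contained sketch. AlarmBase is replaced by a hypothetical stub (the real class and its severity constants live in opserver.plugins.alarm_base), the rule body is condensed from the plugin above without changing its checks, and the UVE key and contents are made-up values that mirror the two-peer example.

```python
class AlarmBase(object):
    """Hypothetical stand-in for opserver.plugins.alarm_base.AlarmBase."""
    SYS_WARN = "SYS_WARN"  # placeholder; the real constant's value may differ


class BgpConnectivity(AlarmBase):
    """Condensed copy of the BgpConnectivity rule shown above."""

    def __call__(self, uve_key, uve_data):
        err_list = []
        if "BgpRouterState" not in uve_data:
            # No BGP router state in this UVE: nothing to check.
            return self.__class__.__name__, AlarmBase.SYS_WARN, err_list
        ust = uve_data["BgpRouterState"]
        l, r = "num_up_bgp_peer", "num_bgp_peer"
        for attr in (l, r):
            if attr not in ust:
                err_list.append(("BgpRouterState.%s != None" % attr, "None"))
        # Compare only when both counters are present, as the plugin does.
        if l in ust and r in ust and ust[l] != ust[r]:
            err_list.append(("BgpRouterState.%s != BgpRouterState.%s" % (l, r),
                             "%s != %s" % (ust[l], ust[r])))
        return self.__class__.__name__, AlarmBase.SYS_WARN, err_list


rule = BgpConnectivity()

# Hypothetical UVE contents: two peers configured, none up vs. all up.
broken = {"BgpRouterState": {"num_bgp_peer": 2, "num_up_bgp_peer": 0}}
healthy = {"BgpRouterState": {"num_bgp_peer": 2, "num_up_bgp_peer": 2}}

print(rule("ObjectBgpRouter:cn-example", broken))
print(rule("ObjectBgpRouter:cn-example", healthy))
```

For the broken UVE the rule returns a single entry, ('BgpRouterState.num_up_bgp_peer != BgpRouterState.num_bgp_peer', '0 != 2'), which is the condition behind the alarm in the example; for the healthy UVE the list is empty.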

Contrail UI

Contrail UI also provides a dashboard listing all alarms present in the system:
[Figure: Contrail UI alarms dashboard]