Creating Chronograf alert rules

Chronograf provides a user interface for Kapacitor, InfluxData’s processing framework for creating alerts, running ETL (extract, transform, load) jobs, and detecting anomalies in your data. Chronograf alert rules correspond to Kapacitor tasks that trigger alerts whenever certain conditions are met. Behind the scenes, these tasks are stored as TICKscripts that can be edited manually or through Chronograf. Common alerting use cases that can be managed using Chronograf include:

  • Thresholds with static ceilings, floors, and ranges.
  • Relative thresholds based on unit or percentage changes.
  • Deadman switches.

Complex alerts and other tasks can be defined directly in Kapacitor as TICKscripts, but can be viewed and managed within Chronograf.

This guide walks through creating a Chronograf alert rule that sends an alert message to an existing Slack channel whenever your idle CPU usage drops below 80%.

Requirements

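Getting started with Chronograf offers step-by-step instructions for each of the following requirements:

  • Telegraf, InfluxDB, Chronograf, and Kapacitor are installed and running.
  • Chronograf has a connection to your Kapacitor instance.
  • Slack is configured as an event handler in Chronograf.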

Configuring Chronograf alert rules

Navigate to the Manage Tasks page under Alerting in the left navigation, then click + Build Alert Rule in the top right corner.

The Manage Tasks page is used to create and edit your Chronograf alert rules. The steps below guide you through the process of creating a Chronograf alert rule.

Step 1: Name the alert rule

Under Name this Alert Rule, provide a name for the alert. For this example, use “Idle CPU Usage” as the alert name.

Step 2: Select the alert type

Choose from three alert types under the Alert Types section of the Rule Configuration page:

Threshold
Alert if data crosses a boundary.

Relative
Alert if data changes relative to data in a different time range.

Deadman
Alert if InfluxDB receives no relevant data for a specified time duration.

For this example, select the Threshold alert type.

Step 3: Select the time series data

Choose the time series data you want the Chronograf alert rule to use. Navigate through databases, measurements, fields, and tags to select the relevant data.

In this example, select the telegraf database, the autogen retention policy, the cpu measurement, and the usage_idle field.

Step 4: Define the rule condition

Define the threshold condition. The available condition options depend on the alert type. For this example, the condition is that usage_idle is less than 80.

The graph shows a preview of the relevant data and the threshold number. By default, the graph shows data from the past 15 minutes. Adjusting the graph’s time range is helpful when determining a reasonable threshold number based on your data.

We set the threshold number to 80 for demonstration purposes. Setting the threshold for idle CPU usage to a high number ensures that we’ll be able to see the alert in action. In practice, you’d set the threshold number to better match the patterns in your data and your alerting needs.

Step 5: Select and configure the alert handler

The Alert Handler section determines where the system sends the alert (the event handler). Chronograf supports several event handlers, and each handler has its own configurable options.

For this example, choose the Slack alert handler and enter the desired options.

Multiple alert handlers can be added to send alerts to multiple endpoints.
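Conceptually, adding handlers fans one alert out to several endpoints. The Go sketch below models that with a hypothetical handler interface; it is not Kapacitor's handler API, and recordingHandler stores messages in memory in place of calling Slack or another service. In this sketch, a failing endpoint does not stop delivery to the others.

```go
package main

import (
	"errors"
	"fmt"
)

// handler is a hypothetical stand-in for an alert endpoint.
type handler interface {
	Name() string
	Send(msg string) error
}

// recordingHandler keeps sent messages in memory instead of calling a service.
type recordingHandler struct {
	name string
	sent []string
	fail bool // simulate an unreachable endpoint
}

func (h *recordingHandler) Name() string { return h.name }

func (h *recordingHandler) Send(msg string) error {
	if h.fail {
		return errors.New("endpoint unavailable")
	}
	h.sent = append(h.sent, msg)
	return nil
}

// dispatch delivers msg to every handler and returns all failures together.
func dispatch(msg string, handlers []handler) []error {
	var errs []error
	for _, h := range handlers {
		if err := h.Send(msg); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", h.Name(), err))
		}
	}
	return errs
}

func main() {
	slack := &recordingHandler{name: "slack"}
	email := &recordingHandler{name: "email", fail: true}
	errs := dispatch("Your idle CPU usage is CRITICAL at 72.5.", []handler{slack, email})
	fmt.Println(len(slack.sent), errs) // 1 [email: endpoint unavailable]
}
```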

Step 6: Configure the alert message

The alert message is the text that accompanies an alert. Alert messages are templates that have access to alert data. Available data templates appear below the message text field; clicking one inserts it at the end of the text entered so far.

In this example, use the following alert message:

Your idle CPU usage is {{.Level}} at {{ index .Fields "value" }}.
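Kapacitor message templates use Go's text/template syntax, so the example message can be tried locally. In this sketch, alertData is a hypothetical, trimmed-down stand-in for the data Kapacitor passes to the template; it carries only the Level and Fields values that the example message reads.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// alertData is a hypothetical stand-in for Kapacitor's template data,
// reduced to the two values the example message uses.
type alertData struct {
	Level  string
	Fields map[string]interface{}
}

// messageTemplate is the alert message from this guide.
const messageTemplate = `Your idle CPU usage is {{.Level}} at {{ index .Fields "value" }}.`

// renderMessage executes messageTemplate against d.
func renderMessage(d alertData) (string, error) {
	t, err := template.New("message").Parse(messageTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	msg, err := renderMessage(alertData{
		Level:  "CRITICAL",
		Fields: map[string]interface{}{"value": 72.5},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg) // Your idle CPU usage is CRITICAL at 72.5.
}
```

The index function is needed because Fields is a map keyed by field name; {{.Level}} reads a plain struct field.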

View the Kapacitor documentation for more information about message template data.

Step 7: Save the alert rule

Click Save Rule in the top right corner and navigate to the Manage Tasks page to see your rule. You can enable or disable the rule by toggling the checkbox in the Enabled column.

Next, move on to the following section to see your alert rule in action.

Viewing alerts in practice

Step 1: Create some load on your system

The purpose of this step is to generate enough load on your system to trigger an alert; specifically, your idle CPU usage must dip below 80%. On the machine that’s running Telegraf, enter the following command in a terminal to start a busy while loop. Each loop keeps roughly one CPU core busy, so on a machine with many cores you may need to run it in several terminals at once:

while true; do i=0; done

Let it run for a few seconds or minutes, then stop it. On most systems, press Ctrl+C to stop the loop.

Step 2: View the alerts

Go to the Slack channel that you specified in the previous section. In this example, it’s the #chronocats channel.

Assuming the first step was successful, the channel should show at least two alert messages:

  • The first alert message indicates that your idle CPU usage was CRITICAL, meaning it dipped below 80%.
  • The second alert message indicates that your idle CPU usage returned to an OK level of 80% or above.

You can also see alerts on the Alert History page available under Alerting in the left navigation.

That’s it! You’ve successfully used Chronograf to configure an alert rule to monitor your idle CPU usage and send notifications to Slack.
