Documentation

Grok input data format

Use the grok data format to parse line-delimited data using a regular expression-like language.

For an introduction to grok patterns, see Grok Basics in the Logstash documentation. The grok parser uses a slightly modified version of logstash grok patterns, using the format:

%{<capture_syntax>[:<semantic_name>][:<modifier>]}

The capture_syntax defines the grok pattern used to parse the input line and the semantic_name is used to name the field or tag. The extension modifier controls the data type that the parsed item is converted to or other special handling.

By default all named captures are converted into string fields. If a pattern does not have a semantic name it will not be captured. Timestamp modifiers can be used to convert captures to the timestamp of the parsed metric. If no timestamp is parsed the metric will be created using the current time.

You must capture at least one field per line.

  • Available modifiers:
    • string (default if nothing is specified)
    • int
    • float
    • duration (ie, 5.23ms gets converted to int nanoseconds)
    • tag (converts the field into a tag)
    • drop (drops the field completely)
    • measurement (use the matched text as the measurement name)
  • Timestamp modifiers:
    • ts (This will auto-learn the timestamp format)
    • ts-ansic (“Mon Jan _2 15:04:05 2006”)
    • ts-unix (“Mon Jan _2 15:04:05 MST 2006”)
    • ts-ruby (“Mon Jan 02 15:04:05 -0700 2006”)
    • ts-rfc822 (“02 Jan 06 15:04 MST”)
    • ts-rfc822z (“02 Jan 06 15:04 -0700”)
    • ts-rfc850 (“Monday, 02-Jan-06 15:04:05 MST”)
    • ts-rfc1123 (“Mon, 02 Jan 2006 15:04:05 MST”)
    • ts-rfc1123z (“Mon, 02 Jan 2006 15:04:05 -0700”)
    • ts-rfc3339 (“2006-01-02T15:04:05Z07:00”)
    • ts-rfc3339nano (“2006-01-02T15:04:05.999999999Z07:00”)
    • ts-httpd (“02/Jan/2006:15:04:05 -0700”)
    • ts-epoch (seconds since unix epoch, may contain decimal)
    • ts-epochnano (nanoseconds since unix epoch)
    • ts-epochmilli (milliseconds since unix epoch)
    • ts-syslog (“Jan 02 15:04:05”, parsed time is set to the current year)
    • ts-“CUSTOM”

CUSTOM time layouts must be within quotes and be the representation of the “reference time”, which is Mon Jan 2 15:04:05 -0700 MST 2006. To match a comma decimal point you can use a period. For example %{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"} can be used to match "2018-01-02 15:04:05,000" To match a comma decimal point you can use a period in the pattern string. See https://golang.org/pkg/time/#Parse for more details.

Telegraf has many of its own built-in patterns, as well as support for most of Logstash’s core patterns. Golang regular expressions do not support lookahead or lookbehind. Logstash patterns that depend on these aren’t supported.

For help building and testing patterns, see tips for creating patterns.

Configuration

[[inputs.file]]
  ## Files to parse each interval.
  ## These accept standard unix glob matching rules, but with the addition of
  ## ** as a "super asterisk". ie:
  ##   /var/log/**.log     -> recursively find all .log files in /var/log
  ##   /var/log/*/*.log    -> find all .log files with a parent dir in /var/log
  ##   /var/log/apache.log -> only tail the apache log file
  files = ["/var/log/apache/access.log"]

  ## The dataformat to be read from files
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "grok"

  ## This is a list of patterns to check the given log file(s) for.
  ## Note that adding patterns here increases processing time. The most
  ## efficient configuration is to have one pattern.
  ## Other common built-in patterns are:
  ##   %{COMMON_LOG_FORMAT}   (plain apache & nginx access logs)
  ##   %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
  grok_patterns = ["%{COMBINED_LOG_FORMAT}"]

  ## Full path(s) to custom pattern files.
  grok_custom_pattern_files = []

  ## Custom patterns can also be defined here. Put one pattern per line.
  grok_custom_patterns = '''
  '''

  ## Timezone allows you to provide an override for timestamps that
  ## don't already include an offset
  ## e.g. 04/06/2016 12:41:45 data one two 5.43µs
  ##
  ## Default: "" which renders UTC
  ## Options are as follows:
  ##   1. Local             -- interpret based on machine localtime
  ##   2. "Canada/Eastern"  -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  ##   3. UTC               -- or blank/unspecified, will return timestamp in UTC
  grok_timezone = "Canada/Eastern"

  ## When set to "disable" timestamp will not incremented if there is a
  ## duplicate.
  # grok_unique_timestamp = "auto"

  ## Enable multiline messages to be processed.
  # grok_multiline = false

Timestamp Examples

This example input and config parses a file using a custom timestamp conversion:

2017-02-21 13:10:34 value=42
[[inputs.file]]
  grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']

This example input and config parses a file using a timestamp in unix time:

1466004605 value=42
1466004605.123456789 value=42
[[inputs.file]]
  grok_patterns = ['%{NUMBER:timestamp:ts-epoch} value=%{NUMBER:value:int}']

This example parses a file using a built-in conversion and a custom pattern:

Wed Apr 12 13:10:34 PST 2017 value=42
[[inputs.file]]
  grok_patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
  grok_custom_patterns = '''
    TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
  '''

This example input and config parses a file using a custom timestamp conversion that doesn’t match any specific standard:

21/02/2017 13:10:34 value=42
[[inputs.file]]
  grok_patterns = ['%{MY_TIMESTAMP:timestamp:ts-"02/01/2006 15:04:05"} value=%{NUMBER:value:int}']

  grok_custom_patterns = '''
    MY_TIMESTAMP (?:\d{2}.\d{2}.\d{4} \d{2}:\d{2}:\d{2})
  '''

For cases where the timestamp itself is without offset, the timezone config var is available to denote an offset. By default (with timezone either omit, blank or set to "UTC"), the times are processed as if in the UTC timezone. If specified as timezone = "Local", the timestamp will be processed based on the current machine timezone configuration. Lastly, if using a timezone from the list of Unix timezones, grok will offset the timestamp accordingly.

TOML Escaping

When saving patterns to the configuration file, keep in mind the different TOML string types and the escaping rules for each. These escaping rules must be applied in addition to the escaping required by the grok syntax. Using the Multi-line line literal syntax with ''' may be useful.

The following config examples will parse this input file:

|42|\uD83D\uDC2F|'telegraf'|

Since | is a special character in the grok language, we must escape it to get a literal |. With a basic TOML string, special characters such as backslash must be escaped, requiring us to escape the backslash a second time.

[[inputs.file]]
  grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
  grok_custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"

We cannot use a literal TOML string for the pattern, because we cannot match a ' within it. However, it works well for the custom pattern.

[[inputs.file]]
  grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
  grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'

A multi-line literal string allows us to encode the pattern:

[[inputs.file]]
  grok_patterns = ['''
    \|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
  ''']
  grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'

Tips for creating patterns

Complex patterns can be difficult to read and write. For help building and debugging grok patterns, see the following tools:

We recommend the following steps for building and testing a new pattern with Telegraf and your data:

  1. In your Telegraf configuration, do the following to help you isolate and view the captured metrics:

    • Configure a file output that writes to stdout:

      [[outputs.file]]
        files = ["stdout"]
      
    • Disable other outputs while testing.

    Keep in mind that the file output will only print once per flush_interval.

  2. For the input, start with a sample file that contains a single line of your data, and then remove all but the first token or piece of the line.

  3. In your Telegraf configuration, add the section of your pattern that matches the piece of data from the previous step.

  4. Run Telegraf and verify that the metric is parsed successfully.

  5. If successful, add the next token to the data file, update the pattern configuration in Telegraf, and then retest.

  6. Continue one token at a time until the entire line is successfully parsed.

Performance

Performance depends heavily on the regular expressions that you use, but there are a few techniques that can help:

  • Avoid using patterns such as %{DATA} that will always match.

  • If possible, add ^ and $ anchors to your pattern:

    [[inputs.file]]
      grok_patterns = ["^%{COMBINED_LOG_FORMAT}$"]
    

Was this page helpful?

Thank you for your feedback!


The future of Flux

Flux is going into maintenance mode. You can continue using it as you currently are without any changes to your code.

Read more

InfluxDB 3 Open Source Now in Public Alpha

InfluxDB 3 Open Source is now available for alpha testing, licensed under MIT or Apache 2 licensing.

We are releasing two products as part of the alpha.

InfluxDB 3 Core, is our new open source product. It is a recent-data engine for time series and event data. InfluxDB 3 Enterprise is a commercial version that builds on Core’s foundation, adding historical query capability, read replicas, high availability, scalability, and fine-grained security.

For more information on how to get started, check out: