Documentation

GROUP BY clause

Use the GROUP BY clause to group query results by one or more specified tags and/or a specified time interval. GROUP BY requires an aggregate or selector function in the SELECT statement. GROUP BY supports regular expressions.

Note: You cannot use GROUP BY to group fields.

GROUP BY tags

GROUP BY <tag> groups query results by one or more specified tags.

Syntax

SELECT_clause FROM_clause [WHERE_clause] GROUP BY [* | <tag_key>[,<tag_key>]]
  • GROUP BY * - Groups results by all tags
  • GROUP BY <tag_key> - Groups results by a specific tag
  • GROUP BY <tag_key>,<tag_key> - Groups results by more than one tag. The order of the tag keys is irrelevant.
  • GROUP BY \regex\ - Groups results by tags that match the regular expression.

If the query includes a WHERE clause, the GROUP BY clause must appear after the WHERE clause.

Examples

Group query results by a single tag

Group query results by more than one tag

Group query results by all tags

Group query results by tags that start with l

GROUP BY time intervals

GROUP BY time() group query results by a user-specified time interval. When using aggregate or selector functions in the SELECT clause, the operation is applied to each interval.

Basic GROUP BY time() syntax

Syntax

SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(<time_interval>),[tag_key] [fill(<fill_option>)]

Basic GROUP BY time() queries require an InfluxQL function in the SELECT clause and a time range in the WHERE clause. Note that the GROUP BY clause must come after the WHERE clause.

time(time_interval)

The time_interval in the GROUP BY time() clause is a duration literal. It determines how InfluxDB groups query results over time. For example, a time_interval of 5m groups query results into five-minute time groups across the time range specified in the WHERE clause.

fill(<fill_option>)

fill(<fill_option>) is optional. It changes the value reported for time intervals with no data. See GROUP BY time intervals and fill() for more information.

Coverage:

Basic GROUP BY time() queries rely on the time_interval and InfluxDB’s preset time boundaries to determine the raw data included in each time interval and the timestamps returned by the query.

Examples of basic syntax

The examples below use the following subsample of the sample data:

SELECT "water_level","location" FROM "h2o_feet" WHERE time >= '2019-08-18T00:00:00Z' AND time <= '2019-08-18T00:30:00Z'

Output:

Name: h2o_feet

timewater_levellocation
2019-08-18T00:00:00Z8.5040000000coyote_creek
2019-08-18T00:00:00Z2.3520000000santa_monica
2019-08-18T00:06:00Z8.4190000000coyote_creek
2019-08-18T00:06:00Z2.3790000000santa_monica
2019-08-18T00:12:00Z8.3200000000coyote_creek
2019-08-18T00:12:00Z2.3430000000santa_monica
2019-08-18T00:18:00Z8.2250000000coyote_creek
2019-08-18T00:18:00Z2.3290000000santa_monica
2019-08-18T00:24:00Z8.1300000000coyote_creek
2019-08-18T00:24:00Z2.2640000000santa_monica
2019-08-18T00:30:00Z8.0120000000coyote_creek
2019-08-18T00:30:00Z2.2670000000santa_monica

Group query results into 12 minute intervals

Group query results into 12 minute intervals and by a tag key

Common issues with basic syntax

Unexpected timestamps and values in query results

With the basic syntax, InfluxDB relies on the GROUP BY time() interval and on the system’s preset time boundaries to determine the raw data included in each time interval and the timestamps returned by the query. In some cases, this can lead to unexpected results.

Example

Raw data:

SELECT "water_level" FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2019-08-18T00:00:00Z' AND time <= '2019-08-18T00:18:00Z'

Output:

name: h2o_feet

timewater_level
2019-08-18T00:00:00Z8.5040000000
2019-08-18T00:06:00Z8.4190000000
2019-08-18T00:12:00Z8.3200000000
2019-08-18T00:18:00Z8.2250000000

Query and results:

The following example queries a 12-minute time range and groups results into 12-minute time intervals, but it returns two results:

SELECT COUNT("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2019-08-18T00:06:00Z' AND time < '2019-08-18T00:18:00Z' GROUP BY time(12m)

Output:

name: h2o_feet

timecount
2019-08-18T00:00:00Z1.0000000000
2019-08-18T00:12:00Z1.0000000000

Note: The timestamp in the first row of data occurs before the start of the queried time range.

Explanation:

InfluxDB uses preset round-number time boundaries for GROUP BY intervals that are independent of any time conditions in the WHERE clause. When it calculates the results, all returned data must occur within the query’s explicit time range but the GROUP BY intervals will be based on the preset time boundaries.

The table below shows the preset time boundary, the relevant GROUP BY time() interval, the points included, and the returned timestamp for each GROUP BY time() interval in the results.

Time Interval NumberPreset Time BoundaryGROUP BY time() IntervalPoints IncludedReturned Timestamp
1time >= 2019-08-18T00:00:00Z AND time < 2019-08-18T00:12:00Ztime >= 2019-08-18T00:06:00Z AND time < 2019-08-18T00:12:00Z8.0052019-08-18T00:00:00Z
2time >= 2019-08-12T00:12:00Z AND time < 2019-08-18T00:24:00Ztime >= 2019-08-12T00:12:00Z AND time < 2019-08-18T00:18:00Z7.8872019-08-18T00:12:00Z

The first preset 12-minute time boundary begins at 00:00 and ends just before 00:12. Only one raw point (8.005) falls both within the query’s first GROUP BY time() interval and in that first time boundary. Note that while the returned timestamp occurs before the start of the queried time range, the query result excludes data that occur before the queried time range.

The second preset 12-minute time boundary begins at 00:12 and ends just before 00:24. Only one raw point (7.887) falls both within the query’s second GROUP BY time() interval and in that second time boundary.

The advanced GROUP BY time() syntax allows users to shift the start time of the InfluxDB database’s preset time boundaries. It shifts forward the preset time boundaries by six minutes such that InfluxDB returns:

Output:

name: h2o_feet

timecount
2019-08-18T00:06:00Z2

Advanced GROUP BY time() syntax

Syntax

SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(<time_interval>,<offset_interval>),[tag_key] [fill(<fill_option>)]

Advanced GROUP BY time() queries require an InfluxQL function in the SELECT clause and a time range in the WHERE clause). Note that the GROUP BY clause must come after the WHERE clause.

time(time_interval,offset_interval)

See the Basic GROUP BY time() Syntax for details on the time_interval.

The offset_interval is a duration literal. It shifts forward or back the InfluxDB database’s preset time boundaries. The offset_interval can be positive or negative.

fill(<fill_option>)

fill(<fill_option>) is optional. It changes the value reported for time intervals with no data. See GROUP BY time intervals and fill() for more information.

Coverage:

Advanced GROUP BY time() queries rely on the time_interval, the offset_interval , and on the InfluxDB database’s preset time boundaries to determine the raw data included in each time interval and the timestamps returned by the query.

Examples of advanced syntax

The examples below use the following subsample of the sample data:

SELECT "water_level" FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2019-08-18T00:00:00Z' AND time <= '2019-08-18T00:54:00Z'

Output:

Name: h2o_feet

timewater_level
2019-08-18T00:00:00Z8.5040000000
2019-08-18T00:06:00Z8.4190000000
2019-08-18T00:12:00Z8.3200000000
2019-08-18T00:18:00Z8.2250000000
2019-08-18T00:24:00Z8.1300000000
2019-08-18T00:30:00Z8.0120000000
2019-08-18T00:36:00Z7.8940000000
2019-08-18T00:42:00Z7.7720000000
2019-08-18T00:48:00Z7.6380000000
2019-08-18T00:54:00Z7.5100000000

Group query results into 18 minute intervals and shift the preset time boundaries forward

Group query results into 12 minute intervals and shift the preset time boundaries back

Group query results into 12 minute intervals and shift the preset time boundaries forward

GROUP BY time intervals and fill()

fill() changes the value reported for time intervals with no data.

Syntax

SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(time_interval,[<offset_interval>])[,tag_key] [fill(<fill_option>)]

By default, a GROUP BY time() interval with no data reports null as its value in the output column. fill() changes the value reported for time intervals with no data. Note that fill() must go at the end of the GROUP BY clause if you’re GROUP(ing) BY several things (for example, both tags and a time interval).

fill_option
  • Any numerical value - Reports the given numerical value for time intervals with no data.
  • linear - Reports the results of linear interpolation for time intervals with no data.
  • none - Reports no timestamp and no value for time intervals with no data.
  • null - Reports null for time intervals with no data but returns a timestamp. This is the same as the default behavior.
  • previous - Reports the value from the previous time interval for time intervals with no data.

Examples

Without fill(100):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) 

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
2019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Z
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Z
2019-08-19T09:30:00Z6.0000000000

With fill(100):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(100)

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
2019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Z100.0000000000
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Z100.0000000000
2019-08-19T09:30:00Z6.0000000000

fill(100) changes the value reported for the time interval with no data to 100.

Without fill(linear):

SELECT MEAN("tadpoles") FROM "pond" WHERE time >= '2019-11-11T21:00:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m)

Output:

Name: pond

timemean
2019-11-11T21:00:00Z1
2019-11-11T21:12:00Z
2019-11-11T21:24:00Z3
2019-11-11T21:36:00Z
2019-11-11T21:48:00Z
2019-11-11T22:00:00Z6

With fill(linear):

SELECT MEAN("tadpoles") FROM "pond" WHERE time >= '2019-11-11T21:00:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m) fill(linear)

Output:

Name: pond

timemean
2019-11-11T21:00:00Z1
2019-11-11T21:12:00Z2
2019-11-11T21:24:00Z3
2019-11-11T21:36:00Z4
2019-11-11T21:48:00Z5
2019-11-11T22:00:00Z6

fill(linear) changes the value reported for the time interval with no data to the results of linear interpolation.

Note: The data in this example is not in the noaa database.

Without fill(none):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m)

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Z
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Z
2019-08-19T09:30:00Z6.0000000000

With fill(none):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(none)

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
2019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:30:00Z6.0000000000

`fill(none)` reports no value and no timestamp for the time interval with no data.

Without fill(null):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m)

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Z
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Z
2019-08-19T09:30:00Z6.0000000000

With fill(null):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(null)

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Znull
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Znull
2019-08-19T09:30:00Z6.0000000000

fill(null) reports null as the value for the time interval with no data. That result matches the result of the query without fill(null).

Without fill(previous):

 SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) 

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Z
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Z

With fill(previous):

SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(previous)

Output:

Name: h2o_quality

timemean
2019-08-19T08:40:00Z68.0000000000
019-08-19T08:45:00Z29.0000000000
2019-08-19T08:50:00Z47.0000000000
2019-08-19T08:55:00Z47.0000000000
2019-08-19T09:00:00Z84.0000000000
2019-08-19T09:05:00Z0.0000000000
2019-08-19T09:10:00Z41.0000000000
2019-08-19T09:15:00Z13.0000000000
2019-08-19T09:20:00Z9.0000000000
2019-08-19T09:25:00Z9.0000000000

fill(previous) changes the value reported for the time interval with no data to 3.235, the value from the previous time interval.

Common issues with fill()

Queries with no data in the queried time range

Currently, queries ignore fill() if no data exists in the queried time range. This is the expected behavior. An open feature request on GitHub proposes that fill() should force a return of values even if the queried time range covers no data.

Example

The following query returns no data because water_level has no points within the queried time range. Note that fill(800) has no effect on the query results.

SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" = 'coyote_creek' AND time >= '2019-09-18T22:00:00Z' AND time <= '2019-09-18T22:18:00Z' GROUP BY time(12m) fill(800)
> no results
Queries with fill(previous) when the previous result is outside the queried time range

fill(previous) doesn’t fill the result for a time interval if the previous value is outside the query’s time range.

Example

The following example queries the time range between 2019-09-18T16:24:00Z and 2019-09-18T16:54:00Z. Note that fill(previous) fills the result for 2019-09-18T16:36:00Z with the result from 2019-09-18T16:24:00Z.

SELECT MAX("water_level") FROM "h2o_feet" WHERE location = 'coyote_creek' AND time >= '2019-09-18T16:24:00Z' AND time <= '2019-09-18T16:54:00Z' GROUP BY time(12m) fill(previous)

Output:

Name: h2o_feet

timemax
2019-09-18T16:24:00Z3.235
2019-09-18T16:36:00Z3.235
2019-09-18T16:48:00Z4

The next example queries the time range between 2019-09-18T16:36:00Z and 2019-09-18T16:54:00Z. Note that fill(previous) doesn’t fill the result for 2019-09-18T16:36:00Z with the result from 2019-09-18T16:24:00Z; the result for 2019-09-18T16:24:00Z is outside the query’s shorter time range.

SELECT MAX("water_level") FROM "h2o_feet" WHERE location = 'coyote_creek' AND time >= '2019-09-18T16:36:00Z' AND time <= '2019-09-18T16:54:00Z' GROUP BY time(12m) fill(previous)

Output:

Name: h2o_feet

timemax
2019-09-18T16:36:00Z
2019-09-18T16:48:00Z4
fill(linear) when the previous or following result is outside the queried time range

fill(linear) doesn’t fill the result for a time interval with no data if the previous result or the following result is outside the queried time range.

Example

The following example queries the time range between 2019-11-11T21:24:00Z and 2019-11-11T22:06:00Z. Note that fill(linear) fills the results for the 2019-11-11T21:36:00Z time interval and the 2019-11-11T21:48:00Z time interval using the values from the 2019-11-11T21:24:00Z time interval and the 2019-11-11T22:00:00Z time interval.

SELECT MEAN("tadpoles") FROM "pond" WHERE time > '2019-11-11T21:24:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m) fill(linear)

Output:

Name: pond

timemean
2019-11-11T21:24:00Z3
2019-11-11T21:36:00Z4
2019-11-11T21:48:00Z5
2019-11-11T22:00:00Z6

The next query shortens the time range in the previous query. It now covers the time between 2019-11-11T21:36:00Z and 2019-11-11T22:06:00Z. Note that fill() previous doesn’t fill the results for the 2019-11-11T21:36:00Z time interval and the 2019-11-11T21:48:00Z time interval; the result for 2019-11-11T21:24:00Z is outside the query’s shorter time range and InfluxDB cannot perform the linear interpolation.

SELECT MEAN("tadpoles") FROM "pond" WHERE time >= '2019-11-11T21:36:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m) fill(linear)

Output:

Name: pond

timemean
2019-11-11T21:36:00Z
2019-11-11T21:48:00Z
2019-11-11T22:00:00Z6

Note: The data in Issue 3 are not in NOAA database. We had to create a dataset with less regular data to work with fill(linear).


Was this page helpful?

Thank you for your feedback!


The future of Flux

Flux is going into maintenance mode. You can continue using it as you currently are without any changes to your code.

Read more

InfluxDB v3 enhancements and InfluxDB Clustered is now generally available

New capabilities, including faster query performance and management tooling advance the InfluxDB v3 product line. InfluxDB Clustered is now generally available.

InfluxDB v3 performance and features

The InfluxDB v3 product line has seen significant enhancements in query performance and has made new management tooling available. These enhancements include an operational dashboard to monitor the health of your InfluxDB cluster, single sign-on (SSO) support in InfluxDB Cloud Dedicated, and new management APIs for tokens and databases.

Learn about the new v3 enhancements


InfluxDB Clustered general availability

InfluxDB Clustered is now generally available and gives you the power of InfluxDB v3 in your self-managed stack.

Talk to us about InfluxDB Clustered