GROUP BY clause
Use the GROUP BY
clause to group query results by one or more specified tags and/or a specified time interval. GROUP BY
requires an aggregate or selector function in the SELECT
statement. GROUP BY
supports regular expressions.
Note: You cannot use GROUP BY
to group fields.
GROUP BY tags
GROUP BY <tag>
groups query results by one or more specified tags.
Syntax
SELECT_clause FROM_clause [WHERE_clause] GROUP BY [* | <tag_key>[,<tag_key>]]
GROUP BY *
- Groups results by all tagsGROUP BY <tag_key>
- Groups results by a specific tagGROUP BY <tag_key>,<tag_key>
- Groups results by more than one tag. The order of the tag keys is irrelevant.GROUP BY \regex\
- Groups results by tags that match the regular expression.
If the query includes a WHERE
clause, the GROUP BY
clause must appear after the WHERE
clause.
Examples
GROUP BY time intervals
GROUP BY time()
group query results by a user-specified time interval.
When using aggregate or selector functions in the SELECT
clause, the operation is applied to each interval.
Basic GROUP BY time() syntax
Syntax
SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(<time_interval>),[tag_key] [fill(<fill_option>)]
Basic GROUP BY time()
queries require an InfluxQL function
in the SELECT
clause and a time range in the WHERE
clause.
Note that the GROUP BY
clause must come after the WHERE
clause.
time(time_interval)
The time_interval
in the GROUP BY time()
clause is a
duration literal.
It determines how InfluxDB groups query results over time.
For example, a time_interval
of 5m
groups query results into five-minute
time groups across the time range specified in the WHERE
clause.
fill(<fill_option>)
fill(<fill_option>)
is optional.
It changes the value reported for time intervals with no data.
See GROUP BY time intervals and fill()
for more information.
Coverage:
Basic GROUP BY time()
queries rely on the time_interval
and InfluxDB’s
preset time boundaries to determine the raw data included in each time interval
and the timestamps returned by the query.
Examples of basic syntax
The examples below use the following subsample of the sample data:
SELECT "water_level","location" FROM "h2o_feet" WHERE time >= '2019-08-18T00:00:00Z' AND time <= '2019-08-18T00:30:00Z'
Output:
time | water_level | location |
---|---|---|
2019-08-18T00:00:00Z | 8.5040000000 | coyote_creek |
2019-08-18T00:00:00Z | 2.3520000000 | santa_monica |
2019-08-18T00:06:00Z | 8.4190000000 | coyote_creek |
2019-08-18T00:06:00Z | 2.3790000000 | santa_monica |
2019-08-18T00:12:00Z | 8.3200000000 | coyote_creek |
2019-08-18T00:12:00Z | 2.3430000000 | santa_monica |
2019-08-18T00:18:00Z | 8.2250000000 | coyote_creek |
2019-08-18T00:18:00Z | 2.3290000000 | santa_monica |
2019-08-18T00:24:00Z | 8.1300000000 | coyote_creek |
2019-08-18T00:24:00Z | 2.2640000000 | santa_monica |
2019-08-18T00:30:00Z | 8.0120000000 | coyote_creek |
2019-08-18T00:30:00Z | 2.2670000000 | santa_monica |
Common issues with basic syntax
Unexpected timestamps and values in query results
With the basic syntax, InfluxDB relies on the GROUP BY time()
interval
and on the system’s preset time boundaries to determine the raw data included
in each time interval and the timestamps returned by the query.
In some cases, this can lead to unexpected results.
Example
Raw data:
SELECT "water_level" FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2019-08-18T00:00:00Z' AND time <= '2019-08-18T00:18:00Z'
Output:
time | water_level |
---|---|
2019-08-18T00:00:00Z | 8.5040000000 |
2019-08-18T00:06:00Z | 8.4190000000 |
2019-08-18T00:12:00Z | 8.3200000000 |
2019-08-18T00:18:00Z | 8.2250000000 |
Query and results:
The following example queries a 12-minute time range and groups results into 12-minute time intervals, but it returns two results:
SELECT COUNT("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2019-08-18T00:06:00Z' AND time < '2019-08-18T00:18:00Z' GROUP BY time(12m)
Output:
time | count |
---|---|
2019-08-18T00:00:00Z | 1.0000000000 |
2019-08-18T00:12:00Z | 1.0000000000 |
Note: The timestamp in the first row of data occurs before the start of the queried time range.
Explanation:
InfluxDB uses preset round-number time boundaries for GROUP BY
intervals that are
independent of any time conditions in the WHERE
clause.
When it calculates the results, all returned data must occur within the query’s
explicit time range but the GROUP BY
intervals will be based on the preset
time boundaries.
The table below shows the preset time boundary, the relevant GROUP BY time()
interval, the
points included, and the returned timestamp for each GROUP BY time()
interval in the results.
Time Interval Number | Preset Time Boundary | GROUP BY time() Interval | Points Included | Returned Timestamp |
---|---|---|---|---|
1 | time >= 2019-08-18T00:00:00Z AND time < 2019-08-18T00:12:00Z | time >= 2019-08-18T00:06:00Z AND time < 2019-08-18T00:12:00Z | 8.005 | 2019-08-18T00:00:00Z |
2 | time >= 2019-08-12T00:12:00Z AND time < 2019-08-18T00:24:00Z | time >= 2019-08-12T00:12:00Z AND time < 2019-08-18T00:18:00Z | 7.887 | 2019-08-18T00:12:00Z |
The first preset 12-minute time boundary begins at 00:00
and ends just before
00:12
.
Only one raw point (8.005
) falls both within the query’s first GROUP BY time()
interval and in that
first time boundary.
Note that while the returned timestamp occurs before the start of the queried time range,
the query result excludes data that occur before the queried time range.
The second preset 12-minute time boundary begins at 00:12
and ends just before
00:24
.
Only one raw point (7.887
) falls both within the query’s second GROUP BY time()
interval and in that
second time boundary.
The advanced GROUP BY time()
syntax allows users to shift
the start time of the InfluxDB database’s preset time boundaries. It shifts forward the preset time boundaries by six minutes such that
InfluxDB returns:
Output:
time | count |
---|---|
2019-08-18T00:06:00Z | 2 |
Advanced GROUP BY time() syntax
Syntax
SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(<time_interval>,<offset_interval>),[tag_key] [fill(<fill_option>)]
Advanced GROUP BY time()
queries require an InfluxQL function
in the SELECT
clause and a time range in the
WHERE
clause). Note that the GROUP BY
clause must come after the WHERE
clause.
time(time_interval,offset_interval)
See the Basic GROUP BY time() Syntax
for details on the time_interval
.
The offset_interval
is a duration literal.
It shifts forward or back the InfluxDB database’s preset time boundaries.
The offset_interval
can be positive or negative.
fill(<fill_option>)
fill(<fill_option>)
is optional.
It changes the value reported for time intervals with no data.
See GROUP BY time intervals and fill()
for more information.
Coverage:
Advanced GROUP BY time()
queries rely on the time_interval
, the offset_interval
, and on the InfluxDB database’s preset time boundaries to determine the raw data included in each time interval
and the timestamps returned by the query.
Examples of advanced syntax
The examples below use the following subsample of the sample data:
SELECT "water_level" FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2019-08-18T00:00:00Z' AND time <= '2019-08-18T00:54:00Z'
Output:
time | water_level |
---|---|
2019-08-18T00:00:00Z | 8.5040000000 |
2019-08-18T00:06:00Z | 8.4190000000 |
2019-08-18T00:12:00Z | 8.3200000000 |
2019-08-18T00:18:00Z | 8.2250000000 |
2019-08-18T00:24:00Z | 8.1300000000 |
2019-08-18T00:30:00Z | 8.0120000000 |
2019-08-18T00:36:00Z | 7.8940000000 |
2019-08-18T00:42:00Z | 7.7720000000 |
2019-08-18T00:48:00Z | 7.6380000000 |
2019-08-18T00:54:00Z | 7.5100000000 |
GROUP BY
time intervals and fill()
fill()
changes the value reported for time intervals with no data.
Syntax
SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(time_interval,[<offset_interval>])[,tag_key] [fill(<fill_option>)]
By default, a GROUP BY time()
interval with no data reports null
as its
value in the output column.
fill()
changes the value reported for time intervals with no data.
Note that fill()
must go at the end of the GROUP BY
clause if you’re
GROUP(ing) BY
several things (for example, both tags and a time interval).
fill_option
- Any numerical value - Reports the given numerical value for time intervals with no data.
linear
- Reports the results of linear interpolation for time intervals with no data.none
- Reports no timestamp and no value for time intervals with no data.null
- Reports null for time intervals with no data but returns a timestamp. This is the same as the default behavior.previous
- Reports the value from the previous time interval for time intervals with no data.
Examples
Without fill(100)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
2019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z | |
2019-08-19T09:30:00Z | 6.0000000000 |
With fill(100)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(100)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
2019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | 100.0000000000 |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z | 100.0000000000 |
2019-08-19T09:30:00Z | 6.0000000000 |
fill(100)
changes the value reported for the time interval with no data to 100
.
Without fill(linear)
:
SELECT MEAN("tadpoles") FROM "pond" WHERE time >= '2019-11-11T21:00:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m)
Output:
time | mean |
---|---|
2019-11-11T21:00:00Z | 1 |
2019-11-11T21:12:00Z | |
2019-11-11T21:24:00Z | 3 |
2019-11-11T21:36:00Z | |
2019-11-11T21:48:00Z | |
2019-11-11T22:00:00Z | 6 |
With fill(linear)
:
SELECT MEAN("tadpoles") FROM "pond" WHERE time >= '2019-11-11T21:00:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m) fill(linear)
Output:
time | mean |
---|---|
2019-11-11T21:00:00Z | 1 |
2019-11-11T21:12:00Z | 2 |
2019-11-11T21:24:00Z | 3 |
2019-11-11T21:36:00Z | 4 |
2019-11-11T21:48:00Z | 5 |
2019-11-11T22:00:00Z | 6 |
fill(linear)
changes the value reported for the time interval with no data
to the results of linear interpolation.
Note: The data in this example is not in the noaa
database.
Without fill(none)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z | |
2019-08-19T09:30:00Z | 6.0000000000 |
With fill(none)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(none)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
2019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:30:00Z | 6.0000000000 |
`fill(none)` reports no value and no timestamp for the time interval with no data.
Without fill(null)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z | |
2019-08-19T09:30:00Z | 6.0000000000 |
With fill(null)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(null)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | null |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z | null |
2019-08-19T09:30:00Z | 6.0000000000 |
fill(null)
reports null
as the value for the time interval with no data.
That result matches the result of the query without fill(null)
.
Without fill(previous)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z |
With fill(previous)
:
SELECT MEAN("index") FROM "h2o_quality" WHERE "location"='santa_monica' AND time >= '2019-08-19T08:42:00Z' AND time <= '2019-08-19T09:30:00Z' GROUP BY time(5m) fill(previous)
Output:
time | mean |
---|---|
2019-08-19T08:40:00Z | 68.0000000000 |
019-08-19T08:45:00Z | 29.0000000000 |
2019-08-19T08:50:00Z | 47.0000000000 |
2019-08-19T08:55:00Z | 47.0000000000 |
2019-08-19T09:00:00Z | 84.0000000000 |
2019-08-19T09:05:00Z | 0.0000000000 |
2019-08-19T09:10:00Z | 41.0000000000 |
2019-08-19T09:15:00Z | 13.0000000000 |
2019-08-19T09:20:00Z | 9.0000000000 |
2019-08-19T09:25:00Z | 9.0000000000 |
fill(previous)
changes the value reported for the time interval with no data to 3.235
,
the value from the previous time interval.
Common issues with fill()
Queries with no data in the queried time range
Currently, queries ignore fill()
if no data exists in the queried time range.
This is the expected behavior. An open
feature request on GitHub
proposes that fill()
should force a return of values even if the queried time
range covers no data.
Example
The following query returns no data because water_level
has no points within
the queried time range.
Note that fill(800)
has no effect on the query results.
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" = 'coyote_creek' AND time >= '2019-09-18T22:00:00Z' AND time <= '2019-09-18T22:18:00Z' GROUP BY time(12m) fill(800)
> no results
Queries with fill(previous)
when the previous result is outside the queried time range
fill(previous)
doesn’t fill the result for a time interval if the previous
value is outside the query’s time range.
Example
The following example queries the time range between 2019-09-18T16:24:00Z
and 2019-09-18T16:54:00Z
.
Note that fill(previous)
fills the result for 2019-09-18T16:36:00Z
with the
result from 2019-09-18T16:24:00Z
.
SELECT MAX("water_level") FROM "h2o_feet" WHERE location = 'coyote_creek' AND time >= '2019-09-18T16:24:00Z' AND time <= '2019-09-18T16:54:00Z' GROUP BY time(12m) fill(previous)
Output:
time | max |
---|---|
2019-09-18T16:24:00Z | 3.235 |
2019-09-18T16:36:00Z | 3.235 |
2019-09-18T16:48:00Z | 4 |
The next example queries the time range between 2019-09-18T16:36:00Z
and 2019-09-18T16:54:00Z
.
Note that fill(previous)
doesn’t fill the result for 2019-09-18T16:36:00Z
with the
result from 2019-09-18T16:24:00Z
; the result for 2019-09-18T16:24:00Z
is outside the query’s
shorter time range.
SELECT MAX("water_level") FROM "h2o_feet" WHERE location = 'coyote_creek' AND time >= '2019-09-18T16:36:00Z' AND time <= '2019-09-18T16:54:00Z' GROUP BY time(12m) fill(previous)
Output:
time | max |
---|---|
2019-09-18T16:36:00Z | |
2019-09-18T16:48:00Z | 4 |
fill(linear)
when the previous or following result is outside the queried time range
fill(linear)
doesn’t fill the result for a time interval with no data if the
previous result or the following result is outside the queried time range.
Example
The following example queries the time range between 2019-11-11T21:24:00Z
and
2019-11-11T22:06:00Z
. Note that fill(linear)
fills the results for the
2019-11-11T21:36:00Z
time interval and the 2019-11-11T21:48:00Z
time interval
using the values from the 2019-11-11T21:24:00Z
time interval and the
2019-11-11T22:00:00Z
time interval.
SELECT MEAN("tadpoles") FROM "pond" WHERE time > '2019-11-11T21:24:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m) fill(linear)
Output:
time | mean |
---|---|
2019-11-11T21:24:00Z | 3 |
2019-11-11T21:36:00Z | 4 |
2019-11-11T21:48:00Z | 5 |
2019-11-11T22:00:00Z | 6 |
The next query shortens the time range in the previous query.
It now covers the time between 2019-11-11T21:36:00Z
and 2019-11-11T22:06:00Z
.
Note that fill()
previous doesn’t fill the results for the 2019-11-11T21:36:00Z
time interval and the 2019-11-11T21:48:00Z
time interval; the result for
2019-11-11T21:24:00Z
is outside the query’s shorter time range and InfluxDB
cannot perform the linear interpolation.
SELECT MEAN("tadpoles") FROM "pond" WHERE time >= '2019-11-11T21:36:00Z' AND time <= '2019-11-11T22:06:00Z' GROUP BY time(12m) fill(linear)
Output:
time | mean |
---|---|
2019-11-11T21:36:00Z | |
2019-11-11T21:48:00Z | |
2019-11-11T22:00:00Z | 6 |
Note: The data in Issue 3 are not in NOAA
database. We had to create a dataset with less regular data to work with fill(linear)
.
Was this page helpful?
Thank you for your feedback!
Support and feedback
Thank you for being part of our community! We welcome and encourage your feedback and bug reports for InfluxDB and this documentation. To find support, use the following resources:
Customers with an annual or support contract can contact InfluxData Support.