Aggregate data with SQL
InfluxDB 3 Enterprise is in Public Alpha
InfluxDB 3 Enterprise is in public alpha and available for testing and feedback, but is not meant for production use. Both the product and this documentation are works in progress. We welcome and encourage your input about your experience with the alpha and invite you to join our public channels for updates and to share feedback.
An SQL query that aggregates data includes the following clauses:
* Required- *
SELECT
: Specify fields, tags, and calculations to output from a table or use the wildcard alias (*
) to select all fields and tags from a table. - *
FROM
: Specify the table to query data from. WHERE
: Only return rows that meets the specified conditions–for example, the time is within a time range, a tag has a specific value, or a field value is above or below a specified threshold.GROUP BY
: Group data that have the same values for specified columns and expressions (for example, an aggregate function result).
For simplicity, the verb, “aggregate,” in this guide refers to applying both aggregate and selector functions to a dataset.
Learn how to apply aggregate operations to your queried data:
Aggregate and selector functions
Both aggregate and selector functions return a single row from each SQL group.
For example, if you GROUP BY room
and perform an aggregate operation
in your SELECT
clause, results include an aggregate value for each unique
value of room
.
Aggregate functions
Use aggregate functions to aggregate values in a specified column for each group and return a single row per group containing the aggregate value.
Basic aggregate query
SELECT AVG(co) from home
Selector functions
Use selector functions to “select” a value from a specified column. The available selector functions are designed to work with time series data.
Each selector function returns a Rust struct (similar to a JSON object)
representing a single time and value from the specified column in the each group.
What time and value get returned depend on the logic in the selector function.
For example, selector_first
returns the value of specified column in the first
row of the group. selector_max
returns the maximum value of the specified
column in the group.
Selector struct schema
The struct returned from a selector function has two properties:
- time:
time
value in the selected row - value: value of the specified column in the selected row
{time: 2023-01-01T00:00:00Z, value: 72.1}
Use selector functions
Each selector function has two arguments:
- The first is the column to operate on.
- The second is the time column to use in the selection logic.
In your SELECT
statement, execute a selector function and use bracket notation
to reference properties of the returned struct to
populate the column value:
SELECT
selector_first(temp, time)['time'] AS time,
selector_first(temp, time)['value'] AS temp,
room
FROM home
GROUP BY room
Example aggregate queries
Sample data
The following examples use the Home sensor sample data. To run the example queries and return results, write the sample data to your InfluxDB 3 Enterprise database before running the example queries.
Perform an ungrouped aggregation
To aggregate all queried values in a specified column:
- Use aggregate or selector functions in your
SELECT
statement. - Do not include a
GROUP BY
clause to leave your data ungrouped.
SELECT avg(co) AS 'average co' from home
Group and aggregate data
To apply aggregate or selector functions to grouped data:
- Use aggregate or selector functions in your
SELECT
statement. - Include columns to group by in your
SELECT
statement. - Include a
GROUP BY
clause with a comma-delimited list of columns and expressions to group by.
Keep the following in mind when using GROUP BY
:
GROUP BY
can use column aliases that are defined in theSELECT
clause.GROUP BY
won’t use an aliased value if the alias is the same as the original column name.GROUP BY
will use the original value of the column, not the transformed, aliased value.
SELECT
room,
avg(temp) AS 'average temp'
FROM home
GROUP BY room
Downsample data by applying interval-based aggregates
A common use case when querying time series is downsampling data by applying aggregates to time-based groups. To group and aggregate data into time-based groups:
In your
SELECT
clause:Use the
DATE_BIN
function to calculate time intervals and output a column that contains the start of the interval nearest to thetime
timestamp in each row–for example, the following clause calculates two-hour intervals (originating at the Unix epoch) and returns a newtime
column that contains the start of the interval nearest tohome.time
:SELECT DATE_BIN(INTERVAL '2 hours', time) AS time FROM home ...
Given a
time
value , the outputtime
column contains .
In your
GROUP BY
clause:- Specify the
DATE_BIN(...)
column ordinal reference (1
). This lets you group by the transformedtime
value and maintain thetime
column name. - Specify other columns (for example,
room
) that are specified in theSELECT
clause and aren’t used in a selector function.
SELECT DATE_BIN(INTERVAL '2 hours', time) AS time ... GROUP BY 1, room ...
To reference the
DATE_BIN(...)
result column by name in theGROUP BY
clause, assign an alias other than “time” in theSELECT
clause–for example:SELECT DATE_BIN(INTERVAL '2 hours', time) AS _time FROM home ... GROUP BY _time, room
- Specify the
Include an
ORDER BY
clause with columns to sort by.
The following example retrieves unique combinations of time intervals and rooms with their minimum, maximum, and average temperatures:
SELECT
DATE_BIN(INTERVAL '2 hours', time) AS time,
room,
selector_max(temp, time)['value'] AS 'max temp',
selector_min(temp, time)['value'] AS 'min temp',
avg(temp) AS 'average temp'
FROM home
GROUP BY 1, room
ORDER BY room, 1
GROUP BY time
In the GROUP BY
clause, the name “time” always refers to the time
column
in the source table. If you want to reference a calculated time column by name,
use an alias different from “time” or use the column ordinal–for example:
SELECT
DATE_BIN(INTERVAL '2 hours', time) AS _time,
room,
selector_max(temp, time)['value'] AS 'max temp',
selector_min(temp, time)['value'] AS 'min temp',
avg(temp) AS 'average temp'
FROM home
GROUP BY _time, room
ORDER BY room, _time
SELECT
DATE_BIN(INTERVAL '2 hours', time) AS time,
room,
selector_max(temp, time)['value'] AS 'max temp',
selector_min(temp, time)['value'] AS 'min temp',
avg(temp) AS 'average temp'
FROM home
GROUP BY 1, room
ORDER BY room, 1
Query rows based on aggregate values
To query data based on values after an aggregate operation, include a HAVING
clause with defined predicate conditions such as a value threshold.
Predicates in the WHERE
clause are applied before data is aggregated.
Predicates in the HAVING
clause are applied after data is aggregated.
SELECT
room,
avg(co) AS 'average co'
FROM home
GROUP BY room
HAVING "average co" > 5
Was this page helpful?
Thank you for your feedback!
Support and feedback
Thank you for being part of our community! We welcome and encourage your feedback and bug reports for InfluxDB 3 Enterprise and this documentation. To find support, use the following resources:
Customers with an annual or support contract can contact InfluxData Support.