InfluxDB schema design
Each InfluxDB use case is unique, and your schema design should reflect that. The following design guidelines apply to most use cases.
Follow them to avoid high series cardinality and keep reads and writes performant.
Where to store data (tag or field)
Tags are indexed and fields are not. This means that querying by tags is more performant than querying by fields.
In general, your queries should guide what gets stored as a tag and what gets stored as a field:
- Store commonly queried metadata in tags.
- Store data in fields if each data point contains a different value.
- Store numeric values as fields (tag values are always stored as strings).
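The tag and field split above can be illustrated with a minimal Python sketch that assembles one line of line protocol. The helper name `to_line_protocol` is hypothetical, and the sketch assumes keys and values contain no characters that need escaping (spaces, commas, equals signs, or quotes). It is not a replacement for a client library.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line of line protocol (illustrative only, no escaping)."""
    # Tag values are always written as strings; tags are sorted by key.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def format_field(value):
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"       # integer fields carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        return '"' + str(value) + '"'  # string fields are double-quoted

    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "weather_sensor",
    tags={"crop": "blueberries", "plot": "1", "region": "north"},  # queried metadata
    fields={"temp": 50.1},                                         # numeric measurement
    timestamp_ns=1472515200000000000,
)
print(line)
```

Here the plot number is stored as a tag because it identifies where the reading came from, so it is written as the string "1". The temperature changes with every point and is numeric, so it is a field.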
Avoid too many series
Tags containing highly variable information like unique IDs, hashes, and random strings lead to a large number of series, also known as high series cardinality.
High series cardinality is a primary driver of high memory usage for many database workloads. InfluxDB uses measurements and tags to create indexes and speed up reads. However, when too many indexes are created, both writes and reads may start to slow down. Therefore, if a system has memory constraints, consider storing high-cardinality data as a field rather than a tag.
If reads and writes to InfluxDB start to slow down, you may have high series cardinality (too many series). See how to resolve high cardinality.
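To see how a highly variable tag multiplies series, the following Python sketch counts unique series in a batch of line protocol. It assumes a series is identified by the combination of measurement, tag set, and field key, and that the lines contain no escaped spaces or commas. The `requests` measurement, the `request_id` key, and the `series_keys` helper are hypothetical names used only for illustration.

```python
def series_keys(lines):
    """Return the set of unique (measurement, tag set, field key) combinations."""
    keys = set()
    for line in lines:
        # Naive parse: "<measurement>[,<tags>] <fields> <timestamp>"
        head, field_part, _timestamp = line.split(" ")
        measurement, *tag_pairs = head.split(",")
        tag_set = tuple(sorted(tag_pairs))
        for field in field_part.split(","):
            field_key = field.split("=", 1)[0]
            keys.add((measurement, tag_set, field_key))
    return keys


# Hypothetical data: request_id is unique for every point.
as_tag = [
    f"requests,host=a,request_id={i} duration=1.5 {1472515200000000000 + i}"
    for i in range(1000)
]
as_field = [
    f'requests,host=a request_id="{i}",duration=1.5 {1472515200000000000 + i}'
    for i in range(1000)
]

print(len(series_keys(as_tag)))    # 1000: one series per unique tag value
print(len(series_keys(as_field)))  # 2: one series per field key
```

Moving the unique ID from a tag to a field keeps the number of series constant as more points arrive, at the cost of the ID no longer being indexed.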
Use recommended naming conventions
Use the following conventions when naming your tag and field keys:
- Avoid keywords in tag and field names
- Avoid the same tag and field name
- Avoid encoding data in measurement names
- Avoid more than one piece of information in one tag
Avoid keywords as tag or field names
This is not required, but it simplifies writing queries because you won’t have to wrap tag or field names in double quotes. See Flux keywords to avoid.
Also, if a tag or field name contains non-alphanumeric characters, you must use bracket notation in Flux.
Avoid the same name for a tag and a field
Avoid using the same name for a tag and field key, which may result in unexpected behavior when querying data.
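One way to catch such collisions before writing is a small check on the client side. The Python sketch below is only an illustration: `shared_keys` and the sample keys are hypothetical and not part of any InfluxDB client library.

```python
def shared_keys(tags, fields):
    """Return the sorted keys that are used as both a tag key and a field key."""
    return sorted(set(tags) & set(fields))


point_tags = {"sensor": "a1", "status": "ok"}
point_fields = {"temp": 50.1, "status": 1}

print(shared_keys(point_tags, point_fields))  # ['status']
```

If the check returns any keys, renaming one side (for example, `status` as the tag and `status_code` as the field) removes the ambiguity.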
Avoid encoding data in measurement names
InfluxDB queries merge data that falls within the same measurement, so it’s better to differentiate data with tags than with detailed measurement names. If you encode data in a measurement name, you must use a regular expression to query the data, making some queries more complicated.
Example line protocol schemas
Consider the following schema represented by line protocol.
Schema 1 - Data encoded in the measurement name
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000
The long measurement names (blueberries.plot-1.north) with no tags are similar to Graphite metrics. Encoding the plot and region in the measurement name makes the data more difficult to query.
For example, calculating the average temperature across plots 1 and 2 is difficult with schema 1 because the plot is only available as part of the measurement name. Compare this to schema 2:
Schema 2 - Data encoded in tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
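If existing data already follows schema 1, the lines can be rewritten into schema 2 before they are written. The following Python sketch shows one way to do this. It assumes every measurement name has exactly the form `<crop>.plot-<n>.<region>` and that nothing needs escaping. The `measurement_to_tags` helper is a hypothetical name.

```python
def measurement_to_tags(line, measurement="weather_sensor"):
    """Move crop, plot, and region from the measurement name into tags."""
    old_measurement, rest = line.split(" ", 1)     # rest = "<fields> <timestamp>"
    crop, plot, region = old_measurement.split(".")
    plot_number = plot.split("-", 1)[1]            # "plot-1" -> "1"
    return f"{measurement},crop={crop},plot={plot_number},region={region} {rest}"


schema_1 = [
    "blueberries.plot-1.north temp=50.1 1472515200000000000",
    "blueberries.plot-2.midwest temp=49.8 1472515200000000000",
]
schema_2 = [measurement_to_tags(line) for line in schema_1]
for line in schema_2:
    print(line)
```

The output matches the two schema 2 lines above: all points share the `weather_sensor` measurement and differ only in their tags.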
Flux example to query schemas
Use Flux to calculate the average temp for blueberries in the north region:
// Schema 1 - Query for data encoded in the measurement name
from(bucket: "example-bucket")
    |> range(start: 2016-08-30T00:00:00Z)
    |> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
    |> mean()

// Schema 2 - Query for data encoded in tags
from(bucket: "example-bucket")
    |> range(start: 2016-08-30T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
    |> mean()
With schema 1, selecting the region requires a regular expression on the measurement name. With schema 2, a simple equality check on the region tag is enough.
Avoid putting more than one piece of information in one tag
Splitting a single tag that holds multiple pieces of information into separate tags simplifies your queries and reduces the need for regular expressions.
Example line protocol schemas
Consider the following schema represented by line protocol.
Schema 1 - Multiple data encoded in a single tag
-------------
weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000
The schema 1 data encodes multiple parameters, the plot and region, into a long tag value (plot-1.north).
Compare this to schema 2:
Schema 2 - Data encoded in multiple tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
Schema 2 is preferable because, with multiple tags, you don’t need a regular expression.
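A combined tag can be split on the client side before the data is written. The Python sketch below is one possible approach, assuming the `location` tag always has the form `plot-<n>.<region>` and that no escaping is needed. `split_location_tag` is a hypothetical helper name.

```python
def split_location_tag(line):
    """Replace the combined location tag with separate plot and region tags."""
    head, rest = line.split(" ", 1)                 # rest = "<fields> <timestamp>"
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    plot, region = tags.pop("location").split(".")  # "plot-1.north" -> "plot-1", "north"
    tags["plot"] = plot.split("-", 1)[1]            # "plot-1" -> "1"
    tags["region"] = region
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part} {rest}"


single_tag = "weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000"
print(split_location_tag(single_tag))
```

The result is the first schema 2 line shown above, with `plot` and `region` available as independent tags.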
Flux example to query schemas
The following Flux examples show how to calculate the average temp for blueberries in the north region for both schema 1 and schema 2.
// Schema 1 - Query for multiple data encoded in a single tag
from(bucket: "example-bucket")
    |> range(start: 2016-08-30T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")
    |> mean()

// Schema 2 - Query for data encoded in multiple tags
from(bucket: "example-bucket")
    |> range(start: 2016-08-30T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
    |> mean()
With schema 1, selecting the region requires a regular expression on the location tag. With schema 2, a simple equality check on the region tag is enough.