InfluxDB key concepts

This page documents an earlier version of InfluxDB. InfluxDB v2 is the latest stable version. See the equivalent InfluxDB v2 documentation: InfluxDB key concepts.

Before diving into InfluxDB, it’s good to get acquainted with some key concepts of the database. This document introduces those concepts using sample data, covering how timestamps, fields, tags, measurements, retention policies, series, points, and databases work together in InfluxDB.

Sample data

The next section references the data printed out below. The data are fictional but represent a believable setup in InfluxDB. They show the number of butterflies and honeybees counted by two scientists (langstroth and perpetua) in two locations (location 1 and location 2) from midnight through 6:12 AM on August 18, 2015. Assume that the data live in a database called my_database and are subject to the autogen retention policy (more on databases and retention policies to come).

name: census
-----------------
time                    butterflies honeybees   location    scientist
2015-08-18T00:00:00Z    12          23          1           langstroth
2015-08-18T00:00:00Z    1           30          1           perpetua
2015-08-18T00:06:00Z    11          28          1           langstroth
2015-08-18T00:06:00Z    3           28          1           perpetua
2015-08-18T05:54:00Z    2           11          2           langstroth
2015-08-18T06:00:00Z    1           10          2           langstroth
2015-08-18T06:06:00Z    8           23          2           perpetua
2015-08-18T06:12:00Z    7           22          2           perpetua

Discussion

Now that you’ve seen some sample data in InfluxDB, this section covers what it all means.

InfluxDB is a time series database, so it makes sense to start with what is at the root of everything we do: time. The data above have a column called time, and all data in InfluxDB have that column. The time column stores timestamps, and each timestamp shows the date and time, in RFC3339 UTC, associated with particular data.
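As a rough illustration of the timestamp format, the sketch below parses one of the RFC3339 UTC timestamps from the sample data and converts it to nanoseconds since the Unix epoch, the default precision for timestamps in InfluxDB line protocol. The helper names parse_rfc3339_utc and to_epoch_ns are made up for this example and are not part of any InfluxDB client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_rfc3339_utc(ts: str) -> datetime:
    """Parse a whole-second UTC timestamp such as 2015-08-18T00:06:00Z."""
    # Assumption: the input always ends in "Z" and has no fractional seconds,
    # which holds for every timestamp in the sample data.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def to_epoch_ns(dt: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the Unix epoch."""
    # Integer division of timedeltas avoids floating-point rounding.
    microseconds = (dt - EPOCH) // timedelta(microseconds=1)
    return microseconds * 1000


first = parse_rfc3339_utc("2015-08-18T00:00:00Z")
print(to_epoch_ns(first))
```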

The next two columns, called butterflies and honeybees, are fields. Fields are made up of field keys and field values. Field keys (butterflies and honeybees) are strings; the field key butterflies tells us that the field values 12-7 refer to butterflies and the field key honeybees tells us that the field values 23-22 refer to, well, honeybees.

Field values are your data; they can be strings, floats, integers, or Booleans, and, because InfluxDB is a time series database, a field value is always associated with a timestamp. The field values in the sample data are:

12   23
1    30
11   28
3    28
2    11
1    10
8    23
7    22

In the data above, the collection of field-key and field-value pairs makes up a field set. Here are all eight field sets in the sample data:

  • butterflies = 12 honeybees = 23
  • butterflies = 1 honeybees = 30
  • butterflies = 11 honeybees = 28
  • butterflies = 3 honeybees = 28
  • butterflies = 2 honeybees = 11
  • butterflies = 1 honeybees = 10
  • butterflies = 8 honeybees = 23
  • butterflies = 7 honeybees = 22

Fields are a required piece of the InfluxDB data structure: you cannot have data in InfluxDB without fields. It’s also important to note that fields are not indexed. Queries that use field values as filters must scan all values that match the other conditions in the query. As a result, those queries are slower than queries on tags (more on tags below). In general, fields should not contain commonly queried metadata.

The last two columns in the sample data, called location and scientist, are tags. Tags are made up of tag keys and tag values. Both tag keys and tag values are stored as strings and record metadata. The tag keys in the sample data are location and scientist. The tag key location has two tag values: 1 and 2. The tag key scientist also has two tag values: langstroth and perpetua.

In the data above, the tag set is the different combinations of all the tag key-value pairs. The four tag sets in the sample data are:

  • location = 1, scientist = langstroth
  • location = 2, scientist = langstroth
  • location = 1, scientist = perpetua
  • location = 2, scientist = perpetua
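To make the idea of a tag set concrete, here is a minimal sketch that collects the distinct tag sets from the tags of the eight sample points. The list TAGS_PER_POINT and the function distinct_tag_sets are illustrative names invented for this example, not InfluxDB APIs.

```python
# Tags of the eight sample points, in the order they appear in the table.
TAGS_PER_POINT = [
    {"location": "1", "scientist": "langstroth"},
    {"location": "1", "scientist": "perpetua"},
    {"location": "1", "scientist": "langstroth"},
    {"location": "1", "scientist": "perpetua"},
    {"location": "2", "scientist": "langstroth"},
    {"location": "2", "scientist": "langstroth"},
    {"location": "2", "scientist": "perpetua"},
    {"location": "2", "scientist": "perpetua"},
]


def distinct_tag_sets(tags_per_point):
    """Return the distinct tag sets as sorted tuples of (tag key, tag value) pairs."""
    # Sorting the pairs makes the tag set independent of dictionary key order.
    unique = {tuple(sorted(tags.items())) for tags in tags_per_point}
    return sorted(unique)


for tag_set in distinct_tag_sets(TAGS_PER_POINT):
    print(", ".join(f"{key} = {value}" for key, value in tag_set))
```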

Tags are optional. You don’t need to have tags in your data structure, but it’s generally a good idea to make use of them because, unlike fields, tags are indexed. This means that queries on tags are faster and that tags are ideal for storing commonly queried metadata.
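The difference between filtering on an indexed tag and an unindexed field can be sketched with a toy lookup table. This is only a conceptual model written for this page: it is not how InfluxDB implements its index, and the names ROWS, tag_index, filter_by_tag, and filter_by_field are hypothetical.

```python
from collections import defaultdict

# The sample data: butterflies and honeybees are fields, location and scientist are tags.
ROWS = [
    {"time": "2015-08-18T00:00:00Z", "butterflies": 12, "honeybees": 23, "location": "1", "scientist": "langstroth"},
    {"time": "2015-08-18T00:00:00Z", "butterflies": 1, "honeybees": 30, "location": "1", "scientist": "perpetua"},
    {"time": "2015-08-18T00:06:00Z", "butterflies": 11, "honeybees": 28, "location": "1", "scientist": "langstroth"},
    {"time": "2015-08-18T00:06:00Z", "butterflies": 3, "honeybees": 28, "location": "1", "scientist": "perpetua"},
    {"time": "2015-08-18T05:54:00Z", "butterflies": 2, "honeybees": 11, "location": "2", "scientist": "langstroth"},
    {"time": "2015-08-18T06:00:00Z", "butterflies": 1, "honeybees": 10, "location": "2", "scientist": "langstroth"},
    {"time": "2015-08-18T06:06:00Z", "butterflies": 8, "honeybees": 23, "location": "2", "scientist": "perpetua"},
    {"time": "2015-08-18T06:12:00Z", "butterflies": 7, "honeybees": 22, "location": "2", "scientist": "perpetua"},
]
TAG_KEYS = ("location", "scientist")

# Toy index: (tag key, tag value) -> positions of the matching rows.
tag_index = defaultdict(list)
for position, row in enumerate(ROWS):
    for key in TAG_KEYS:
        tag_index[(key, row[key])].append(position)


def filter_by_tag(key, value):
    """Look up matching rows directly through the index."""
    return [ROWS[position] for position in tag_index.get((key, value), [])]


def filter_by_field(key, value):
    """No index exists for fields, so every row has to be inspected."""
    return [row for row in ROWS if row[key] == value]


print(len(filter_by_tag("scientist", "perpetua")))
print(len(filter_by_field("butterflies", 1)))
```

The tag lookup touches only the rows listed under one index entry, while the field filter compares every row, which is the cost the paragraph above describes.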

Avoid using the following reserved keys:

  • _field
  • _measurement
  • time

If reserved keys are included as a tag or field key, the associated point is discarded.
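One way to avoid losing points is to check keys on the client before writing. The following is a minimal sketch of such a check; find_reserved_keys is a hypothetical helper written for this example, not a function provided by InfluxDB or its client libraries.

```python
# Reserved keys listed above.
RESERVED_KEYS = frozenset({"_field", "_measurement", "time"})


def find_reserved_keys(tags, fields):
    """Return the reserved keys used as a tag key or field key, sorted by name."""
    used_keys = set(tags) | set(fields)
    return sorted(RESERVED_KEYS & used_keys)


ok = find_reserved_keys({"location": "1"}, {"butterflies": 12})
bad = find_reserved_keys({"time": "morning"}, {"_field": 1, "honeybees": 23})
print(ok, bad)
```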

Why indexing matters: The schema case study

Say you notice that most of your queries focus on the values of the field keys honeybees and butterflies:

SELECT * FROM "census" WHERE "butterflies" = 1
SELECT * FROM "census" WHERE "honeybees" = 23

Because fields aren’t indexed, InfluxDB scans every value of butterflies in the first query and every value of honeybees in the second query before it provides a response. That behavior can hurt query response times - especially on a much larger scale. To optimize your queries, it may be beneficial to rearrange your schema such that the fields (butterflies and honeybees) become the tags and the tags (location and scientist) become the fields:

name: census
-----------------
time                    location    scientist   butterflies honeybees
2015-08-18T00:00:00Z    1           langstroth  12          23
2015-08-18T00:00:00Z    1           perpetua    1           30
2015-08-18T00:06:00Z    1           langstroth  11          28
2015-08-18T00:06:00Z    1           perpetua    3           28
2015-08-18T05:54:00Z    2           langstroth  2           11
2015-08-18T06:00:00Z    2           langstroth  1           10
2015-08-18T06:06:00Z    2           perpetua    8           23
2015-08-18T06:12:00Z    2           perpetua    7           22

Now that butterflies and honeybees are tags, InfluxDB won’t have to scan every one of their values when it performs the queries above, which means that your queries are faster.

The measurement acts as a container for tags, fields, and the time column, and the measurement name is the description of the data that are stored in the associated fields. Measurement names are strings, and, for any SQL users out there, a measurement is conceptually similar to a table. The only measurement in the sample data is census. The name census tells us that the field values record the number of butterflies and honeybees - not their size, direction, or some sort of happiness index.

A single measurement can belong to different retention policies. A retention policy describes how long InfluxDB keeps data (DURATION) and how many copies of the data are stored in the cluster (REPLICATION). If you’re interested in reading more about retention policies, check out Database Management.

Replication factors do not serve a purpose with single node instances.

In the sample data, everything in the census measurement belongs to the autogen retention policy. InfluxDB automatically creates that retention policy; it has an infinite duration and a replication factor set to one.
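The two properties of a retention policy can be modeled with a small data class. This is a simplified sketch for illustration only: the RetentionPolicy class and its keeps method are invented here, and the per-point check ignores how InfluxDB actually schedules and enforces data expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    name: str
    duration: Optional[timedelta]  # None models an infinite duration
    replication: int               # number of copies kept in the cluster

    def keeps(self, point_time: datetime, now: datetime) -> bool:
        """Return True if data with this timestamp is still within the duration."""
        if self.duration is None:
            return True
        return now - point_time <= self.duration


# autogen as described above: infinite duration, replication factor of one.
AUTOGEN = RetentionPolicy(name="autogen", duration=None, replication=1)
# A hypothetical one-day policy, for comparison.
ONE_DAY = RetentionPolicy(name="one_day", duration=timedelta(days=1), replication=1)

point_time = datetime(2015, 8, 18, 0, 0, tzinfo=timezone.utc)
a_week_later = point_time + timedelta(days=7)
print(AUTOGEN.keeps(point_time, a_week_later), ONE_DAY.keeps(point_time, a_week_later))
```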

Now that you’re familiar with measurements, tag sets, and retention policies, let’s discuss series. In InfluxDB, a series is a collection of points that share a measurement, tag set, and field key. The data above consist of eight series:

Series number   Measurement   Tag set                                 Field key
series 1        census        location = 1, scientist = langstroth    butterflies
series 2        census        location = 2, scientist = langstroth    butterflies
series 3        census        location = 1, scientist = perpetua      butterflies
series 4        census        location = 2, scientist = perpetua      butterflies
series 5        census        location = 1, scientist = langstroth    honeybees
series 6        census        location = 2, scientist = langstroth    honeybees
series 7        census        location = 1, scientist = perpetua      honeybees
series 8        census        location = 2, scientist = perpetua      honeybees

Understanding the concept of a series is essential when designing your schema and when working with your data in InfluxDB.
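The count of eight series can be reproduced from the sample data by grouping on measurement, tag set, and field key. The sketch below does this in plain Python; POINTS and series_keys are names chosen for this example only.

```python
MEASUREMENT = "census"

# Each sample point as (timestamp, tags, fields).
POINTS = [
    ("2015-08-18T00:00:00Z", {"location": "1", "scientist": "langstroth"}, {"butterflies": 12, "honeybees": 23}),
    ("2015-08-18T00:00:00Z", {"location": "1", "scientist": "perpetua"}, {"butterflies": 1, "honeybees": 30}),
    ("2015-08-18T00:06:00Z", {"location": "1", "scientist": "langstroth"}, {"butterflies": 11, "honeybees": 28}),
    ("2015-08-18T00:06:00Z", {"location": "1", "scientist": "perpetua"}, {"butterflies": 3, "honeybees": 28}),
    ("2015-08-18T05:54:00Z", {"location": "2", "scientist": "langstroth"}, {"butterflies": 2, "honeybees": 11}),
    ("2015-08-18T06:00:00Z", {"location": "2", "scientist": "langstroth"}, {"butterflies": 1, "honeybees": 10}),
    ("2015-08-18T06:06:00Z", {"location": "2", "scientist": "perpetua"}, {"butterflies": 8, "honeybees": 23}),
    ("2015-08-18T06:12:00Z", {"location": "2", "scientist": "perpetua"}, {"butterflies": 7, "honeybees": 22}),
]


def series_keys(measurement, points):
    """Return the set of (measurement, tag set, field key) combinations."""
    keys = set()
    for _timestamp, tags, fields in points:
        tag_set = tuple(sorted(tags.items()))
        # Every field key on a point contributes to its own series.
        for field_key in fields:
            keys.add((measurement, tag_set, field_key))
    return keys


print(len(series_keys(MEASUREMENT, POINTS)))
```

Four tag sets multiplied by two field keys gives the eight series listed in the table.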

A point represents a single data record that has four components: a measurement, tag set, field set, and a timestamp. A point is uniquely identified by its series and timestamp.

For example, here’s a single point:

name: census
-----------------
time                    butterflies honeybees   location    scientist
2015-08-18T00:00:00Z    1           30          1           perpetua

The point in this example is part of series 3 and 7 and defined by the measurement (census), the tag set (location = 1, scientist = perpetua), the field set (butterflies = 1, honeybees = 30), and the timestamp 2015-08-18T00:00:00Z.
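The four components of a point map directly onto InfluxDB line protocol, the text format used to write data. The sketch below serializes the example point; it is a simplified illustration that assumes keys and values need no escaping, and the functions format_field_value and to_line_protocol are hypothetical helpers rather than part of an official client.

```python
def format_field_value(value):
    """Render a field value in line protocol syntax (no escaping, by assumption)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry a trailing "i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value) + '"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build: measurement,tag_set field_set timestamp"""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={format_field_value(value)}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "census",
    {"location": "1", "scientist": "perpetua"},
    {"butterflies": 1, "honeybees": 30},
    1439856000000000000,  # 2015-08-18T00:00:00Z in nanoseconds since the Unix epoch
)
print(line)
```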

All of the stuff we’ve just covered is stored in a database - the sample data are in the database my_database. An InfluxDB database is similar to traditional relational databases and serves as a logical container for users, retention policies, continuous queries, and, of course, your time series data. See Authentication and Authorization and Continuous Queries for more on those topics.

Databases can have several users, continuous queries, retention policies, and measurements. InfluxDB is a schemaless database which means it’s easy to add new measurements, tags, and fields at any time. It’s designed to make working with time series data awesome.

You made it! You’ve covered the fundamental concepts and terminology in InfluxDB. If you’re just starting out, we recommend taking a look at Getting Started and the Writing Data and Querying Data guides. May our time series database serve you well 🕔.

