Documentation

Handle duplicate data points

InfluxDB identifies unique data points by their measurement, tag set, and timestamp (each a part of Line protocol used to write data to InfluxDB).

web,host=host2,region=us_west firstByte=15.0 1559260800000000000
--- -------------------------                -------------------
 |               |                                    |
Measurement   Tag set                             Timestamp

Duplicate data points

For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:

# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000000

After you submit the new data point, InfluxDB overwrites firstByte with the new field value and leaves the field dnsLookup alone:

# Resulting data point
web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000
from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region]
               _time  _measurement   host   region  dnsLookup  firstByte
--------------------  ------------  -----  -------  ---------  ---------
2019-05-31T00:00:00Z           web  host2  us_west          7         15

Preserve duplicate points

To preserve both old and new field values in duplicate points, use one of the following strategies:

Add an arbitrary tag

Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.

For example, add a uniq tag to each data point:

# Existing point
web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New point
web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000

It is not necessary to retroactively add the unique tag to the existing data point. Tag sets are evaluated as a whole. The arbitrary uniq tag on the new point allows InfluxDB to recognize it as a unique point. However, this causes the schema of the two points to differ and may lead to challenges when querying the data.

After writing the new point to InfluxDB:

from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region, uniq]
               _time  _measurement   host   region  uniq  firstByte  dnsLookup
--------------------  ------------  -----  -------  ----  ---------  ---------
2019-05-31T00:00:00Z           web  host2  us_west     1         24          7

Table: keys: [_measurement, host, region, uniq]
               _time  _measurement   host   region  uniq  firstByte
--------------------  ------------  -----  -------  ----  ---------
2019-05-31T00:00:00Z           web  host2  us_west     2         15

Increment the timestamp

Increment the timestamp by a nanosecond to enforce the uniqueness of each point.

# Old data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000001

After writing the new point to InfluxDB:

from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region]
                         _time  _measurement   host   region  firstByte  dnsLookup
------------------------------  ------------  -----  -------  ---------  ---------
2019-05-31T00:00:00.000000000Z           web  host2  us_west         24          7
2019-05-31T00:00:00.000000001Z           web  host2  us_west         15

The output of examples queries in this article has been modified to clearly show the different approaches and results for handling duplicate data.


Was this page helpful?

Thank you for your feedback!


The future of Flux

Flux is going into maintenance mode. You can continue using it as you currently are without any changes to your code.

Read more

InfluxDB 3 Open Source Now in Public Alpha

InfluxDB 3 Open Source is now available for alpha testing, licensed under MIT or Apache 2 licensing.

We are releasing two products as part of the alpha.

InfluxDB 3 Core, is our new open source product. It is a recent-data engine for time series and event data. InfluxDB 3 Enterprise is a commercial version that builds on Core’s foundation, adding historical query capability, read replicas, high availability, scalability, and fine-grained security.

For more information on how to get started, check out:

InfluxDB Cloud powered by TSM