Handle duplicate data points

InfluxDB identifies unique data points by their measurement, tag set, and timestamp (each a part of the line protocol used to write data to InfluxDB).

web,host=host2,region=us_west firstByte=15.0 1559260800000000000
--- -------------------------                -------------------
 |               |                                    |
Measurement   Tag set                             Timestamp
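
As a rough illustration of this identity rule, the following Python sketch extracts the three identifying parts from a line protocol string. The point_key helper is hypothetical (it is not part of any InfluxDB client library), and it assumes simple input: no escaped spaces or commas, and an explicit timestamp.

```python
def point_key(line: str) -> tuple:
    """Return (measurement, tag set, timestamp) for a simple line protocol line.

    Hypothetical helper for illustration only. Assumes no escaped spaces or
    commas and that the line ends with an explicit timestamp.
    """
    series, _fields, timestamp = line.split(" ")
    measurement, *tags = series.split(",")
    # Sort the tag pairs so tag order does not affect the comparison.
    tag_set = tuple(sorted(tuple(tag.split("=", 1)) for tag in tags))
    return measurement, tag_set, int(timestamp)


a = point_key("web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000")
b = point_key("web,host=host2,region=us_west firstByte=15.0 1559260800000000000")

# The field sets differ, but the identifying parts match.
print(a == b)  # True
```

The field set is deliberately ignored: two lines with different fields but the same measurement, tag set, and timestamp produce the same key.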

Duplicate data points

For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:

# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000000

After you submit the new data point, InfluxDB overwrites firstByte with the new field value and leaves dnsLookup unchanged:

# Resulting data point
web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000

Querying the measurement returns a single merged point:

from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region]
               _time  _measurement   host   region  dnsLookup  firstByte
--------------------  ------------  -----  -------  ---------  ---------
2019-05-31T00:00:00Z           web  host2  us_west          7         15
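
The merge behavior described above can be modeled in a few lines of Python. This is only a sketch of the semantics, not how InfluxDB stores data: the store dictionary and the write_point function are hypothetical stand-ins for a bucket and a write request.

```python
# Hypothetical in-memory stand-in for a bucket: maps a point's identity
# (measurement, tag set, timestamp) to its field set.
store = {}


def write_point(store, measurement, tags, fields, timestamp):
    """Write a point, merging field sets when the identity already exists."""
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    # Union of the old and new field sets; new values win on matching keys.
    store[key] = {**store.get(key, {}), **fields}
    return store[key]


tags = {"host": "host2", "region": "us_west"}
ts = 1559260800000000000

write_point(store, "web", tags, {"firstByte": 24.0, "dnsLookup": 7.0}, ts)
result = write_point(store, "web", tags, {"firstByte": 15.0}, ts)

print(result)      # {'firstByte': 15.0, 'dnsLookup': 7.0}
print(len(store))  # 1
```

Only one entry exists after both writes, which mirrors the single row in the query output.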

Preserve duplicate points

To preserve both old and new field values in duplicate points, use one of the following strategies:

Add an arbitrary tag

Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.

For example, add a uniq tag to each data point:

# Existing point
web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New point
web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000

You do not need to retroactively add the unique tag to the existing data point. InfluxDB evaluates the tag set as a whole, so the arbitrary uniq tag on the new point is enough for InfluxDB to treat it as a unique point. However, the two points then have different schemas, which can complicate queries.

After writing the new point to InfluxDB:

from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region, uniq]
               _time  _measurement   host   region  uniq  firstByte  dnsLookup
--------------------  ------------  -----  -------  ----  ---------  ---------
2019-05-31T00:00:00Z           web  host2  us_west     1         24          7

Table: keys: [_measurement, host, region, uniq]
               _time  _measurement   host   region  uniq  firstByte
--------------------  ------------  -----  -------  ----  ---------
2019-05-31T00:00:00Z           web  host2  us_west     2         15
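
If you build line protocol in application code, a small helper can append the arbitrary tag before writing. The sketch below is illustrative only: with_uniq_tag is a hypothetical function, the tag key uniq is taken from the example above, and the input is assumed to contain no escaped spaces in the measurement or tag set.

```python
def with_uniq_tag(line: str, value: str, tag_key: str = "uniq") -> str:
    """Append an arbitrary tag to the tag set of a line protocol line.

    Hypothetical helper. Assumes the first unescaped space separates the
    measurement and tag set from the field set.
    """
    series, rest = line.split(" ", 1)
    return f"{series},{tag_key}={value} {rest}"


new_point = with_uniq_tag(
    "web,host=host2,region=us_west firstByte=15.0 1559260800000000000", "2"
)
print(new_point)
# web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000
```

The caller is responsible for choosing values that are actually unique among points sharing the same measurement, remaining tags, and timestamp.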

Increment the timestamp

Increment the timestamp by a nanosecond to enforce the uniqueness of each point.

# Old data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000001

After writing the new point to InfluxDB:

from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region]
                         _time  _measurement   host   region  firstByte  dnsLookup
------------------------------  ------------  -----  -------  ---------  ---------
2019-05-31T00:00:00.000000000Z           web  host2  us_west         24          7
2019-05-31T00:00:00.000000001Z           web  host2  us_west         15
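
One way to apply this strategy to a batch before writing is sketched below in Python. The bump_duplicate_timestamps function is hypothetical, assumes simple line protocol with no escaped spaces and nanosecond timestamps, and only detects duplicates within the batch it is given, not against data already stored in InfluxDB.

```python
def bump_duplicate_timestamps(lines):
    """Shift timestamps forward by 1 ns until each point in the batch is unique.

    Hypothetical helper. A point's identity is its measurement and tag set
    (the text before the first space) plus its timestamp.
    """
    seen = set()
    adjusted = []
    for line in lines:
        body, ts_text = line.rsplit(" ", 1)
        series = body.split(" ", 1)[0]
        ts = int(ts_text)
        # Keep incrementing until this series/timestamp pair is unused.
        while (series, ts) in seen:
            ts += 1
        seen.add((series, ts))
        adjusted.append(f"{body} {ts}")
    return adjusted


batch = [
    "web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000",
    "web,host=host2,region=us_west firstByte=15.0 1559260800000000000",
]
for line in bump_duplicate_timestamps(batch):
    print(line)
# web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
# web,host=host2,region=us_west firstByte=15.0 1559260800000000001
```

Note that the stored time of a shifted point no longer matches the original timestamp exactly, which matters if queries filter or join on exact times.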

The output of the example queries in this article has been modified to clearly show the different approaches and results for handling duplicate data.

