InfluxDB Cloud data durability

InfluxDB Cloud replicates all data in the storage tier across two availability zones in a cloud region, automatically creates backups, and verifies that replicated data is consistent and readable.

Data replication

InfluxDB Cloud replicates data in both the write tier and the storage tier.

  • Write tier: all data written to InfluxDB is processed by a durable message queue. The message queue partitions each batch of points by series key and then replicates each partition across other physical nodes in the message queue.
  • Storage tier: all data in the underlying storage tier is replicated across two availability zones in a cloud region.
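As an illustration of partitioning a batch by series key, here is a minimal Python sketch. It is not InfluxDB Cloud's implementation: the hash-based assignment, the function names, and the partition count are all hypothetical. For simplicity it treats the measurement plus the sorted tag set as the series key and assumes the line protocol contains no escaped spaces or commas.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def series_key(line: str) -> str:
    """Return measurement plus sorted tag set for one line of line protocol.

    Assumption: no escaped spaces or commas, so everything before the
    first space is "measurement,tag=value,...".
    """
    head = line.split(" ", 1)[0]
    measurement, *tags = head.split(",")
    return ",".join([measurement, *sorted(tags)])


def partition_for(key: str, num_partitions: int) -> int:
    """Map a series key to a partition with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_batch(lines: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group a batch of points so that each series lands in one partition."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        partitions[partition_for(series_key(line), num_partitions)].append(line)
    return dict(partitions)
```

Because the key is derived only from the measurement and tags, every point in a series maps to the same partition regardless of its fields or timestamp.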

Backup processes

InfluxDB Cloud backs up all data using the following processes:

Backup on write

All inbound write requests to InfluxDB Cloud are added to a durable message queue. The message queue does the following:

  1. Caches the line protocol of each write request.
  2. Writes data to the storage tier.
  3. Routinely persists cached line protocol to object storage as an out-of-band backup.
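The three steps above can be modeled with a small in-memory Python sketch. The class name, method names, and backup key format are hypothetical, and plain lists and dictionaries stand in for the real message queue, storage tier, and object storage. Clearing the cache after each backup is a simplification made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WriteQueueSketch:
    """Toy model of the backup-on-write flow (not the real service)."""

    cache: List[str] = field(default_factory=list)              # step 1: cached line protocol
    storage_tier: List[str] = field(default_factory=list)       # step 2: stand-in for the storage tier
    object_storage: Dict[str, str] = field(default_factory=dict)  # step 3: stand-in for object storage
    backups_written: int = 0

    def accept_write(self, line_protocol: str) -> None:
        # Step 1: cache the raw line protocol of the request.
        self.cache.append(line_protocol)
        # Step 2: write the data to the storage tier.
        self.storage_tier.append(line_protocol)

    def persist_backup(self) -> Optional[str]:
        # Step 3: persist cached line protocol as an out-of-band backup.
        if not self.cache:
            return None
        self.backups_written += 1
        key = f"backup-{self.backups_written:06d}"  # hypothetical key format
        self.object_storage[key] = "\n".join(self.cache)
        self.cache.clear()  # simplification for this sketch only
        return key
```

In this model, `persist_backup` would be called on a schedule, independently of the write path.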

Message queue backups provide raw line protocol that can be used to recover from catastrophic failure in the storage tier or an accidental deletion. The durability of the message queue is 96 hours, meaning InfluxDB Cloud can sustain a failure of its underlying storage tier or object storage services for up to 96 hours without any data loss.
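To make the 96-hour window concrete, the following Python sketch checks whether an outage of a downstream service fits within the message queue's durability period. The function name is hypothetical, and treating an outage of exactly 96 hours as covered is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Durability of the message queue, as stated in the text above.
QUEUE_DURABILITY = timedelta(hours=96)


def outage_within_durability(outage_start: datetime, outage_end: datetime) -> bool:
    """Return True if the outage is short enough for the queue to cover it."""
    if outage_end < outage_start:
        raise ValueError("outage_end must not precede outage_start")
    # Assumption: an outage of exactly 96 hours still counts as covered.
    return outage_end - outage_start <= QUEUE_DURABILITY


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
covered = outage_within_durability(start, start + timedelta(hours=72))
```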

To minimize potential data loss due to defects introduced in the InfluxDB Cloud service, we minimize the code used between the data ingest and backup processes.

Backup after compaction

The InfluxDB storage engine compresses data over time in a process known as compaction. When each compaction cycle completes, InfluxDB Cloud stores compressed TSM files in object storage.

Periodic TSM snapshots

To provide multiple data recovery points, InfluxDB Cloud takes weekly snapshots of the TSM files uploaded to object storage. Each TSM snapshot includes a copy of all non-deleted data that exists when the snapshot is created. Snapshots are preserved for 100 days.
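With one snapshot every 7 days and a 100-day retention period, roughly 14 or 15 snapshots are available as recovery points at any time. The Python sketch below lists which snapshot dates would still be retained on a given day. The function name and the inclusive 100-day boundary are assumptions, not documented behavior.

```python
from datetime import date, timedelta
from typing import List

SNAPSHOT_INTERVAL = timedelta(days=7)     # weekly snapshots
SNAPSHOT_RETENTION = timedelta(days=100)  # snapshots are preserved for 100 days


def retained_snapshots(first_snapshot: date, today: date) -> List[date]:
    """List weekly snapshot dates that are still within the retention period."""
    retained = []
    current = first_snapshot
    while current <= today:
        # Assumption: a snapshot that is exactly 100 days old is still retained.
        if today - current <= SNAPSHOT_RETENTION:
            retained.append(current)
        current += SNAPSHOT_INTERVAL
    return retained
```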

Recovery

InfluxDB Cloud uses the following out-of-band backups stored in object storage to recover data:

  • Message queue backup: line protocol from inbound write requests within the last 96 hours
  • Compaction backup: TSM files
  • TSM snapshots: weekly snapshots of TSM files in object storage

The Recovery Point Objective (RPO) is any accepted write. The Recovery Time Objective (RTO) is harder to definitively predict as potential failure modes can vary. While most common failure modes can be resolved within minutes or hours, critical failure modes may take longer. For example, if we need to rebuild all data from the TSM snapshots and message queue backup, it could take 24 hours or longer.
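The rebuild scenario described above can be sketched as two steps: restore the most recent TSM backup taken before the failure, then replay any line protocol written after that backup. The Python below is a simplified, hypothetical model. `plan_recovery` and its parameters are illustrative names, and it does not describe InfluxDB Cloud's actual recovery tooling.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Tuple


def plan_recovery(
    tsm_backup_times: List[datetime],
    queued_writes: List[Tuple[datetime, str]],
    failure_time: datetime,
) -> Tuple[datetime, List[str]]:
    """Pick the latest TSM backup at or before the failure and the writes to replay.

    tsm_backup_times: times of available TSM backups (compaction backups or snapshots).
    queued_writes: (timestamp, line protocol) pairs from the message queue backup.
    """
    ordered = sorted(tsm_backup_times)
    idx = bisect_right(ordered, failure_time)
    if idx == 0:
        raise ValueError("no TSM backup precedes the failure time")
    base = ordered[idx - 1]
    # Replay only writes accepted after the base backup, up to the failure.
    replay = [line for ts, line in sorted(queued_writes) if base < ts <= failure_time]
    return base, replay
```

The sketch only shows how the backups are combined. The actual recovery time depends on the amount of data to restore and replay.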

Data verification

InfluxDB Cloud has two data verification services running at all times:

  • Entropy detection: ensures that replicated data is consistent
  • Data verification: verifies that data written to InfluxDB is readable
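The documentation does not describe how entropy detection works internally. One common way to check that replicas are consistent is to compare a digest of each replica's contents. The Python sketch below uses that approach and is an assumption rather than the service's actual mechanism. The function names and the zone labels in the usage example are hypothetical.

```python
import hashlib
from typing import Dict, List


def replica_digest(points: List[str]) -> str:
    """Hash a replica's points in sorted order so ordering differences are ignored."""
    h = hashlib.sha256()
    for point in sorted(points):
        h.update(point.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def replicas_consistent(replicas: Dict[str, List[str]]) -> bool:
    """Return True if every replica holds the same set of points."""
    return len({replica_digest(points) for points in replicas.values()}) <= 1


consistent = replicas_consistent({
    "zone-a": ["cpu,host=a usage=0.5 1", "cpu,host=a usage=0.7 2"],
    "zone-b": ["cpu,host=a usage=0.7 2", "cpu,host=a usage=0.5 1"],
})
```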

InfluxDB Cloud status

InfluxDB Cloud regions and underlying services are monitored at all times. For information about the current status of InfluxDB Cloud, see the InfluxDB Cloud status page.

