
InfluxDB Clustered data durability

How data flows through InfluxDB Clustered

When data is written to InfluxDB Clustered, it progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.

[Diagram components: Router, Ingester, WAL (short-term persistence), Querier, Object Storage (time series data stored in Apache Parquet format), Catalog (relational metadata service), Compactor, Garbage Collector. Labeled flows: write requests, query requests, query yet-to-be-persisted data.]

Figure: Write request, response, and ingest flow for InfluxDB Clustered

Data ingest

  1. Write validation and memory buffer
  2. Write-ahead log (WAL) persistence

Write validation and memory buffer

The Router validates incoming data to prevent malformed or unsupported data from entering the system. After validation, InfluxDB Clustered writes the accepted data to multiple write-ahead log (WAL) files on the Ingester pods’ local storage (2 by default, for redundancy) before it acknowledges the write request. The Ingester also holds the data in memory so that leading-edge (most recently written) data is available for querying.
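The write path described above can be sketched as a toy model. This is not InfluxDB code: `validate`, `ToyIngester`, `handle_write`, the JSON-lines WAL format, and the validation rules are all hypothetical names and assumptions chosen for illustration. The sketch only shows the ordering: validate first, append to every WAL replica, then acknowledge.

```python
import json
import os

WAL_REPLICAS = 2  # mirrors the default of 2 WAL copies described above


def validate(point: dict) -> bool:
    """Hypothetical check: non-empty measurement, non-empty fields, integer time."""
    return (
        isinstance(point.get("measurement"), str)
        and bool(point.get("measurement"))
        and isinstance(point.get("fields"), dict)
        and bool(point.get("fields"))
        and isinstance(point.get("time"), int)
    )


class ToyIngester:
    """Stand-in for an Ingester: a local WAL file plus an in-memory buffer."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.buffer = []  # queryable before the data reaches object storage

    def append(self, point: dict) -> None:
        # Append one JSON line (assumed format) and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(point) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.buffer.append(point)


def handle_write(point: dict, ingesters: list) -> bool:
    """Return True (acknowledge) only after every WAL replica holds the point."""
    if not validate(point):
        return False  # rejected before it reaches any WAL
    targets = ingesters[:WAL_REPLICAS]
    if len(targets) < WAL_REPLICAS:
        return False  # not enough replicas available to meet the redundancy goal
    for ingester in targets:
        ingester.append(point)
    return True
```

In this sketch, a caller would create two `ToyIngester` instances with separate WAL paths and pass them to `handle_write`. A rejected point never touches disk, and an accepted point is on both WAL files before the function returns.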

Write-ahead log (WAL) persistence

Ingesters persist the contents of the WAL to Parquet files in object storage and update the Catalog to reference the newly created Parquet files. InfluxDB Clustered retains each WAL until its data is persisted.

If an Ingester node is gracefully shut down (for example, during a new software deployment), it flushes the contents of the WAL to the Parquet files before shutting down.
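The ordering in this stage (write the file, update the Catalog, and only then discard the WAL) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `persist_wal` is a hypothetical function, a local directory stands in for object storage, a Python list stands in for the Catalog, and JSON stands in for Parquet so the example needs only the standard library.

```python
import json
import os
import uuid
from typing import Optional


def persist_wal(wal_path: str, object_store_dir: str, catalog: list) -> Optional[str]:
    """Persist WAL contents, register the new file, then drop the WAL."""
    if not os.path.exists(wal_path):
        return None
    with open(wal_path, encoding="utf-8") as wal:
        points = [json.loads(line) for line in wal if line.strip()]
    if not points:
        return None

    # 1. Write a new file under a unique name (JSON here; the real system
    #    writes Parquet). Existing files are never modified.
    os.makedirs(object_store_dir, exist_ok=True)
    final_path = os.path.join(object_store_dir, f"{uuid.uuid4().hex}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as out:
        json.dump(points, out)
        out.flush()
        os.fsync(out.fileno())
    os.replace(tmp_path, final_path)  # publish the finished file atomically

    # 2. Reference the new file in the (toy) catalog.
    times = [p["time"] for p in points]
    catalog.append(
        {"path": final_path, "min_time": min(times), "max_time": max(times)}
    )

    # 3. Only now is it safe to discard the WAL.
    os.remove(wal_path)
    return final_path
```

If the process stops before step 3, the WAL is still on disk, so the data can be recovered. That is the property the retention rule above is meant to guarantee.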

Data storage

In InfluxDB Clustered, all measurements are stored in Apache Parquet files that represent a point-in-time snapshot of the data. The Parquet files are immutable: they are never replaced or modified. Parquet files are stored in object storage and referenced in the Catalog, which InfluxDB uses to find the appropriate Parquet files for a particular set of data.

Data deletion

When data is deleted or expires (reaches the database’s retention period), InfluxDB performs the following steps:

  1. Marks the associated Parquet files as deleted in the Catalog.
  2. Filters out data marked for deletion from all queries.
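The two steps above amount to a soft delete: files are flagged in the Catalog and skipped at query time. A minimal sketch follows, assuming a hypothetical `ParquetFileRecord` with per-file time bounds and a `deleted` flag. The field names, the integer timestamps, and the expiry rule based on a file's newest timestamp are illustrative assumptions, not the actual Catalog schema.

```python
from dataclasses import dataclass


@dataclass
class ParquetFileRecord:
    """Hypothetical catalog row describing one Parquet file."""
    path: str
    min_time: int
    max_time: int
    deleted: bool = False


def mark_expired(catalog: list, now: int, retention_period: int) -> int:
    """Step 1: flag files whose newest point is older than the retention cutoff."""
    cutoff = now - retention_period
    marked = 0
    for record in catalog:
        if not record.deleted and record.max_time < cutoff:
            record.deleted = True
            marked += 1
    return marked


def files_for_query(catalog: list, start: int, stop: int) -> list:
    """Step 2: return only live files that overlap the queried time range."""
    return [
        record.path
        for record in catalog
        if not record.deleted
        and record.min_time <= stop
        and record.max_time >= start
    ]
```

In this model, nothing is removed from storage when data is marked. The flag alone keeps expired data out of query results.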

Backups

InfluxDB Clustered implements the following data backup strategies:

  • Backup of WAL files: Each WAL file is written to locally attached storage. If an Ingester process fails, the replacement Ingester reads the WAL file on startup and continues normal operation. WAL files are retained until their contents have been written to Parquet files in object storage. For added protection, Ingesters can be configured for write replication, where each measurement is written to two different WAL files before the write is acknowledged.

  • Backup of Parquet files: Parquet files are stored in object storage.

  • Backup of catalog: InfluxData keeps a transaction log of all recent updates to the InfluxDB catalog and generates a daily backup of the catalog.
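The WAL recovery behavior in the first item (a new Ingester reads the WAL on startup) can be sketched as a replay loop. This is a toy model: `replay_wal` is a hypothetical function, the JSON-lines format is assumed, and stopping at an unreadable trailing record is a design choice made for this example rather than documented InfluxDB behavior.

```python
import json


def replay_wal(wal_path: str) -> list:
    """Rebuild the in-memory buffer from a WAL file left by a failed process."""
    buffer = []
    try:
        with open(wal_path, encoding="utf-8") as wal:
            for line in wal:
                line = line.strip()
                if not line:
                    continue
                try:
                    buffer.append(json.loads(line))
                except json.JSONDecodeError:
                    # Assumption: a partial final record was never acknowledged,
                    # so replay stops at the first unreadable entry.
                    break
    except FileNotFoundError:
        pass  # no WAL on disk means there is nothing to recover
    return buffer
```

Because writes are acknowledged only after the WAL append completes, every acknowledged write is recoverable by a replay like this one.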

