InfluxDB Clustered data durability
How data flows through InfluxDB Clustered
When data is written to InfluxDB Clustered, it progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.
Figure: Write request, response, and ingest flow for InfluxDB Clustered
Data ingest
Write validation and memory buffer
The Router validates incoming data to prevent malformed or unsupported data from entering the system. InfluxDB Clustered then writes accepted data to multiple write-ahead log (WAL) files (two by default, for redundancy) on the Ingester pods’ local storage before acknowledging the write request. The Ingester also holds the data in memory so that leading-edge data is available for querying.
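The ordering described above (validate, write to the WAL copies, then acknowledge) can be illustrated with a minimal Python sketch. This is a toy model, not InfluxDB's implementation: `ToyIngester`, `validate`, `route_write`, and the simplified "measurement field=value" check are all hypothetical names and assumptions made for illustration.

```python
# Toy model of the write path. All names are hypothetical;
# this is not InfluxDB's implementation.
from dataclasses import dataclass, field


@dataclass
class ToyIngester:
    wal: list = field(default_factory=list)     # stands in for a WAL file on local storage
    buffer: list = field(default_factory=list)  # in-memory data available to queries

    def append(self, line: str) -> None:
        self.wal.append(line)      # record the write in the WAL first
        self.buffer.append(line)   # then expose it to queries


def validate(line: str) -> bool:
    # Simplified check (assumption): "measurement field=value".
    parts = line.split(" ")
    return len(parts) >= 2 and bool(parts[0]) and "=" in parts[1]


def route_write(line: str, ingesters: list, replicas: int = 2) -> bool:
    """Return True (acknowledge) only after `replicas` WAL copies exist."""
    if not validate(line):
        return False               # rejected before reaching any WAL
    if len(ingesters) < replicas:
        return False               # cannot meet the replication target
    for ingester in ingesters[:replicas]:
        ingester.append(line)
    return True
```

In this sketch, a caller sees a `True` result only when the line is already present in two WAL copies, which mirrors the acknowledgement rule stated above.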
Write-ahead log (WAL) persistence
Ingesters persist the contents of the WAL to Parquet files in object storage and update the Catalog to reference the newly created Parquet files. InfluxDB Clustered retains WAL files until the data is persisted.
If an Ingester node is gracefully shut down (for example, during a new software deployment), it flushes the contents of the WAL to Parquet files in object storage before shutting down.
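The persistence order described in this section (write the file, update the Catalog, and only then release the WAL) can be sketched as follows. This is an illustrative toy model under stated assumptions: `ToyObjectStore`, `ToyCatalog`, and `persist` are hypothetical names, and a Python tuple stands in for a Parquet file.

```python
# Toy model of WAL persistence. All names are hypothetical;
# a tuple of rows stands in for a Parquet file.
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    files: dict = field(default_factory=dict)   # path -> immutable tuple of rows


@dataclass
class ToyCatalog:
    paths: list = field(default_factory=list)   # files that queries may read


def persist(wal: list, store: ToyObjectStore, catalog: ToyCatalog, path: str) -> bool:
    """Persist WAL contents as a new file; return True if a file was written."""
    if not wal:
        return False                 # nothing to persist
    if path in store.files:
        raise ValueError("existing files are never modified; use a new path")
    store.files[path] = tuple(wal)   # 1. write the snapshot to object storage
    catalog.paths.append(path)       # 2. reference the new file in the catalog
    wal.clear()                      # 3. only now is the WAL safe to drop
    return True
```

A graceful shutdown in this model would amount to calling `persist` one final time before the process exits. Because the WAL is cleared last, a failure in step 1 or 2 leaves the WAL contents intact.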
Data storage
In InfluxDB Clustered, all measurements are stored in Apache Parquet files that represent a point-in-time snapshot of the data. The Parquet files are immutable and are never replaced or modified. Parquet files are stored in object storage and referenced in the Catalog, which InfluxDB uses to find the appropriate Parquet files for a particular set of data.
Data deletion
When data is deleted or expires (reaches the database’s retention period), InfluxDB performs the following steps:
- Marks the associated Parquet files as deleted in the Catalog.
- Filters out data marked for deletion from all queries.
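The two steps above amount to a soft delete: a file is flagged in the catalog and then hidden from queries. A minimal Python sketch of that idea follows. It is an illustration only; `FileEntry`, `mark_expired`, `visible_paths`, and the integer timestamps are hypothetical, and the rule that a file expires once its newest timestamp falls outside the retention period is an assumption of this sketch.

```python
# Toy model of soft deletion. All names are hypothetical;
# timestamps are plain integers for simplicity.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    path: str
    max_time: int                     # newest timestamp stored in the file
    deleted_at: Optional[int] = None  # None means the file is still live


def mark_expired(entries: list, now: int, retention: int) -> None:
    """Flag files whose newest data is older than the retention period."""
    cutoff = now - retention
    for entry in entries:
        if entry.deleted_at is None and entry.max_time < cutoff:
            entry.deleted_at = now    # mark only; the file itself is untouched


def visible_paths(entries: list) -> list:
    """Return the files a query may read, skipping anything marked deleted."""
    return [entry.path for entry in entries if entry.deleted_at is None]
```

For example, with `now=100` and `retention=50`, a file whose `max_time` is 10 is flagged and no longer returned by `visible_paths`, while a file whose `max_time` is 90 remains visible.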
Backups
InfluxDB Clustered implements the following data backup strategies:
Backup of WAL file: The WAL file is written to locally attached storage. If an Ingester process fails, the new Ingester reads the WAL file on startup and continues normal operation. WAL files are retained until their contents have been written to Parquet files in object storage. For added protection, Ingesters can be configured for write replication, where each measurement is written to two different WAL files before the write is acknowledged.
Backup of Parquet files: Parquet files are stored in object storage.
Backup of Catalog: InfluxData keeps a transaction log of all recent updates to the InfluxDB Catalog and generates a daily backup of the Catalog.
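The WAL recovery behavior in the first strategy above (a new Ingester reads the WAL on startup) can be illustrated with a small Python sketch. The on-disk format here, one JSON record per line, is an assumption chosen for readability and is not the actual WAL format; `append_wal` and `replay_wal` are hypothetical helper names.

```python
# Toy model of WAL append and replay. The JSON-lines format and the
# function names are assumptions, not InfluxDB's actual WAL format.
import json
import os


def append_wal(path: str, record: dict) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())   # ask the OS to flush the data to storage


def replay_wal(path: str) -> list:
    """Rebuild the in-memory buffer from the WAL after a restart."""
    if not os.path.exists(path):
        return []              # fresh start: nothing to recover
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In this model, a replacement process that calls `replay_wal` on the same file recovers every record that was appended before the failure, in the original order.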