Back up and restore your cluster

InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog store that you can use to restore your cluster to a previous state. The snapshotting functionality is optional and is disabled by default. Enable snapshots to ensure you can recover in case of emergency.

With InfluxDB Clustered snapshots enabled, each hour, InfluxDB uses the pg_dump utility included with the InfluxDB Garbage collector to export an SQL blob or “snapshot” from the InfluxDB Catalog store to the Object store. The Catalog store is a PostgreSQL-compatible relational database that stores metadata for your time series data, such as schema data types, Parquet file locations, and more.

The Catalog store snapshots act as recovery points for your InfluxDB cluster that reference all Parquet files that existed in the Object store at the time of the snapshot. When a snapshot is restored to the Catalog store, the Compactor “soft deletes” any Parquet files not listed in the snapshot.

InfluxDB won’t hard delete Parquet files listed in any hourly or daily snapshot.

For example, if you have Parquet files A, B, C, and D, and you restore to a snapshot that includes B and C, but not A and D, then A and D are soft-deleted, but remain in object storage until they are no longer referenced in any Catalog store snapshot.

Soft delete
Hard delete
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Data written just before a snapshot may not be present after restoring
Recommendations
- Automate object synchronization to an external S3-compatible bucket
- Enable short-term object versioning
Configure snapshots
- Environment Variables
Verify snapshots
Restore to a recovery point
Resources
- prep_pg_dump.awk

Soft delete

A soft delete refers to when, on compaction, the Compactor sets a deleted_at timestamp on the Parquet file entry in the Catalog. The Parquet file is no longer queryable, but remains intact in the object store.

Soft deletes are a mechanism of the InfluxDB Clustered Catalog, not of the underlying object storage provider. Soft deletes do not modify objects in the object store; only Catalog entries that reference objects in the object store.

Hard delete

A hard delete refers to when a Parquet file is actually deleted from object storage and no longer exists.

Recovery Point Objective (RPO)

RPO is the maximum amount of data loss (based on time) allowed after a disruptive event. It indicates how much time can pass between data snapshots before data is considered lost if a disaster occurs.

The InfluxDB Clustered snapshot strategy RPO allows for the following maximum data loss:

1 hour for hourly snapshots (up to the configured hourly snapshot expiration)
1 day for daily snapshots (up to the configured daily snapshot expiration)

Recovery Time Objective (RTO)

RTO is the maximum amount of downtime allowed for an InfluxDB cluster after a failure. RTO varies depending on the size of your Catalog store, network speeds between the client machine and the Catalog store, cluster load, the status of your underlying hosting provider, and other factors.

Data written just before a snapshot may not be present after restoring

Due to the variability of flushing data from Ingesters into Parquet files, data written in the last few minutes before a snapshot may not be included. This variability is typically less than 15 minutes, but is per table. This means that one table may have data written up to the timestamp of the snapshot, while another may not have data written in the 15 minutes prior to the snapshot. All data written more than 15 minutes prior to a snapshot should be present after restoring to that snapshot.

Recommendations

Automate object synchronization to an external S3-compatible bucket

Syncing objects to an external S3-compatible bucket ensures an up-to-date backup in case your Object store becomes unavailable. Recovery point snapshots only back up the InfluxDB Catalog store. If data referenced in a Catalog store snapshot does not exist in the Object store, the recovery process does not restore the missing data.

Enable short-term object versioning

If your object storage provider supports it, consider enabling short-term object versioning on your Object store–for example, 1-2 days to protect against errant writes or deleted objects. With object versioning enabled, as objects are updated, the object store retains distinct versions of each update that can be used to “rollback” newly written or updated Parquet files to previous versions. Keep in mind, storing versioned objects does add to object storage costs.

Configure snapshots

Use the available environment variables to enable and configure hourly Catalog snapshots in your InfluxDB cluster. Add these environment variables to the Garbage Collector configuration in your AppInstance resource:

apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  package:
    spec:
      components:
        garbage-collector:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
                  INFLUXDB_IOX_DELETE_USING_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
                  INFLUXDB_IOX_KEEP_HOURLY_CATALOG_BACKUP_FILE_LISTS: '30d'
                  INFLUXDB_IOX_KEEP_DAILY_CATALOG_BACKUP_FILE_LISTS: '90d'
                  INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '30d'
  
   Copy
 Fill window

Environment Variables

INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES

Enable hourly Catalog store snapshotting. The default is 'false'. Set to 'true':

INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
  
   Copy
 Fill window

INFLUXDB_IOX_DELETE_USING_CATALOG_BACKUP_DATA_SNAPSHOT_FILES

Enable a snapshot check when deleting files to ensure the Garbage Collector does not remove Parquet files from the object store that are associated with existing snapshots. The default is 'false'. Set to 'true':

INFLUXDB_IOX_DELETE_USING_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
  
   Copy
 Fill window

Storage utilization and costs

Enabling this setting retains Parquet files referenced in snapshots, increasing object storage utilization and costs. The longer you retain snapshots, the more storage space and costs you incur.

If set to false (the default), the Garbage Collector may delete Parquet files needed for snapshot restoration, making recovery points unusable.

INFLUXDB_IOX_KEEP_HOURLY_CATALOG_BACKUP_FILE_LISTS

After this duration of time, the Garbage Collector deletes hourly snapshots, allowing the Garbage Collector to hard-delete Parquet files from the object store and the Catalog. The default is 30d. The recommended range for snapshots is between 1d and 30d:

INFLUXDB_IOX_KEEP_HOURLY_CATALOG_BACKUP_FILE_LISTS: '30d'
  
   Copy
 Fill window

INFLUXDB_IOX_KEEP_DAILY_CATALOG_BACKUP_FILE_LISTS

After this duration of time, the Garbage Collector deletes daily snapshots, allowing the Garbage Collector to hard-delete Parquet files from the object store and the Catalog. The default is 90d. The recommended range is between 3d and 90d.

Daily snapshots must expire after hourly backups Make sure to set INFLUXDB_IOX_KEEP_DAILY_CATALOG_BACKUP_FILE_LISTS to a value greater than INFLUXDB_IOX_KEEP_HOURLY_CATALOG_BACKUP_FILE_LISTS.

INFLUXDB_IOX_KEEP_DAILY_CATALOG_BACKUP_FILE_LISTS: '90d'
  
   Copy
 Fill window

INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF

The duration of time after a Parquet file is no longer referenced in the Catalog or included in any snapshots after which the Garbage Collector removes the Parquet file from the Object store. The default is 30d:

INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '30d'
  
   Copy
 Fill window

For an in-depth explanation of the recommended value, see the data lifecycle garbage tuning best practices and use case examples.

Verify snapshots

InfluxDB Clustered stores hourly and daily snapshots in the /catalog_backup_file_lists path in object storage. After enabling snapshots, use clients provided by your object storage provider to ensure that snapshots are written to the object store.

Hourly snapshots are taken at approximately the beginning of each hour (≈1:00, ≈2:00, ≈3:00, etc.). After you enable snapshotting, the first snapshot is written on or around the beginning of the next hour.

Restore to a recovery point

Use the following process to restore your InfluxDB cluster to a recovery point using Catalog store snapshots:

Use the same InfluxDB Clustered version used to generate the snapshot

When restoring an InfluxDB cluster to a recovery point, use the same version of InfluxDB Clustered used to generate the Catalog store snapshot. You may need to downgrade to a previous version before restoring.

Install prerequisites:
- kubectl CLI for managing your Kubernetes deployment.
- psql CLI configured with your Data Source Name and credentials for interacting with the PostgreSQL-compatible Catalog store database.
- A client from your object storage provider for interacting with your InfluxDB cluster’s Object store.
Retrieve the recovery point snapshot from your object store.
InfluxDB Clustered stores hourly and daily snapshots in the /catalog_backup_file_lists path in object storage. Download the snapshot that you would like to use as the recovery point. If your primary Object store is unavailable, download the snapshot from your replicated Object store.
When creating and storing a snapshot, the last artifact created is the snapshot’s bloom filter. To ensure the snapshot is complete, make sure that the bloom filter file (bloom.bin.gz) exists before downloading the snapshot.
Prepare your snapshot file for the restore process.
InfluxDB Clustered snapshot pg_dump files are compressed text files containing SQL that restore the contents of the Catalog. Because your Catalog has existing data, you need to update the snapshot to prepend CREATE statements with DROP statements. The result is a slightly modified pg_dump SQL file that you can use to restore your non-empty Catalog.
If restoring to a new cluster, you do not need to update the pg_dump snapshot file.
Use the prep_pg_dump.awk script provided below to process your pg_dump file. For example:
```
gunzip pg_dump.gz
cat pg_dump | prep_pg_dump.awk > snapshot.sql
```
- Copy
- Fill window
Pause the kubit operator
The kubit operator validates cluster sizing and prevents you from disabling InfluxDB Clustered components. By pausing the kubit operator, you can disable InfluxDB components and safely perform the restore operation.
1. In your AppInstance resource, set pause to true.
```
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  pause: true
# ...
```
  Copy
  
  Fill window
2. Apply the change to your cluster:
```
kubectl apply --filename myinfluxdb.yml --namespace influxdb
```
  Copy
  
  Fill window
Disable all InfluxDB Clustered components except the Catalog
Critical shutdown sequence
You must scale down components in the correct order and wait for each group to fully shut down before proceeding. Scaling down the catalog before ingesters have finished shutting down can cause WAL contents to survive through the restore, leading to data inconsistency and undefined behavior.
Clusters under load may take longer to shut down
If the cluster is under load, some pods may take longer to shut down. For example, Ingester pods must flush their Write-Ahead Logs (WAL) before shutting down.
1. Before scaling down, record the current number of replicas for each component to restore them to the correct scale later.
  Get the number of replicas for each pod
  echo "GC: $(kubectl get deployment global-gc -n influxdb -o jsonpath='{.spec.replicas}')" echo "Router: $(kubectl get deployment global-router -n influxdb -o jsonpath='{.spec.replicas}')" echo "Querier: $(kubectl get deployment iox-shared-querier -n influxdb -o jsonpath='{.spec.replicas}')" echo "Compactor: $(kubectl get statefulset iox-shared-compactor -n influxdb -o jsonpath='{.spec.replicas}')" echo "Ingester: $(kubectl get statefulset iox-shared-ingester -n influxdb -o jsonpath='{.spec.replicas}')" echo "Catalog: $(kubectl get statefulset iox-shared-catalog -n influxdb -o jsonpath='{.spec.replicas}')"
  
  Copy
  
  Fill window
2. Scale down non-critical components first
  Use the kubectl scale command to scale these components down to zero replicas:
```
kubectl scale --namespace influxdb --replicas=0 deployment/global-gc
kubectl scale --namespace influxdb --replicas=0 deployment/global-router
kubectl scale --namespace influxdb --replicas=0 deployment/iox-shared-querier
kubectl scale --namespace influxdb --replicas=0 statefulset/iox-shared-compactor
```
  Copy
  
  Fill window
3. Scale down ingesters and wait for complete shutdown
  Scale down the ingesters and wait for all ingester pods to fully shut down:
```
kubectl scale --namespace influxdb --replicas=0 statefulset/iox-shared-ingester
```
  Copy
  
  Fill window
  Verify that all non-Catalog pods have been removed from your cluster.
```
kubectl get pods --namespace influxdb --selector=app=iox-shared-ingester
```
  Copy
  
  Fill window
  Once removed, proceed to the next step.
Scale down catalog last
After all other pods are removed, use the kubectl scale command to scale your InfluxDB Clustered Catalog down to zero replicas:
```
kubectl scale --namespace influxdb --replicas=0 statefulset/iox-shared-catalog
```
- Copy
- Fill window
Verify that the Catalog pod has been removed from your cluster:
```
kubectl get pods --namespace influxdb --selector=app=iox-shared-catalog
```
- Copy
- Fill window
Once removed, proceed to the next step.
Restore the SQL snapshot to the Catalog
Use psql to restore the recovery point snapshot to your InfluxDB Catalog–for example:
```
psql CATALOG_DSN < snapshot.sql
```
- Copy
- Fill window
The exact psql command depends on your PostgreSQL-compatible database provider, their authentication requirements, and the database’s DSN.
Scale InfluxDB Clustered components back up
Critical startup sequence
When bringing services back online, start components in the correct order and wait for each critical component group to be fully ready before proceeding. This prevents temporary errors and ensures a clean startup.
Use the kubectl scale command to scale your InfluxDB Clustered components back up to their original number of replicas. Perform the scaling operations on components in reverse order of shutdown.
Recommended startup sequence
For optimal cluster initialization and to prevent startup errors, wait for at least 2 catalog pods to be fully ready, then wait for at least 2 ingester pods to be fully ready before scaling up the remaining components.
1. Scale catalog and wait for readiness
  Replace the number of replicas with the original values you noted when scaling down.
```
kubectl scale --namespace influxdb --replicas=3 statefulset/iox-shared-catalog
kubectl get pods --namespace influxdb --selector=app=iox-shared-catalog --watch
```
  Copy
  
  Fill window
  Wait until at least 2 catalog pods show Running status with 2/2 in the READY column.
2. Scale ingesters and wait for readiness
  Replace the number of replicas with the original values you noted when scaling down.
```
kubectl scale --namespace influxdb --replicas=3 statefulset/iox-shared-ingester
kubectl get pods --namespace influxdb --selector=app=iox-shared-ingester --watch
```
  Copy
  
  Fill window
  Wait until at least 2 ingester pods show Running status and are ready.
3. Scale remaining components
  After you have scaled the catalog and ingesters and verified they are stable, scale the remaining components.
  Replace the number of replicas with the original values you noted when scaling down.
```
kubectl scale --namespace influxdb --replicas=1 statefulset/iox-shared-compactor
kubectl scale --namespace influxdb --replicas=2 deployment/iox-shared-querier
kubectl scale --namespace influxdb --replicas=3 deployment/global-router
kubectl scale --namespace influxdb --replicas=1 deployment/global-gc
```
  Copy
  
  Fill window
Verify the restore
Verify that all InfluxDB Clustered pods are running:
```
kubectl get pods --namespace influxdb
```
- Copy
- Fill window
All pods should show Running status and be ready.

Restart the kubit operator

In your AppInstance resource, set pause to false or remove the pause field:

apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  pause: false
# ...
  
   Copy
 Fill window

Apply the change to resume the kubit operator:

kubectl apply --filename myinfluxdb.yml --namespace influxdb
  
   Copy
 Fill window

Your InfluxDB cluster is now restored to the recovery point. When the Garbage Collector runs, it identifies what Parquet files are not associated with the recovery point and soft deletes them.

Post-restore verification

After the restore completes, monitor your cluster logs and verify that:

All pods are running and ready
No error messages related to WAL inconsistencies appear
Write and query operations function correctly
The Garbage Collector operates normally

Resources

prep_pg_dump.awk

#!/usr/bin/env awk -f

# Data Snapshots in IOx use pg_dump in text output format, which is simply sql. We can apply the
# pg_dump using our standard permissions, without the need for special database create permission.
# Even a new cluster which you think is empty likely has some tables populated. For ease of
# restoring the pg_dump, this script inserts DROP statements before each CREATE statement to handle
# restoring to a non-empty catalog.
#
# The intended use of this script is to modify the pg_dump output with drop statements so it can
# be applied to a non-empty catalog.
#
# WARNING: The resulting sql is destructive. Prior catalog contents are removed and replaced with
#          what's in the pg_dump.
#
# Example use:
#    gunzip pg_dump.gz
#    cat pg_dump | prep_pg_dump.awk > clean_and_restore.sql
#    psql CATALOG_DSN < clean_and_restore.sql


BEGIN {
    print "-- Modified pg_dump text output with DROP statements"
}

# Function to clean up names (dropping trailing semicolon so CASCADE is included in the DROP command)
function clean_name(name) {
    gsub(/[";]/, "", name)
    return name
}

# Match CREATE TABLE statements and insert DROP TABLE
/^[[:space:]]*CREATE[[:space:]]+TABLE[[:space:]]+/ {
    table_name = clean_name($3)
    print "DROP TABLE IF EXISTS " table_name " CASCADE;"
    print
    next
}

# Match CREATE SCHEMA statements and insert DROP SCHEMA
/^[[:space:]]*CREATE[[:space:]]+SCHEMA[[:space:]]+/ {
    schema_name = clean_name($3)
    print "DROP SCHEMA IF EXISTS " schema_name " CASCADE;"
    print
    next
}

# Match CREATE SEQUENCE statements and insert DROP SEQUENCE
/^[[:space:]]*CREATE[[:space:]]+SEQUENCE[[:space:]]+/ {
    sequence_name = clean_name($3)
    print "DROP SEQUENCE IF EXISTS " sequence_name " CASCADE;"
    print
    next
}

# Match CREATE VIEW statements and insert DROP VIEW
/^[[:space:]]*CREATE[[:space:]]+VIEW[[:space:]]+/ {
    view_name = clean_name($3)
    print "DROP VIEW IF EXISTS " view_name " CASCADE;"
    print
    next
}

# Match CREATE FUNCTION statements and insert DROP FUNCTION
/^[[:space:]]*CREATE[[:space:]]+FUNCTION[[:space:]]+/ {
    function_name = clean_name($3)
    print "DROP FUNCTION IF EXISTS " function_name " CASCADE;"
    print
    next
}

# Match CREATE INDEX statements and insert DROP INDEX
/^[[:space:]]*CREATE[[:space:]]+INDEX[[:space:]]+/ {
    index_name = clean_name($3)
    print "DROP INDEX IF EXISTS " index_name " CASCADE;"
    print
    next
}

# Pass through all other lines
{
    print
}
  
   Copy
 Fill window

backup restore

Was this page helpful?

Thank you for your feedback!

Support and feedback

Thank you for being part of our community! We welcome and encourage your feedback and bug reports for InfluxDB Clustered and this documentation. To find support, use the following resources:

Customers with an annual or support contract can contact InfluxData Support.

Edit this page Submit docs issue Submit InfluxDB Clustered issue

Back up and restore your cluster

Soft delete

Hard delete

Recovery Point Objective (RPO)

Recovery Time Objective (RTO)

Data written just before a snapshot may not be present after restoring

Recommendations

Automate object synchronization to an external S3-compatible bucket

Enable short-term object versioning

Configure snapshots

Environment Variables

INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES

INFLUXDB_IOX_DELETE_USING_CATALOG_BACKUP_DATA_SNAPSHOT_FILES

Storage utilization and costs

INFLUXDB_IOX_KEEP_HOURLY_CATALOG_BACKUP_FILE_LISTS

INFLUXDB_IOX_KEEP_DAILY_CATALOG_BACKUP_FILE_LISTS

INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF

Verify snapshots

Restore to a recovery point

Use the same InfluxDB Clustered version used to generate the snapshot

Critical shutdown sequence

Clusters under load may take longer to shut down

Resources

prep_pg_dump.awk

Support and feedback

The future of Flux

New in InfluxDB 3.3

Back up and restore your cluster

What is your InfluxDB cluster URL?

Enter cluster URL

Thank you for your feedback!

The future of Flux

New in InfluxDB 3.3