InfluxDB 3 storage engine architecture
The InfluxDB 3 storage engine is a real-time, columnar database built in Rust on top of Apache Arrow and DataFusion and optimized for time series data. It supports unlimited tag cardinality (the number of unique tag values), supports real-time queries, and is designed to reduce storage cost.
Storage engine diagram
Storage engine components
Router
The Router (also known as the Ingest Router) parses incoming line protocol and then routes it to Ingesters. The Router processes incoming write requests through the following steps:
- Queries the Catalog to determine persistence locations and verify schema compatibility
- Validates syntax and schema compatibility for each data point in the request, and either accepts or rejects points
- Replicates data to two or more available Ingesters for write durability
- Returns a response to the client
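The write path above can be pictured with a minimal Python sketch. All names here are illustrative (the actual Router is a Rust component inside InfluxDB): it parses one line of line protocol, validates it, replicates it to two ingesters, and only then acknowledges the write.

```python
from dataclasses import dataclass

@dataclass
class Point:
    measurement: str
    tags: dict
    fields: dict
    timestamp: int

def parse_line(line: str) -> Point:
    """Parse a single, simple line of line protocol:
    measurement,tag=value field=value timestamp"""
    head, field_str, ts = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=") for pair in tag_pairs)
    fields = {}
    for pair in field_str.split(","):
        key, value = pair.split("=")
        fields[key] = float(value)
    return Point(measurement, tags, fields, int(ts))

def route_write(line: str, ingesters: list) -> bool:
    """Validate a point, replicate it to two ingesters for durability,
    then acknowledge the write."""
    point = parse_line(line)      # syntax validation (raises on bad input)
    if not point.fields:          # simplified schema/point validation
        return False
    for ingester in ingesters[:2]:  # replicate before acknowledging
        ingester.append(point)
    return True

# Two stand-in "ingesters" represented as plain lists:
ingester_a, ingester_b = [], []
ok = route_write("cpu,host=a usage=0.64 1700000000", [ingester_a, ingester_b])
```

This sketch omits real-world concerns such as catalog lookups, partial rejection of individual points, and retry on ingester failure.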
Ingester
The Ingester processes line protocol submitted in write requests and persists time series data to the Object store. In this process, the Ingester does the following:
- Persists time series data to the Object store in Apache Parquet format. Each Parquet file represents a partition, a logical grouping of data.
- Makes yet-to-be-persisted data available to Queriers to ensure leading edge data is included in query results.
- Maintains a short-term write-ahead log (WAL) to prevent data loss in case of a service interruption.
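The WAL-then-persist flow can be sketched in Python. This is a simplified model, not the actual implementation: it persists sorted JSON instead of sorted, encoded, compressed Parquet, and every name is illustrative.

```python
import json
import os
import tempfile

class Ingester:
    """Toy model of the Ingester's buffer + write-ahead log."""

    def __init__(self, wal_path, store_dir):
        self.wal_path = wal_path
        self.store_dir = store_dir
        self.buffer = []  # yet-to-be-persisted points, queryable by Queriers

    def write(self, point):
        # Append to the WAL first so a service interruption cannot lose it.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(point) + "\n")
        self.buffer.append(point)

    def persist(self, partition_key):
        # Sort buffered points and write them as one partition file
        # (the real engine writes Parquet to the Object store).
        path = os.path.join(self.store_dir, partition_key + ".json")
        with open(path, "w") as f:
            json.dump(sorted(self.buffer, key=lambda p: p["time"]), f)
        self.buffer.clear()
        open(self.wal_path, "w").close()  # points are durable; truncate WAL
        return path

store = tempfile.mkdtemp()
ing = Ingester(os.path.join(store, "wal.jsonl"), store)
ing.write({"time": 2, "host": "a", "usage": 0.9})
ing.write({"time": 1, "host": "a", "usage": 0.4})
partition_file = ing.persist("2024-01-15")
```

Note the ordering guarantee the sketch preserves: a point reaches the WAL before it reaches the in-memory buffer, and the WAL is truncated only after the buffer has been persisted.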
Querier
The Querier handles query requests and returns query results. It supports both SQL and InfluxQL through Apache Arrow DataFusion.
Query life cycle
At query time, the Querier:

1. Receives the query request and builds a query plan.
2. Queries the Ingesters to:
   - ensure the schema assumed by the query plan matches the schema of written data
   - include recently written, yet-to-be-persisted data in query results
3. Queries the Catalog service to retrieve Catalog store information about partitions in the Object store that contain the queried data.
4. Retrieves any needed Parquet files (not already cached) from the Object store.
5. Reads partition Parquet files that contain the queried data and scans each row to filter data that matches predicates in the query plan.
6. Performs any additional operations (for example, deduplicating, merging, and sorting) specified in the query plan.
7. Returns the query result to the client.
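The merge-filter-deduplicate-sort portion of this life cycle can be sketched in Python, assuming rows are plain dicts and the series key is just `host`; this is a conceptual model, not the engine's actual row-level logic.

```python
def query(persisted, unpersisted, predicate):
    """Combine persisted partition rows with yet-to-be-persisted Ingester
    rows, filter on the query's predicate, then deduplicate and sort."""
    rows = [r for r in persisted + unpersisted if predicate(r)]
    seen, result = set(), []
    for row in sorted(rows, key=lambda r: r["time"]):
        key = (row["time"], row["host"])  # timestamp + series key
        if key not in seen:               # keep one copy per series/time
            seen.add(key)
            result.append(row)
    return result

# Persisted data overlaps the leading edge still held by the Ingester:
persisted = [{"time": 1, "host": "a"}, {"time": 3, "host": "a"}]
unpersisted = [{"time": 3, "host": "a"}, {"time": 2, "host": "b"}]
result = query(persisted, unpersisted, lambda r: r["time"] >= 1)
```

The overlap between persisted and yet-to-be-persisted data is why deduplication is part of the plan: the same point can legitimately appear in both sources.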
Catalog
InfluxDB’s catalog system consists of two distinct components: the Catalog store and the Catalog service.
Managing Catalog components
The Catalog service is managed through the AppInstance resource, while the Catalog store is managed separately according to your PostgreSQL implementation.
Catalog store
The Catalog store is a PostgreSQL-compatible relational database that stores metadata related to your time series data including schema information and physical locations of partitions in the Object store. It fulfills the following roles:
- Provides information about the schema of written data.
- Tells the Ingester what partitions to persist data to.
- Tells the Querier what partitions contain the queried data.
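These roles can be illustrated with a partition lookup against an in-memory SQLite table. The schema below is hypothetical and heavily simplified; the real Catalog store runs on PostgreSQL and its actual tables are internal to InfluxDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE partitions (
        table_name    TEXT,
        partition_key TEXT,  -- one day of data under default partitioning
        object_path   TEXT   -- Parquet file location in the Object store
    )
""")
conn.execute(
    "INSERT INTO partitions VALUES (?, ?, ?)",
    ("cpu", "2024-01-15", "dbs/cpu/2024-01-15/0001.parquet"),
)

# The Querier asks which partitions hold the queried table and time range:
rows = conn.execute(
    "SELECT object_path FROM partitions"
    " WHERE table_name = ? AND partition_key = ?",
    ("cpu", "2024-01-15"),
).fetchall()
```

The key idea is the indirection: queries never scan the Object store blindly; the catalog maps logical data (table, time range) to physical Parquet file locations.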
Catalog service
The Catalog service (iox-shared-catalog statefulset) is an IOx component that caches and manages access to the Catalog store.
Object store
The Object store contains time series data in Apache Parquet format. Data in each Parquet file is sorted, encoded, and compressed. A partition may contain multiple Parquet files, which are subject to compaction. By default, InfluxDB partitions tables by day, but you can customize the partitioning strategy.
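A minimal sketch of the default day-based partitioning, assuming nanosecond-precision timestamps and UTC day boundaries (the function name is illustrative):

```python
from datetime import datetime, timezone

def partition_key(ns_timestamp):
    # Map a nanosecond-precision timestamp to its UTC day,
    # the default partition granularity.
    ts = datetime.fromtimestamp(ns_timestamp / 1e9, tz=timezone.utc)
    return ts.strftime("%Y-%m-%d")

key = partition_key(1700000000 * 10**9)  # a timestamp in November 2023
```

A custom partitioning strategy changes this mapping, for example by adding tag values or using a different time format in the key.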
Compactor
The Compactor processes and compresses partitions in the Object store to continually optimize storage. It then updates the Catalog with locations of compacted data.
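Compaction can be pictured as a k-way merge: several small, individually sorted files become one larger, still-sorted run, reducing file count and per-query overhead. This sketch uses tuples in place of Parquet rows; the real Compactor operates on Parquet files in the Object store.

```python
import heapq

def compact(sorted_files):
    # Each input file is already sorted by (time, series key), so a
    # k-way merge yields one larger file with the same sort order.
    return list(heapq.merge(*sorted_files))

small_1 = [(1, "cpu"), (5, "cpu")]
small_2 = [(2, "mem"), (4, "mem")]
compacted = compact([small_1, small_2])  # one sorted run replaces two files
```

After writing the compacted file, the Compactor updates the Catalog so Queriers resolve to the new location instead of the superseded small files.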
Garbage collector
The Garbage collector runs background jobs that evict expired or deleted data, remove obsolete compaction files, and reclaim space in both the Catalog and the Object store.
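A simplified model of the eviction decision, using an illustrative metadata shape rather than InfluxDB's actual bookkeeping: a file is dropped when its data has expired past the retention cutoff or when compaction has made it obsolete.

```python
def collect_garbage(files, cutoff):
    """Keep only files that are neither expired nor superseded.
    `files` maps object-store path -> metadata (illustrative shape)."""
    return {
        path: meta
        for path, meta in files.items()
        if meta["max_time"] >= cutoff and not meta["obsolete"]
    }

files = {
    "cpu/2024-01-15/a.parquet": {"max_time": 100, "obsolete": False},
    "cpu/2023-12-01/b.parquet": {"max_time": 50, "obsolete": False},  # expired
    "cpu/2024-01-15/c.parquet": {"max_time": 200, "obsolete": True},  # compacted away
}
live = collect_garbage(files, cutoff=80)
```

In the actual system this runs as background jobs, and removal happens in both places that reference a file: its catalog entry and the object itself.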
Support and feedback
Thank you for being part of our community! We welcome and encourage your feedback and bug reports for InfluxDB Clustered and this documentation. To find support, use the following resources:
Customers with an annual or support contract can contact InfluxData Support.