Differences Between InfluxDB 0.8 and InfluxDB 0.10

Warning! This page documents an old version of InfluxDB, which is no longer actively developed. InfluxDB v1.3 is the most recent stable version of InfluxDB.

There are significant breaking changes between the 0.8.x versions of InfluxDB and the 0.10.x versions. The API, the schema, the data store, and the clustering algorithm have all been significantly altered and in incompatible ways. The InfluxQL syntax has remained largely the same, although there are functions which exist only in one or the other version, and a few functions are implemented differently in the two versions.

This page aims to ease the transition to InfluxDB 0.10 for those more familiar with 0.8. It is not a comprehensive list of the differences, that would be prohibitively large.

Schema

The most noticeable difference between InfluxDB 0.8 and InfluxDB 0.10 is the way series are defined and organized. In InfluxDB 0.8 a series has an explicit name that incorporates all the metadata about the value, similar to Graphite output. The recommended pattern for a series name in InfluxDB 0.8 is <tagName>.<tagValue>[.<tagName>.<tagValue>].<measurement>. For example: az.us-west-1.host.serverA.cpu.

In InfluxDB 0.10 a series is still defined by the measurement name and the complete tag set, but series are not explicitly defined at write time. Instead of writing a point to a particular series, a point is written with a measurement name and a set of tag key-value pairs. The recommended pattern in InfluxDB 0.10 is superficially similar: <measurement>,<tagName>=<tagValue>[,<tagName>=<tagValue>]. For example: cpu,az=us-west-1,host=serverA.

The real change is evident when querying. In InfluxDB 0.8, querying multiple series requires using a regular expression. For example, to query the cpu measurement where the az is us-west-1 for all hostnames, the syntax is SELECT * FROM /az.us-west-1.host..*.cpu$/ .... To query all series in the cpu measurement the syntax is SELECT * FROM /cpu$/ ....

In InfluxDB 0.10, queries operate directly on measurements, not on series. All series in a measurement are returned by default, unless filtered out by tags in the WHERE clause. For example, to query the cpu measurement where the az is us-west-1 for all hostnames, the syntax is SELECT * FROM cpu WHERE az='us-west-1' .... To query all series in the cpu measurement the syntax is SELECT * FROM cpu ....

The change means that as series incorporate additional tags all queries will continue to function as expected. Because all series are returned unless explicitly filtered, new tags do not have any effect on existing queries. The existing queries do not include the new tag and therefore will ignore its existence when selecting results. No longer does schema design need to include a prolonged session of predicting all future metadata and ensuring it can be represented within the desired regular expression syntax. In InfluxDB 0.10 filtering by tags is very efficient, whereas in InfluxDB 0.8 filtering by column in the WHERE clause is always an expensive operation.

In InfluxDB 0.8 a series has a name and one or more measured values. The values can be passed to functions or used in a GROUP BY clause. The series name is indexed but the values are not. In InfluxDB 0.10 a series has a measurement name, a tag set, and a field set. The measurement name plus the tag set are equivalent to the series name from InfluxDB 0.8. The field set in InfluxDB 0.10 is equivalent to the series values in InfluxDB 0.8.

When designing a schema for InfluxDB 0.10 is it important to understand the differences between tags and fields. * Tags are indexed, fields are not. Both can be used as filters in the WHERE clause, but filtering by tags only requires a quick index lookup. Filtering by fields requires a full scan of all points on disk that match the other filters and is thus much less performant. * Fields can be passed into functions but tags cannot. * Tags can be used in a GROUP BY clause but fields cannot. (In InfluxDB 0.8 series values – equivalent to fields – can be used in the GROUP BY clause.) * Measurement names, tag keys, and tag values can be matched using regular expressions but regular expressions are not valid for matching field keys or field values.

With prior storage engines, requesting a single field required retrieving the entire field set from disk. With TSM it is possible to retrieve a partial field set from a point. This makes it more efficient to store multiple fields per point, and will improve both write and read throughput. Telegraf plugins have been updated to reflect this change, using multiple fields per point with a single measurement per plugin, rather than a single field per point across multiple measurements per plugin.

Joins

InfluxDB 0.8 allows both MERGE and JOIN operations between series. The MERGE operation returns a single aggregation of two or more series. The JOIN operation allows values from two different series to appear in the same select statement. JOIN is often used to perform mathematical operations on values from two different series, such as dividing two values to find a ratio. For example: SELECT errors_per_minute.value / pages_per_minute.value FROM errors_per_minute INNER JOIN pages_per_minute.

In InfluxDB 0.10 neither the MERGE nor JOIN operations are supported. The MERGE operation is no longer needed. All series within a measurement are automatically merged when queried, unless explicitly excluded by tags in the WHERE clause.

The lack of a JOIN operation means that fields from different measurements cannot be directly referenced in the same SELECT clause. It is no longer possible to run a query like SELECT errors_per_minute.value / pages_per_minute.value FROM errors_per_minute INNER JOIN pages_per_minute. It is possible to query more than one measurement at once using regular expressions. However, every field referenced in the query must be present in each measurement and cannot be filtered by measurement.

Data Retention

In InfluxDB 0.8 data retention is handled by shard groups. Shard groups include a duration for each individual shard within the group and when the duration is exceeded, the shard may be dropped. A given series writes to the first shard group whose regular expression matches the series name. Shards and shard groups are directly exposed in the Admin UI and can be manipulated by an end user.

In InfluxDB 0.10 it is no longer necessary to parse or define regular expressions to determine the retention of a particular point. Instead, data retention is handled by retention policies. Retention policies belong to databases and define the duration for each point written to the retention policy. When the duration is exceeded, the underlying storage file is marked to be dropped. A point is written directly to a particular retention policy, declared at write time. If no retention policy is explicitly provided with the write, the point is written to the DEFAULT retention policy. In InfluxDB 0.10 shards and shard groups are purely internal and have no user-facing features or settings.

Points With Identical Timestamps

In InfluxDB 0.8 a point is uniquely identified by the series name, timestamp, and sequence number. The sequence number is an internally generated ID that provides uniqueness in the event the series name and timestamp are identical.

In InfluxDB 0.10 a point is uniquely identified by the measurement name, full tag set, and the timestamp. Only one point can be stored for a given set of those values. If you write a new point with the same measurement, tag set, and timestamp as an old point, the field set becomes the union of the old field set and the new field set, where any ties go to the new field set. See Frequently Encountered Issues for an example. If you need to store multiple points with the same measurement name, timestamp, and field, they must have different tag sets. This can be accomplished by adding a new tag with different values for each point that is otherwise identical. Uniqueness tags used in this manner will not significantly affect write or query performance.

Write Protocols

In InfluxDB 0.8 the primary write method is a JSON protocol. In InfluxDB 0.10 the JSON write protocol is deprecated and replaced by the line protocol.

API Endpoints

InfluxDB 0.8 has multiple endpoints for the API, often encoding concepts such as the database and shard space in the HTTP endpoint. InfluxDB 0.10 simplifies the common API interactions, using a common endpoint /query for all queries, and /write for all writes, regardless of database or retention policy. The database and retention policy are passed as query string parameters in InfluxDB 0.10.

This makes the API easier to incorporate into libraries and other code, as it does not require any knowledge of the data structure to make valid calls to the API endpoints.

Command Line Interface

InfluxDB 0.10 has a command line interface (CLI) making it easier to interact with the database. Rather than rely on the Admin UI or manually crafting HTTP requests, InfluxDB 0.10 users can send writes and queries in a pseudo-interactive shell, with query output available in multiple formats, including JSON, CSV, and columnar.

The CLI does not interact directly with the influxd process, it is a wrapper around the Go client and uses the same HTTP API as any other client or library.

Admin UI

In InfluxDB 0.8 the Administration User Interface supported a number of configuration and administration features, some of which could not be performed any other way. In InfluxDB 0.10 all database administration, configuration, and information is available via the HTTP API and the Admin UI has been substantially simplified.

InfluxQL re-written

As a result of the new schema, the query language was substantially overhauled. A number of functions available in InfluxDB 0.8 have not yet been re-implemented in InfluxDB 0.10, including MODE, HISTOGRAM, DIFFERENCE. Links go to the open issues to re-implement each function or behavior.

The query parser in InfluxDB 0.8 was implemented with a C version of Yacc and Bison. Re-writing the query parser in Go means the code is much more portable to systems other than Linux on i686, such as ARM or Windows builds.

Continuous Query Engine Redone

In InfluxDB 0.10 the CREATE CONTINUOUS QUERY statement also controls CQ execution. By default, CQs run at the same interval as the CQ’s GROUP BY time() interval, and the system calculates the query for the most recent GROUP BY time() interval. A new and optional RESAMPLE clause allows users to specify how often the CQ runs and the time range over which InfluxDB runs the CQ.

Clustering

InfluxDB 0.10 follows a hybrid model for maintaining availability of the data while guaranteeing consistency of the metadata. See CEO and co-founder Paul Dix’s clustering blog post for more details. InfluxDB 0.10 supports hybrid nodes, data nodes, and consensus nodes. This feature allows users to separate out consensus nodes in very large clusters or configurations where data nodes are under heavy load. Clustering in InfluxDB 0.10 is not feature complete.

Data Storage Engine

In InfluxDB 0.8 users could choose from a selection of underlying storage engines, with RocksDB, LevelDB, HyperLevelDB, and LMDB as available options. With InfluxDB 0.10 the purpose-built Time Structured Merge tree (TSM) is the only storage engine available. Selecting a storage engine written in Go simplifies the build process, making the code more portable to multiple architectures and operating systems. In addition, TSM greatly simplifies the overhead of internal file handles, a significant source of issues in the InfluxDB 0.8 version.