Flux data model

Note: This is a living document and may not reflect the current implementation of Flux. Any section that is not yet implemented is marked with [IMPL#XXX], where XXX is the number of the issue tracking discussion and progress toward implementation.

Flux employs a basic data model built from basic data types. The data model consists of tables, records, columns and streams.

Record

A record is a tuple of named values and is represented using an object type.

Column

A column has a label and a data type. The available data types for a column are:

Data type   Description
bool        A boolean value, true or false.
uint        An unsigned 64-bit integer.
int         A signed 64-bit integer.
float       An IEEE-754 64-bit floating-point number.
string      A sequence of Unicode characters.
bytes       A sequence of byte values.
time        A nanosecond precision instant in time.
duration    A nanosecond precision duration of time.

Table

A table is a set of records with a common set of columns and a group key.

The group key is a list of columns. A table’s group key denotes which subset of the entire dataset is assigned to the table. All records within a table will have the same values for each column that is part of the group key. These common values are referred to as the “group key value” and can be represented as a set of key value pairs.

A table’s schema consists of its group key and its columns’ labels and types.
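The relationship between records, columns, the group key, and the schema can be illustrated with a small Python model. This is only a sketch under stated assumptions, not how Flux represents tables internally: the `Table` class, its field names, and the sample data are hypothetical, and timestamps are plain integers for brevity.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical model of a table: typed columns, a group key, and records."""
    columns: dict    # column label -> data type name, e.g. {"host": "string"}
    group_key: list  # labels of the columns that form the group key
    records: list    # each record is a dict of column label -> value

    def group_key_value(self):
        """Return the key/value pairs that every record in the table shares."""
        if not self.records:
            return {}
        first = self.records[0]
        value = {label: first.get(label) for label in self.group_key}
        for record in self.records:
            for label in self.group_key:
                if record.get(label) != value[label]:
                    raise ValueError(f"record disagrees on group key column {label!r}")
        return value

    def schema(self):
        """A table's schema: its group key plus its column labels and types."""
        return {"group_key": list(self.group_key), "columns": dict(self.columns)}


cpu = Table(
    columns={"host": "string", "_time": "time", "_value": "float"},
    group_key=["host"],
    records=[
        {"host": "a", "_time": 1, "_value": 0.5},
        {"host": "a", "_time": 2, "_value": 0.7},
    ],
)
print(cpu.group_key_value())  # {'host': 'a'}
```

A record whose `host` differed from the others would make `group_key_value` raise, which mirrors the rule that all records in a table agree on every group key column.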

IMPL#463 Specify the primitive types that make up stream and table types

Stream of tables

A stream represents a potentially unbounded set of tables. A stream is grouped into individual tables using their respective group keys. Tables within a stream each have a unique group key value.
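As a rough illustration, grouping can be modeled in Python as partitioning records by their group key value. The `group` helper below is hypothetical, and it works on a finite list, whereas a real stream may be unbounded.

```python
from collections import defaultdict


def group(records, columns):
    """Partition records into tables, one per distinct group key value."""
    tables = defaultdict(list)
    for record in records:
        # The group key value is the tuple of (label, value) pairs
        # for the columns named in the group key.
        key_value = tuple((label, record.get(label)) for label in columns)
        tables[key_value].append(record)
    return dict(tables)


records = [
    {"host": "a", "_value": 1.0},
    {"host": "b", "_value": 2.0},
    {"host": "a", "_value": 3.0},
]
stream = group(records, ["host"])
for key_value, table in stream.items():
    print(dict(key_value), len(table))
```

Because the tables are stored in a dictionary keyed by group key value, no two tables in the modeled stream can share the same group key value.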

IMPL#463 Specify the primitive types that make up stream and table types

Missing values (null)

null is a predeclared identifier representing a missing or unknown value. null is the only value comprising the null type. Any non-boolean operator that operates on basic types returns null when at least one of its operands is null.

Think of null as an unknown value. The following table explains how null values behave in expressions:

Expression      Evaluates To   Because
null + 5        null           Adding 5 to an unknown value is still unknown
null * 5        null           Multiplying an unknown value by 5 is still unknown
null == 5       null           We don’t know if an unknown value is equal to 5
null < 5        null           We don’t know if an unknown value is less than 5
null == null    null           We don’t know if something unknown is equal to something else that is also unknown
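The rows above can be reproduced with a short Python sketch that uses `None` to stand in for null. The `lift` helper is hypothetical and only models the propagation rule for non-boolean operators; it says nothing about how Flux evaluates expressions internally.

```python
import operator


def lift(op):
    """Wrap a binary operator so it returns None when either operand is None."""
    def wrapped(a, b):
        if a is None or b is None:
            return None
        return op(a, b)
    return wrapped


add = lift(operator.add)
mul = lift(operator.mul)
eq = lift(operator.eq)
lt = lift(operator.lt)

print(add(None, 5))    # None
print(mul(None, 5))    # None
print(eq(None, 5))     # None
print(lt(None, 5))     # None
print(eq(None, None))  # None
print(add(2, 5))       # 7
```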

Operating on something unknown produces something that is still unknown. The only place where this is not the case is in boolean logic. Because boolean types are nullable, Flux implements ternary logic as a way of handling boolean operators with null operands. By interpreting a null operand as an unknown value, we have the following definitions:

  • not null = null
  • null or false = null
  • null or true = true
  • null or null = null
  • null and false = false
  • null and true = null
  • null and null = null
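The seven definitions above can be checked with a minimal Python model of ternary logic, again using `None` for null. The function names `not3`, `or3`, and `and3` are hypothetical.

```python
def not3(a):
    """Ternary NOT: the negation of an unknown value is unknown."""
    return None if a is None else (not a)


def or3(a, b):
    """Ternary OR: true wins outright; otherwise any unknown makes the result unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def and3(a, b):
    """Ternary AND: false wins outright; otherwise any unknown makes the result unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


print(or3(None, True))    # True
print(and3(None, False))  # False
print(and3(None, True))   # None
```

The result is known only when one operand decides the outcome regardless of the other, which is why `null or true` and `null and false` are the only cases with a non-null result.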

Because records are represented using object types, attempting to access a column whose value is unknown or missing from a record will also return null.

IMPL#723 Design how nulls behave

According to the definitions above, it is not possible to check if an expression is null using the == and != operators. These operators will return null if any of their operands are null. In order to perform such a check, Flux provides a built-in exists operator:

  • exists x returns false if x is null
  • exists x returns true if x is not null
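A Python sketch of the same idea, assuming records are modeled as dictionaries and null as `None`: the `exists` function and the sample record are hypothetical. `dict.get` returns `None` both for a column holding a null value and for a column that is absent, which matches the record access rule described earlier.

```python
def exists(x):
    """Return False if x is null (None here), True otherwise."""
    return x is not None


record = {"host": "a", "_value": None}

print(exists(record.get("host")))    # True: the column has a value
print(exists(record.get("_value")))  # False: the column is present but null
print(exists(record.get("region")))  # False: the column is missing from the record
```

Unlike the `==` comparison, `exists` always yields a boolean, so it is safe to use as a filter condition.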

Transformations

Transformations define a change to a stream. Transformations may consume an input stream and always produce a new output stream. The group keys of the output stream have a stable order that is determined by the input stream. The specific ordering may change between releases, and such a change is not considered breaking.

Most transformations output one table for every table they receive from the input stream. Transformations that modify group keys or values regroup the tables in the output stream. A transformation produces side effects when constructed from a function that produces side effects.

Transformations are represented using function types.
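To make the two kinds of transformation concrete, here is a hedged Python sketch that models a stream as a dictionary from group key value to a list of records. The names `map_values` and `regroup` are hypothetical and do not correspond to Flux's implementation.

```python
def map_values(stream, fn):
    """One output table per input table.

    fn must leave the group key columns untouched, so the group key
    value of each table is preserved.
    """
    return {key: [fn(record) for record in table] for key, table in stream.items()}


def regroup(stream, columns):
    """Change the group key: records are redistributed into new tables."""
    out = {}
    for table in stream.values():
        for record in table:
            key = tuple((label, record.get(label)) for label in columns)
            out.setdefault(key, []).append(record)
    return out


stream = {
    (("host", "a"),): [{"host": "a", "region": "eu", "_value": 1.0}],
    (("host", "b"),): [{"host": "b", "region": "eu", "_value": 2.0}],
}

doubled = map_values(stream, lambda r: {**r, "_value": r["_value"] * 2})
by_region = regroup(stream, ["region"])

print(len(doubled))    # 2: same tables as the input
print(len(by_region))  # 1: both records now share the group key value region=eu
```

Both functions take a stream and return a new one without mutating the input. Python dictionaries preserve insertion order, so the output tables follow the order of the input tables, loosely mirroring the stable ordering described above.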
