Join data
Use the Flux join
package to join two data sets based on common values.
Learn how join two data sets using the following join methods:
When to use the join package
We recommend using the join
package to join streams that have mostly different
schemas or that come from two separate data sources.
If you’re joining data from the same data source with the same schema, using
union()
and pivot()
to combine the data will likely be more performant.
For more information, see When to use union and pivot instead of join functions.
How join functions work
join
functions join two streams of tables together based
on common values in each input stream.
Input streams
Each input stream is assigned to the left
or right
parameter.
Input streams can be defined from any valid data source.
For more information, see:
- Query data sources
- Define ad hoc tables with
array.from()
Data requirements
To join data, each input stream must have the following:
One or more columns with common values to join on.
Columns do not need identical labels, but they do need to have comparable values.Identical group keys.
Functions in thejoin
package use group keys to quickly determine what tables from each input stream should be paired and evaluated for the join operation.
Both input streams should have the same group key. If they don’t, your join operation may not find any matching tables and will return unexpected output. If the group keys of your input streams are not identical, usegroup()
to regroup each input stream before joining them together.Only tables with the same group key instance are joined.
Join predicate function (on)
join
package functions require the on
parameter to compare values from each input stream (represented by l
(left) and r
(right))
and returns true
or false
.
Rows that return true
are joined.
This parameter is a predicate function.
(l, r) => l.column == r.column
Join output function (as)
join
package functions (except join.time()
)
require the as
parameter to define the output schema of the join.
The as
parameter returns a new record using values from
joined rows–left (l
) and right (r
).
(l, r) => ({l with name: r.name, location: r.location})
Do not modify group key columns
Do not modify group key columns. The as
function must return the same group key as both input streams to successfully perform a join.
Perform join operations
The join
package supports the following join types and special use cases:
- Perform an inner join
- Perform a left outer join
- Perform a right outer join
- Perform a full outer join
- Join on time
Perform an inner join
Use join.inner()
to perform an inner join of two streams of data. Inner joins drop any rows from both input streams that do not have a matching row in the other stream.
import "join"
left = from(bucket: "example-bucket-1") |> //...
right = from(bucket: "example-bucket-2") |> //...
join.inner(
left: left,
right: right,
on: (l, r) => l.column == r.column,
as: (l, r) => ({l with name: r.name, location: r.location}),
)
Perform a left outer join
Use join.left()
to perform an outer left join of two streams of data. Left joins output a row for each row in the left data stream with data matching from the right data stream. If there is no matching data in the right data stream, non-group-key columns with values from the right data stream are null.
import "join"
left = from(bucket: "example-bucket-1") |> //...
right = from(bucket: "example-bucket-2") |> //...
join.left(
left: left,
right: right,
on: (l, r) => l.column == r.column,
as: (l, r) => ({l with name: r.name, location: r.location}),
)
Perform a right outer join
Use join.right()
to perform an right outer join of two streams of data. Right joins output a row for each row in the right data stream with data matching from the left data stream. If there is no matching data in the left data stream, non-group-key columns with values from the left data stream are null.
import "join"
left = from(bucket: "example-bucket-1") |> //...
right = from(bucket: "example-bucket-2") |> //...
join.right(
left: left,
right: right,
on: (l, r) => l.column == r.column,
as: (l, r) => ({r with name: l.name, location: l.location}),
)
Perform a full outer join
Use join.full()
to perform an full outer join of two streams of data. Full outer joins output a row for all rows in both the left and right input streams and join rows that match according to the on
predicate.
import "join"
left = from(bucket: "example-bucket-1") |> //...
right = from(bucket: "example-bucket-2") |> //...
join.full(
left: left,
right: right,
on: (l, r) => l.id== r.id,
as: (l, r) => {
id = if exists l.id then l.id else r.id
return {name: l.name, location: r.location, id: id}
},
)
Join on time
Use join.time()
to join two streams of data based on time values in the _time
column. This type of join operation is common when joining two streams of time series data.
import "join"
left = from(bucket: "example-bucket-1") |> //...
right = from(bucket: "example-bucket-2") |> //...
join.time(
left: left,
right: right,
as: (l, r) => ({l with field1: l._value, field2: r._value_}),
)
Troubleshoot join operations
For information about unexpected behaviors and errors when using the join
package,
see Troubleshoot join operations.
Was this page helpful?
Thank you for your feedback!
Support and feedback
Thank you for being part of our community! We welcome and encourage your feedback and bug reports for Flux and this documentation. To find support, use the following resources:
Customers with an annual or support contract can contact InfluxData Support.