Use the PyArrow library to analyze data

Use PyArrow to read and analyze query results from an InfluxDB Cloud Serverless bucket. The PyArrow library provides efficient computation, aggregation, serialization, and conversion of Arrow format data.

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process and move data fast.

The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.

Install prerequisites

The examples in this guide assume that you use a Python virtual environment and the Flight SQL library for Python. For more information, see how to get started using Python to query InfluxDB.

Installing flightsql-dbapi also installs the pyarrow library that provides Python bindings for Apache Arrow.
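
To confirm that both modules are importable in the active virtual environment, a quick check like the following sketch may help. The is_installed helper is a hypothetical name introduced here for illustration. The sketch assumes the flightsql-dbapi package exposes the flightsql module, as the import statements in this guide suggest.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Report whether each module used in this guide is available.
for name in ("flightsql", "pyarrow"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```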

Use PyArrow to read query results

The following example shows how to use Python with flightsql-dbapi and pyarrow to query InfluxDB and view Arrow data as a PyArrow Table.

  1. In your editor, copy and paste the following sample code into a new file, for example, pyarrow-example.py:

    # pyarrow-example.py
    
    from flightsql import FlightSQLClient
    
    # Instantiate a FlightSQLClient configured for a bucket
    client = FlightSQLClient(host='cloud2.influxdata.com',
        token='INFLUX_READ_WRITE_TOKEN',
        metadata={'database': 'BUCKET_NAME'},
        features={'metadata-reflection': 'true'})
    
    # Execute the query to retrieve FlightInfo
    info = client.execute('SELECT * FROM home')
    
    # Use the ticket to request the Arrow data stream.
    # Return a FlightStreamReader for streaming the results.
    reader = client.do_get(info.endpoints[0].ticket)
    
    # Read all data to a pyarrow.Table
    table = reader.read_all()
    
    print(table)
    
  2. Replace the following configuration values:

    • INFLUX_READ_WRITE_TOKEN: An InfluxDB token with read permission to the bucket.
    • BUCKET_NAME: The name of the InfluxDB bucket to query.
  3. In your terminal, use the Python interpreter to run the file:

    python pyarrow-example.py
    

The FlightStreamReader.read_all() method reads all Arrow record batches in the stream as a pyarrow.Table.

Next, use PyArrow to analyze data.

Use PyArrow to analyze data

Group and aggregate data

With a pyarrow.Table, you can use values in a column as keys for grouping.

The following example shows how to query InfluxDB, group the table data, and then calculate an aggregate value for each group:

# pyarrow-example.py

from flightsql import FlightSQLClient

client = FlightSQLClient(host='cloud2.influxdata.com',
    token='INFLUX_READ_WRITE_TOKEN',
    metadata={'database': 'BUCKET_NAME'},
    features={'metadata-reflection': 'true'})

info = client.execute('SELECT * FROM home')

reader = client.do_get(info.endpoints[0].ticket)

table = reader.read_all()

# Use PyArrow to aggregate data
print(table.group_by('room').aggregate([('temp', 'mean')]))

Replace the following:

  • INFLUX_READ_WRITE_TOKEN: An InfluxDB token with read permission to the bucket.
  • BUCKET_NAME: The name of the InfluxDB bucket to query.

For more detail and examples, see the PyArrow documentation and the Apache Arrow Python Cookbook.

