Use the PyArrow library to analyze data
Use PyArrow to read and analyze query results from InfluxDB Cloud Dedicated. The PyArrow library provides efficient computation, aggregation, serialization, and conversion of Arrow format data.
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process and move data fast.
The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
- Install prerequisites
- Use PyArrow to read query results
- Use PyArrow to analyze data
Install prerequisites
The examples in this guide assume that you are using a Python virtual environment and the InfluxDB v3 influxdb3-python Python client library.
For more information, see how to get started using Python to query InfluxDB.
Installing influxdb3-python also installs the pyarrow library, which provides Python bindings for Apache Arrow.
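As a quick check after installation, the following minimal sketch reports whether both modules can be found in the active environment. It uses only the Python standard library. The helper name `is_installed` is illustrative and is not part of either library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# influxdb_client_3 is the import name used by the examples in this guide.
for name in ("influxdb_client_3", "pyarrow"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, confirm that the virtual environment is activated before you run the examples.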
Use PyArrow to read query results
The following example shows how to use influxdb3-python and pyarrow to query InfluxDB and view Arrow data as a PyArrow Table.
In your editor, copy and paste the following sample code to a new file, for example, pyarrow-example.py:
    # pyarrow-example.py

    from influxdb_client_3 import InfluxDBClient3

    def querySQL():

        # Instantiate an InfluxDB client configured for a database
        client = InfluxDBClient3(
            "https://cluster-id.influxdb.io",
            database="DATABASE_NAME",
            token="DATABASE_TOKEN")

        # Execute the query to retrieve all record batches in the stream
        # formatted as a PyArrow Table.
        table = client.query(
            '''SELECT *
               FROM home
               WHERE time >= now() - INTERVAL '90 days'
               ORDER BY time'''
        )

        client.close()

        # Return the table so that the caller can print or analyze it
        return table

    print(querySQL())
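Replace the following configuration values:
- DATABASE_TOKEN: An InfluxDB token with read permissions on the databases you want to query.
- DATABASE_NAME: The name of the InfluxDB database to query.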
In your terminal, use the Python interpreter to run the file, for example: python pyarrow-example.py
The InfluxDBClient3.query() method sends the query request, and then returns a pyarrow.Table that contains all the Arrow record batches from the response stream.
Next, use PyArrow to analyze data.
Use PyArrow to analyze data
Group and aggregate data
With a pyarrow.Table, you can use values in a column as keys for grouping.
The following example shows how to query InfluxDB, and then use PyArrow to group the table data and calculate an aggregate value for each group:
    # pyarrow-example.py

    from influxdb_client_3 import InfluxDBClient3

    def querySQL():

        # Instantiate an InfluxDB client configured for a database
        client = InfluxDBClient3(
            "https://cluster-id.influxdb.io",
            database="DATABASE_NAME",
            token="DATABASE_TOKEN")

        # Execute the query to retrieve data
        # formatted as a PyArrow Table
        table = client.query(
            '''SELECT *
               FROM home
               WHERE time >= now() - INTERVAL '90 days'
               ORDER BY time'''
        )

        client.close()

        return table

    table = querySQL()

    # Use PyArrow to aggregate data:
    # group rows by room and calculate the mean temp for each group
    print(table.group_by('room').aggregate([('temp', 'mean')]))
Replace the following:
- DATABASE_TOKEN: An InfluxDB token with read permissions on the databases you want to query.
- DATABASE_NAME: The name of the InfluxDB database to query.