---
title: Use the PyArrow library to analyze data
description: Use PyArrow to read and analyze InfluxDB query results from InfluxDB Cloud Serverless.
url: https://docs.influxdata.com/influxdb3/cloud-serverless/process-data/tools/pyarrow/
estimated_tokens: 3291
product: InfluxDB Cloud Serverless
version: cloud-serverless
---

# Use the PyArrow library to analyze data

Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze query results from InfluxDB Cloud Serverless. The PyArrow library provides efficient computation, aggregation, serialization, and conversion of Arrow format data.

> Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process and move data fast.
> 
> The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
> 
> [PyArrow documentation](https://arrow.apache.org/docs/python/index.html)

-   [Install prerequisites](#install-prerequisites)
-   [Use PyArrow to read query results](#use-pyarrow-to-read-query-results)
-   [Use PyArrow to analyze data](#use-pyarrow-to-analyze-data)
    -   [Group and aggregate data](#group-and-aggregate-data)

## Install prerequisites

The examples in this guide assume that you use a Python virtual environment and have installed the InfluxDB 3 [`influxdb3-python` Python client library](/influxdb3/cloud-serverless/reference/client-libraries/v3/python/). For more information, see how to [get started using Python to query InfluxDB](/influxdb3/cloud-serverless/query-data/execute-queries/flight-sql/python/).

Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.

## Use PyArrow to read query results

The following example shows how to use `influxdb3-python` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.

1. In your editor, copy and paste the following sample code to a new file (for example, `pyarrow-example.py`):
    
    ```py
    # pyarrow-example.py

    from influxdb_client_3 import InfluxDBClient3

    def querySQL():

      # Instantiate an InfluxDB client configured for a bucket
      client = InfluxDBClient3(
        "https://cloud2.influxdata.com",
        database="BUCKET_NAME",
        token="API_TOKEN")

      # Execute the query and retrieve all record batches
      # in the response stream as a PyArrow Table
      table = client.query(
        '''SELECT *
          FROM home
          WHERE time >= now() - INTERVAL '90 days'
          ORDER BY time'''
      )

      client.close()

      # Return the table so the caller can print or analyze it
      return table

    print(querySQL())
    ```
    
2. Replace the following configuration values:
    
    -   `API_TOKEN`: An InfluxDB [token](/influxdb3/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query.
    -   `BUCKET_NAME`: The name of the InfluxDB [bucket](/influxdb3/cloud-serverless/admin/buckets/) to query.
3. In your terminal, use the Python interpreter to run the file:
    
    ```sh
    python pyarrow-example.py
    ```
    

The `InfluxDBClient3.query()` method sends the query request, and then returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.

Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).

## Use PyArrow to analyze data

### Group and aggregate data

With a `pyarrow.Table`, you can use values in a column as *keys* for grouping.

The following example shows how to query InfluxDB, and then use PyArrow to group the table data and calculate an aggregate value for each group:

```py
# pyarrow-example.py

from influxdb_client_3 import InfluxDBClient3

def querySQL():
  
  # Instantiate an InfluxDB client configured for a bucket
  client = InfluxDBClient3(
    "https://cloud2.influxdata.com",
    database="BUCKET_NAME",
    token="API_TOKEN")

  # Execute the query to retrieve data 
  # formatted as a PyArrow Table
  table = client.query(
    '''SELECT *
      FROM home
      WHERE time >= now() - INTERVAL '90 days'
      ORDER BY time'''
  )

  client.close()

  return table

table = querySQL()

# Use PyArrow to aggregate data
print(table.group_by('room').aggregate([('temp', 'mean')]))
```

Replace the following:

-   `API_TOKEN`: An InfluxDB [token](/influxdb3/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query.
-   `BUCKET_NAME`: The name of the InfluxDB [bucket](/influxdb3/cloud-serverless/admin/buckets/) to query.

Example results:

```arrow
pyarrow.Table
temp_mean: double
room: string
----
temp_mean: [[22.581987577639747,22.10807453416151]]
room: [["Kitchen","Living Room"]]
```

For more detail and examples, see the [PyArrow documentation](https://arrow.apache.org/docs/python/getstarted.html) and the [Apache Arrow Python Cookbook](https://arrow.apache.org/cookbook/py/data.html).

#### Related

-   [Use pandas to analyze and visualize data](/influxdb3/cloud-serverless/process-data/tools/pandas/)
-   [Query data with SQL](/influxdb3/cloud-serverless/query-data/sql/)
-   [Use Python to query data](/influxdb3/cloud-serverless/query-data/execute-queries/client-libraries/python/)
