---
title: Use pandas to analyze data
description: Use the pandas Python data analysis library to analyze and visualize time series data stored in InfluxDB Clustered.
url: https://docs.influxdata.com/influxdb3/clustered/process-data/tools/pandas/
estimated_tokens: 5728
product: InfluxDB Clustered
version: clustered
---

# Use pandas to analyze data

Use [pandas](https://pandas.pydata.org/), the Python data analysis library, to process, analyze, and visualize data stored in an InfluxDB Clustered database.

> **pandas** is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
> 
> [pandas documentation](https://pandas.pydata.org/docs/)

-   [Install prerequisites](#install-prerequisites)
-   [Install pandas](#install-pandas)
-   [Use PyArrow to convert query results to pandas](#use-pyarrow-to-convert-query-results-to-pandas)
-   [Use pandas to analyze data](#use-pandas-to-analyze-data)
    -   [View data information and statistics](#view-data-information-and-statistics)
    -   [Downsample time series](#downsample-time-series)

## Install prerequisites

The examples in this guide assume that you're using a Python virtual environment and the InfluxDB 3 [`influxdb3-python` Python client library](/influxdb3/clustered/reference/client-libraries/v3/python/). For more information, see how to [get started using Python to query InfluxDB](/influxdb3/clustered/query-data/execute-queries/client-libraries/python/).

Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.

## Install pandas

To use pandas, you need to install and import the `pandas` library.

In your terminal, use `pip` to install `pandas` in your active [Python virtual environment](/influxdb3/clustered/query-data/execute-queries/client-libraries/python/#create-a-project-virtual-environment):

```sh
pip install pandas
```

## Use PyArrow to convert query results to pandas

The following steps use Python, `influxdb3-python`, and `pyarrow` to query InfluxDB, retrieve the results as Arrow data, and convert them to a pandas `DataFrame`.

1. In your editor, copy and paste the following code into a new file, for example, `pandas-example.py`:
    
    ```py
    # pandas-example.py
    
    from influxdb_client_3 import InfluxDBClient3
    import pandas
    
    # Instantiate an InfluxDB client configured for a database
    client = InfluxDBClient3(
      "https://cluster-host.com",
      database="DATABASE_NAME",
      token="DATABASE_TOKEN")
    
    # Execute the query to retrieve all record batches in the stream
    # formatted as a PyArrow Table.
    table = client.query(
      '''SELECT *
        FROM home
        WHERE time >= now() - INTERVAL '90 days'
        ORDER BY time'''
    )
    
    client.close()
    
    # Convert the PyArrow Table to a pandas DataFrame.
    dataframe = table.to_pandas()
    
    print(dataframe)
    ```
    
2. Replace the following configuration values:
    
    -   `cluster-host.com`: the hostname of your InfluxDB cluster
    -   `DATABASE_NAME`: the name of the [database](/influxdb3/clustered/admin/databases/) to query
    -   `DATABASE_TOKEN`: a [database token](/influxdb3/clustered/admin/tokens/#database-tokens) with *read* permission on the specified database
    
3. In your terminal, use the Python interpreter to run the file:
    
    ```sh
    python pandas-example.py
    ```
    

The example calls the following methods:

-   [`InfluxDBClient3.query()`](/influxdb3/clustered/reference/client-libraries/v3/python/#influxdbclient3query): Sends the query request and returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.
    
-   [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas): Creates a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) from the data in the PyArrow `Table`.
    


The example output is similar to the following:

```text
    co   hum         room  temp                time
0    0  35.9  Living Room  21.1 2022-01-02 11:46:40
1    0  35.9      Kitchen  21.0 2022-01-02 11:46:40
2    0  36.2      Kitchen  23.0 2022-01-02 12:46:40
3    0  35.9  Living Room  21.4 2022-01-02 12:46:40
4    0  36.1      Kitchen  22.7 2022-01-02 13:46:40
5    0  36.0  Living Room  21.8 2022-01-02 13:46:40
6    0  36.0      Kitchen  22.4 2022-01-02 14:46:40
7    0  36.0  Living Room  22.2 2022-01-02 14:46:40
8    0  36.0      Kitchen  22.5 2022-01-02 15:46:40
9    0  35.9  Living Room  22.2 2022-01-02 15:46:40
10   1  36.5      Kitchen  22.8 2022-01-02 16:46:40
11   0  36.0  Living Room  22.4 2022-01-02 16:46:40
12   1  36.3      Kitchen  22.8 2022-01-02 17:46:40
13   0  36.1  Living Room  22.3 2022-01-02 17:46:40
14   3  36.2      Kitchen  22.7 2022-01-02 18:46:40
15   1  36.1  Living Room  22.3 2022-01-02 18:46:40
16   7  36.0      Kitchen  22.4 2022-01-02 19:46:40
17   4  36.0  Living Room  22.4 2022-01-02 19:46:40
18   9  36.0      Kitchen  22.7 2022-01-02 20:46:40
19   5  35.9  Living Room  22.6 2022-01-02 20:46:40
20  18  36.9      Kitchen  23.3 2022-01-02 21:46:40
21   9  36.2  Living Room  22.8 2022-01-02 21:46:40
22  22  36.6      Kitchen  23.1 2022-01-02 22:46:40
23  14  36.3  Living Room  22.5 2022-01-02 22:46:40
24  26  36.5      Kitchen  22.7 2022-01-02 23:46:40
25  17  36.4  Living Room  22.2 2022-01-02 23:46:40
```

Next, [use pandas to analyze data](#use-pandas-to-analyze-data).

## Use pandas to analyze data

-   [View data information and statistics](#view-data-information-and-statistics)
-   [Downsample time series](#downsample-time-series)

### View data information and statistics

The following example shows how to use pandas `DataFrame` methods to inspect, summarize, and format data queried from InfluxDB Clustered.

```py
# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://cluster-host.com",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

# Print information about the results DataFrame,
# including the index dtype and columns, non-null values, and memory usage.
dataframe.info()

# Calculate descriptive statistics that summarize the distribution of the results.
print(dataframe.describe())

# Extract a DataFrame column.
print(dataframe['temp'])

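# Print the DataFrame in Markdown format.
# DataFrame.to_markdown() requires the optional `tabulate` package
# (install it with `pip install tabulate`).
print(dataframe.to_markdown())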
```

Replace the following configuration values:

-   `cluster-host.com`: the hostname of your InfluxDB cluster
-   `DATABASE_NAME`: the name of the InfluxDB [database](/influxdb3/clustered/admin/databases/) to query
-   `DATABASE_TOKEN`: a [database token](/influxdb3/clustered/admin/tokens/#database-tokens) with *read* permission on the specified database
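
To experiment with these methods without querying a cluster, the following sketch builds a small DataFrame by hand. The values are copied from the first rows of the earlier example results and stand in for a query result (an assumption for illustration). Because the `home` data mixes readings from multiple rooms, the sketch also uses `DataFrame.groupby()` to compute per-room statistics.

```python
import pandas as pd

# Hypothetical stand-in for a query result: values copied from the first
# rows of the example results for the `home` table.
dataframe = pd.DataFrame(
    {
        "co": [0, 0, 0, 0],
        "hum": [35.9, 35.9, 36.2, 35.9],
        "room": ["Living Room", "Kitchen", "Kitchen", "Living Room"],
        "temp": [21.1, 21.0, 23.0, 21.4],
    }
)

# By default, describe() skips string columns such as `room` and summarizes
# the numeric columns (count, mean, std, min, quartiles, max).
summary = dataframe.describe()
print(summary)

# Group rows by room and compute per-room temperature statistics.
per_room = dataframe.groupby("room")["temp"].agg(["mean", "min", "max"])
print(per_room)
```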

### Downsample time series

The pandas library provides extensive features for working with time series data.

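The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) downsamples or upsamples data into time-based groups. For example, the following code sets the `time` column as the index and calculates the average `temp` for each 1-hour group: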

```py
# pandas-example.py

# ... (query InfluxDB and convert the results to `dataframe`, as shown in the preceding example)

# Use the `time` column to generate a DatetimeIndex for the DataFrame
dataframe = dataframe.set_index('time')

# Print information about the index
print(dataframe.index)

# Downsample data into 1-hour groups based on the DatetimeIndex.
# pandas 2.2 deprecates the uppercase "H" alias in favor of "h".
resample = dataframe.resample("1h")

# Print a summary that shows the start time and average temp for each group
print(resample['temp'].mean())
```


The example output is similar to the following:

```text
time
2023-07-16 22:00:00          NaN
2023-07-16 23:00:00    22.600000
2023-07-17 00:00:00    22.513889
2023-07-17 01:00:00    22.208333
2023-07-17 02:00:00    22.300000
...
Freq: H, Name: temp, Length: 469323, dtype: float64
```
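
The following sketch applies `resample()` to a small, hand-built DataFrame so you can run it without a cluster. The values are copied from the first rows of the earlier example results and stand in for a real query result. It shows three variations: averaging all rooms into 1-hour groups, grouping by `room` before resampling so that each room is averaged separately into 2-hour groups, and upsampling the hourly series to 30-minute intervals with linear interpolation. The `2h` and `30min` intervals are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a query result: temp readings for two rooms,
# copied from the first rows of the earlier example results.
dataframe = pd.DataFrame(
    {
        "room": ["Living Room", "Kitchen", "Kitchen", "Living Room", "Kitchen", "Living Room"],
        "temp": [21.1, 21.0, 23.0, 21.4, 22.7, 21.8],
        "time": pd.to_datetime(
            [
                "2022-01-02 11:46:40", "2022-01-02 11:46:40",
                "2022-01-02 12:46:40", "2022-01-02 12:46:40",
                "2022-01-02 13:46:40", "2022-01-02 13:46:40",
            ]
        ),
    }
).set_index("time")

# Downsample: average temp across all rooms in each 1-hour group.
hourly = dataframe["temp"].resample("1h").mean()

# Downsample per room: group by `room`, then average each room's temp in 2-hour groups.
per_room = dataframe.groupby("room")["temp"].resample("2h").mean()

# Upsample: convert the hourly series to a 30-minute frequency and
# fill the new timestamps by linear interpolation.
half_hourly = hourly.resample("30min").interpolate()

print(hourly)
print(per_room)
print(half_hourly)
```

Averaging across rooms mixes Kitchen and Living Room readings into one value per group; grouping by `room` first keeps each room's series separate.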

For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html).

#### Related

-   [Use Python to query data](/influxdb3/clustered/query-data/execute-queries/client-libraries/python/)

