Use pandas to analyze and visualize data

Use pandas, the Python data analysis library, to process, analyze, and visualize data stored in an InfluxDB Cloud Serverless bucket.

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Install prerequisites

The examples in this guide assume that you use a Python virtual environment and influxdb3-python, the Python client library for InfluxDB v3. For more information, see how to get started using Python to query InfluxDB.

Installing influxdb3-python also installs the pyarrow library that provides Python bindings for Apache Arrow.

Install pandas

To use pandas, you need to install and import the pandas library.

In your terminal, use pip to install pandas in your active Python virtual environment:

pip install pandas
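
To confirm that the install succeeded, a quick check like the following can be run in the same virtual environment. This is a minimal sketch that only imports the library and prints its version; the version number depends on your environment.

```python
# Verify that pandas is importable in the active virtual environment.
import pandas

# The version string varies by installation.
installed_version = pandas.__version__
print(f"pandas version: {installed_version}")
```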

Use PyArrow to convert query results to pandas

The following steps use Python, influxdb3-python, and pyarrow to query InfluxDB and stream Arrow data to a pandas DataFrame.

  1. In your editor, copy and paste the following code into a new file, for example, pandas-example.py:

    # pandas-example.py
    
    from influxdb_client_3 import InfluxDBClient3
    import pandas
    
    # Instantiate an InfluxDB client configured for a bucket
    client = InfluxDBClient3(
      "https://cloud2.influxdata.com",
      database="BUCKET_NAME",
      token="API_TOKEN"
    )

    # Execute the query to retrieve all record batches in the stream
    # formatted as a PyArrow Table.
    table = client.query(
      '''SELECT *
        FROM home
        WHERE time >= now() - INTERVAL '90 days'
        ORDER BY time'''
    )

    client.close()

    # Convert the PyArrow Table to a pandas DataFrame.
    dataframe = table.to_pandas()

    print(dataframe)

  2. Replace the following configuration values:

    • BUCKET_NAME: the name of the InfluxDB bucket to query
    • API_TOKEN: an InfluxDB token with read permission on the specified bucket
  3. In your terminal, use the Python interpreter to run the file:

    python pandas-example.py
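
As an alternative to hard-coding the values from step 2, the connection settings could be read from environment variables. The following is a minimal sketch under stated assumptions: the helper `load_influx_settings` and the variable names `INFLUX_HOST`, `INFLUX_DATABASE`, and `INFLUX_TOKEN` are hypothetical names chosen for illustration and are not part of influxdb3-python.

```python
import os


def load_influx_settings(env=os.environ):
    """Build keyword arguments for InfluxDBClient3 from environment variables.

    INFLUX_HOST, INFLUX_DATABASE, and INFLUX_TOKEN are hypothetical names
    chosen for this sketch.
    """
    return {
        "host": env.get("INFLUX_HOST", "https://cloud2.influxdata.com"),
        "database": env["INFLUX_DATABASE"],  # corresponds to BUCKET_NAME
        "token": env["INFLUX_TOKEN"],        # corresponds to API_TOKEN
    }


# Stand-in values for demonstration; real code would rely on os.environ.
settings = load_influx_settings({
    "INFLUX_DATABASE": "BUCKET_NAME",
    "INFLUX_TOKEN": "API_TOKEN",
})
print(sorted(settings))
```

The resulting dictionary could then be passed to the client as `InfluxDBClient3(**settings)`. A missing variable raises a `KeyError`, so a misconfigured environment fails before any request is sent.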
    

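The example calls the following methods:

  • InfluxDBClient3.query(): sends the query request and returns the results as a PyArrow Table
  • pyarrow.Table.to_pandas(): creates a pandas DataFrame from the data in the PyArrow Table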


Next, use pandas to analyze data.

Use pandas to analyze data

View data information and statistics

The following example shows how to use pandas DataFrame methods to transform and summarize data stored in InfluxDB Cloud Serverless.

# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a bucket
client = InfluxDBClient3(
  "https://cloud2.influxdata.com",
  database="BUCKET_NAME",
  token="API_TOKEN"
)

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

# Print information about the results DataFrame,
# including the index dtype and columns, non-null values, and memory usage.
dataframe.info()

# Calculate descriptive statistics that summarize the distribution of the results.
print(dataframe.describe())

# Extract a DataFrame column.
print(dataframe['temp'])

# Print the DataFrame in Markdown format.
# to_markdown() requires the optional tabulate package (pip install tabulate).
print(dataframe.to_markdown())

Replace the following configuration values:

  • BUCKET_NAME: the name of the InfluxDB bucket to query
  • API_TOKEN: an InfluxDB token with read permission on the specified bucket
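
To see what these DataFrame methods return without querying a bucket, the same calls can be applied to a small in-memory DataFrame. This is a sketch with hypothetical values; the column names `room`, `temp`, and `hum` are assumptions made for illustration.

```python
import pandas

# Hypothetical sample data; the values are made up for illustration.
dataframe = pandas.DataFrame({
    "room": ["Kitchen", "Kitchen", "Living Room", "Living Room"],
    "temp": [21.0, 23.0, 22.0, 24.0],
    "hum": [35.9, 36.2, 36.1, 36.0],
})

# Print column dtypes, non-null counts, and memory usage.
dataframe.info()

# Summarize the numeric columns (count, mean, std, min, quartiles, max).
summary = dataframe.describe()
print(summary)

# Extract a single column as a pandas Series.
temps = dataframe["temp"]
print(temps)
```

By default, describe() summarizes only the numeric columns here, so the `room` column does not appear in the summary.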

Downsample time series

The pandas library provides extensive features for working with time series data.

The pandas.DataFrame.resample() method downsamples and upsamples data to time-based groups, for example:

# pandas-example.py

...

# Use the `time` column to generate a DatetimeIndex for the DataFrame
dataframe = dataframe.set_index('time')

# Print information about the index
print(dataframe.index)

# Downsample data into 1-hour groups based on the DatetimeIndex
resample = dataframe.resample("1h")

# Print a summary that shows the start time and average temp for each group
print(resample['temp'].mean())

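The downsampling steps can be checked against data small enough to verify by hand. The following sketch uses hypothetical readings taken every 30 minutes and averages them into 1-hour groups. It uses the lowercase "1h" frequency alias, which recent pandas releases prefer over "1H".

```python
import pandas

# Hypothetical readings taken every 30 minutes (values are made up).
dataframe = pandas.DataFrame({
    "time": pandas.to_datetime([
        "2024-01-01T08:00:00Z",
        "2024-01-01T08:30:00Z",
        "2024-01-01T09:00:00Z",
        "2024-01-01T09:30:00Z",
    ]),
    "temp": [21.0, 23.0, 22.0, 24.0],
})

# Use the `time` column as a DatetimeIndex, which resample() requires here.
dataframe = dataframe.set_index("time")

# Group the rows into 1-hour bins and average `temp` within each bin.
hourly_mean = dataframe.resample("1h")["temp"].mean()
print(hourly_mean)
```

Each hourly group contains two readings, so the result has one row per hour: the mean of 21.0 and 23.0 for the 08:00 group, and the mean of 22.0 and 24.0 for the 09:00 group.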

For more detail and examples, see the pandas documentation.