Use pandas to analyze and visualize data

Use pandas, the Python data analysis library, to process, analyze, and visualize data stored in InfluxDB.

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Install prerequisites

The examples in this guide assume that you use a Python virtual environment and flightsql-dbapi, the Flight SQL library for Python. Installing flightsql-dbapi also installs the pyarrow library, which provides Python bindings for Apache Arrow.

For more information, see how to get started querying InfluxDB with Python and flightsql-dbapi.

Install pandas

To use pandas, you need to install and import the pandas library.

In your terminal, use pip to install pandas in your active Python virtual environment:

pip install pandas
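
To confirm that the install succeeded, a quick check like the following can be run in the same virtual environment. This is a minimal sketch; the file name check-pandas.py and the variable name are only illustrative.

```python
# check-pandas.py (hypothetical file name)
import pandas

# If the import succeeds, pandas is available in the active environment.
# Print the installed version for reference.
installed_version = pandas.__version__
print(installed_version)
```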

Use PyArrow to convert query results to pandas

The following steps use Python, flightsql-dbapi, and pyarrow to query InfluxDB and stream Arrow data to a pandas DataFrame.

  1. In your editor, copy and paste the following code into a new file, for example, pandas-example.py:

    # pandas-example.py
    
    from flightsql import FlightSQLClient
    import pandas
    
    client = FlightSQLClient(host='cluster-id.influxdb.io',
                            token='INFLUX_READ_WRITE_TOKEN',
                            metadata={'database': 'INFLUX_DATABASE'},
                            features={'metadata-reflection': 'true'})
    
    info = client.execute("SELECT * FROM home")
    
    reader = client.do_get(info.endpoints[0].ticket)
    
    # Read all record batches in the stream to a pandas DataFrame
    dataframe = reader.read_pandas()
    
    dataframe.info()
    
  2. Replace the following configuration values:

    • INFLUX_READ_WRITE_TOKEN: Your InfluxDB token with read permissions on the databases you want to query.
    • INFLUX_DATABASE: The name of your InfluxDB database.
  3. In your terminal, use the Python interpreter to run the file:

    python pandas-example.py
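
As an alternative to editing the placeholder values in the file by hand, the connection settings could be read from environment variables. The following is a sketch under stated assumptions: the variable names INFLUX_HOST, INFLUX_TOKEN, and INFLUX_DATABASE and the helper load_client_options() are hypothetical and not part of flightsql-dbapi. The helper only builds the keyword arguments, so it runs without a network connection.

```python
import os

def load_client_options(env=os.environ):
    """Build FlightSQLClient keyword arguments from a mapping of settings.

    The key names used here are hypothetical; rename them to fit your setup.
    """
    return {
        "host": env["INFLUX_HOST"],
        "token": env["INFLUX_TOKEN"],
        "metadata": {"database": env["INFLUX_DATABASE"]},
        "features": {"metadata-reflection": "true"},
    }

# Demonstrate the helper with placeholder values instead of real credentials
example_env = {
    "INFLUX_HOST": "cluster-id.influxdb.io",
    "INFLUX_TOKEN": "INFLUX_READ_WRITE_TOKEN",
    "INFLUX_DATABASE": "INFLUX_DATABASE",
}
options = load_client_options(example_env)
print(options["host"])

# With the variables set in your shell, the client could then be created with:
#   client = FlightSQLClient(**load_client_options())
```

Keeping the token out of the source file makes it less likely that the token is committed to version control by accident.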
    

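The pyarrow.flight.FlightStreamReader read_pandas() method reads all Arrow record batches in the stream into a pyarrow.Table and then converts the Table to a pandas DataFrame.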

Next, use pandas to analyze data.

Use pandas to analyze data

View data information and statistics

The following example uses the DataFrame info() method to print a summary of the DataFrame and the describe() method to calculate summary statistics for the data.

# pandas-example.py

from flightsql import FlightSQLClient
import pandas

client = FlightSQLClient(host='cluster-id.influxdb.io',
                        token='INFLUX_READ_WRITE_TOKEN',
                        metadata={'database': 'INFLUX_DATABASE'},
                        features={'metadata-reflection': 'true'})

info = client.execute("SELECT * FROM home")

reader = client.do_get(info.endpoints[0].ticket)

dataframe = reader.read_pandas()

# Print a summary of the DataFrame to stdout
dataframe.info()

# Calculate summary statistics for the data
print(dataframe.describe())
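
To experiment with info() and describe() without querying InfluxDB, a small DataFrame can be built in memory. This is a minimal sketch; the rows below are invented sample values, and the room and hum columns are assumptions about what a home table might contain.

```python
import pandas

# Hypothetical sample rows; not real query results
dataframe = pandas.DataFrame({
    "room": ["Kitchen", "Living Room", "Kitchen", "Living Room"],
    "temp": [21.0, 21.4, 23.0, 21.8],
    "hum": [35.9, 35.9, 36.2, 36.0],
})

# info() prints column names, non-null counts, and dtypes to stdout.
# It returns None, so there is no need to wrap it in print().
dataframe.info()

# describe() returns a new DataFrame of summary statistics.
# By default, it includes only the numeric columns.
stats = dataframe.describe()
print(stats)

# Individual statistics can be read by row label and column name
mean_temp = stats.loc["mean", "temp"]
print(mean_temp)
```

Because describe() returns a DataFrame, its output can be filtered, rounded, or written to a file like any other DataFrame.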

Downsample time series

The pandas library provides extensive features for working with time series data.

The pandas.DataFrame.resample() method downsamples and upsamples data to time-based groups, for example:

from flightsql import FlightSQLClient
import pandas

client = FlightSQLClient(host='cluster-id.influxdb.io',
                        token='INFLUX_READ_WRITE_TOKEN',
                        metadata={'database': 'INFLUX_DATABASE'},
                        features={'metadata-reflection': 'true'})

info = client.execute("SELECT * FROM home")

reader = client.do_get(info.endpoints[0].ticket)

dataframe = reader.read_pandas()

# Use the `time` column to generate a DatetimeIndex for the DataFrame
dataframe = dataframe.set_index('time')

# Print information about the index
print(dataframe.index)

# Downsample data into 1-hour groups based on the DatetimeIndex
resample = dataframe.resample("1h")

# Print a summary that shows the start time and average temp for each group
print(resample['temp'].mean())
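
The same method also upsamples. The following sketch uses a few invented readings taken 30 minutes apart to show both directions: averaging into 1-hour groups, and expanding to 15-minute rows with forward fill. It uses the lowercase "1h" alias, which newer pandas releases prefer over "1H". The data values are hypothetical.

```python
import pandas

# Hypothetical readings taken every 30 minutes
dataframe = pandas.DataFrame({
    "time": pandas.to_datetime([
        "2022-01-01T08:00:00Z",
        "2022-01-01T08:30:00Z",
        "2022-01-01T09:00:00Z",
        "2022-01-01T09:30:00Z",
    ]),
    "temp": [21.0, 22.0, 23.0, 25.0],
})

# resample() requires a datetime-like index
dataframe = dataframe.set_index("time")

# Downsample: average the readings in each 1-hour group
hourly_mean = dataframe["temp"].resample("1h").mean()
print(hourly_mean)

# Upsample: create 15-minute rows and carry the last known value forward
filled = dataframe["temp"].resample("15min").ffill()
print(filled)
```

Forward fill is only one way to populate the new rows. Whether it is appropriate depends on how the measured value behaves between readings.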

For more detail and examples, see the pandas documentation.

