Pandas Integration

Use pandas to analyze Komodo data with familiar DataFrame operations.

Install pandas first:
uv add pandas

Pandas can run SQL and build a DataFrame in one step:

import pandas as pd
from komodo import get_snowflake_connection
conn = get_snowflake_connection()
df = pd.read_sql(
    "SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS LIMIT 500",
    conn,
)
print(df.head())
print(df["TABLE_NAME"].nunique(), "distinct tables in sample")
conn.close()

When binding values, pass them through the params argument of read_sql rather than interpolating them into the SQL string; the placeholder style (positional or named) is driver-specific.
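For example, with named pyformat placeholders (%(name)s), the Snowflake connector's default paramstyle — a sketch, assuming your driver is configured the same way:

```python
import pandas as pd
from komodo import get_snowflake_connection

conn = get_snowflake_connection()
# %(schema)s is pyformat-style named binding; a qmark-style driver
# would use ? placeholders with a sequence of values instead.
df = pd.read_sql(
    "SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE table_schema = %(schema)s LIMIT 100",
    conn,
    params={"schema": "PUBLIC"},
)
print(df.head())
conn.close()
```

Binding values this way also avoids SQL injection and quoting bugs that come with manual string formatting.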

Cursor helpers: fetch_pandas_all and fetch_pandas_batches

The Snowflake-compatible cursor can return DataFrames directly — useful when you already have a cursor and want Arrow/pandas-optimized paths:

from komodo import get_snowflake_connection
conn = get_snowflake_connection()
cursor = conn.cursor()
cursor.execute("USE DATABASE DATA")
cursor.execute("SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS")
df = cursor.fetch_pandas_all()
print(df.head(20))
print(len(df), "rows")

For large result sets, stream batches instead of loading everything into memory:

cursor.execute(
    "SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS"
)
total = 0
for batch_df in cursor.fetch_pandas_batches():
    total += len(batch_df)
    print("batch rows:", len(batch_df))
print("total rows:", total)
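If you ultimately need one DataFrame but want to process or log batches as they arrive, pd.concat accepts any iterable of DataFrames, including the batch generator. A self-contained sketch — fake_batches here is a stand-in for cursor.fetch_pandas_batches(), not part of the connector:

```python
import pandas as pd

def fake_batches():
    # Stand-in for cursor.fetch_pandas_batches(): yields three small
    # DataFrames to demonstrate the concat pattern without a connection.
    for start in (0, 3, 6):
        yield pd.DataFrame({"TABLE_NAME": [f"T{i}" for i in range(start, start + 3)]})

# ignore_index=True renumbers rows 0..n-1 instead of repeating each
# batch's own index.
df = pd.concat(fake_batches(), ignore_index=True)
print(len(df))  # 9 rows across three batches
```

With the real cursor, replace fake_batches() with cursor.fetch_pandas_batches() after an execute call.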

See the Snowflake Python connector API for details on these cursor methods.

Once you have a DataFrame, use normal pandas workflows — group, filter, describe:

conn = get_snowflake_connection()
df = pd.read_sql(
    "SELECT table_name, column_name FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE table_schema = 'PUBLIC' LIMIT 10000",
    conn,
)
by_table = df.groupby("TABLE_NAME").size().sort_values(ascending=False)
print(by_table.head(10))
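Filtering and describing work the same way. A self-contained sketch on a small frame standing in for the query result above (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical sample mirroring the shape of the INFORMATION_SCHEMA result.
df = pd.DataFrame({
    "TABLE_NAME": ["PATIENTS", "PATIENTS", "CLAIMS", "CLAIMS", "CLAIMS"],
    "COLUMN_NAME": ["ID", "DOB", "ID", "CODE", "DATE"],
})

# Column count per table, then filter to tables with more than two columns.
sizes = df.groupby("TABLE_NAME")["COLUMN_NAME"].count()
wide = sizes[sizes > 2]
print(wide)

# Summary statistics (count, mean, quartiles, ...) of the column counts.
print(sizes.describe())
```

The same groupby/filter/describe chain applies unchanged to the real DataFrame returned by read_sql or fetch_pandas_all.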

Pandas may warn that it prefers SQLAlchemy connectables. The Komodo connection is still DB-API 2.0 compliant; you can suppress that warning if it is noisy:

import warnings
warnings.filterwarnings(
    "ignore",
    message="pandas only supports SQLAlchemy connectable",
    category=UserWarning,
)
Export results with the usual pandas writers (to_parquet requires pyarrow or fastparquet):

df.to_csv("results.csv", index=False)
df.to_parquet("results.parquet", index=False)