Pandas Integration

Use pandas to analyze Komodo data with familiar DataFrame operations.

Install pandas first:
uv add pandas

Pandas can run SQL and build a DataFrame in one step:

import pandas as pd
from komodo import get_snowflake_connection
conn = get_snowflake_connection()
df = pd.read_sql(
    "SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS LIMIT 500",
    conn,
)
print(df.head())
print(df["TABLE_NAME"].nunique(), "distinct tables in sample")
conn.close()

When binding values, pass them through the params argument of read_sql rather than interpolating them into the SQL string; the placeholder style (positional or named) is driver-specific.
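For example, with named pyformat placeholders (%(name)s), the Snowflake connector's default paramstyle — a sketch, assuming your driver is configured the same way:

```python
import pandas as pd
from komodo import get_snowflake_connection

conn = get_snowflake_connection()
# %(schema)s is pyformat-style named binding; a qmark-style driver
# would use ? placeholders with a sequence of values instead.
df = pd.read_sql(
    "SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE table_schema = %(schema)s LIMIT 100",
    conn,
    params={"schema": "PUBLIC"},
)
print(df.head())
conn.close()
```

Binding values this way also avoids SQL injection and quoting bugs that come with manual string formatting.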

Cursor helpers: fetch_pandas_all and fetch_pandas_batches

The Snowflake-compatible cursor can return DataFrames directly — useful when you already have a cursor and want Arrow/pandas-optimized paths:

from komodo import get_snowflake_connection
conn = get_snowflake_connection()
cursor = conn.cursor()
cursor.execute("USE DATABASE DATA")
cursor.execute("SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS")
df = cursor.fetch_pandas_all()
print(df.head(20))
print(len(df), "rows")

For large result sets, stream batches instead of loading everything into memory:

cursor.execute(
    "SELECT column_name, table_name FROM INFORMATION_SCHEMA.COLUMNS"
)
total = 0
for batch_df in cursor.fetch_pandas_batches():
    total += len(batch_df)
    print("batch rows:", len(batch_df))
print("total rows:", total)
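If you ultimately need one DataFrame but want to process or log batches as they arrive, pd.concat accepts any iterable of DataFrames, including the batch generator. A self-contained sketch — fake_batches here is a stand-in for cursor.fetch_pandas_batches(), not part of the connector:

```python
import pandas as pd

def fake_batches():
    # Stand-in for cursor.fetch_pandas_batches(): yields three small
    # DataFrames to demonstrate the concat pattern without a connection.
    for start in (0, 3, 6):
        yield pd.DataFrame({"TABLE_NAME": [f"T{i}" for i in range(start, start + 3)]})

# ignore_index=True renumbers rows 0..n-1 instead of repeating each
# batch's own index.
df = pd.concat(fake_batches(), ignore_index=True)
print(len(df))  # 9 rows across three batches
```

With the real cursor, replace fake_batches() with cursor.fetch_pandas_batches() after an execute call.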

See the Snowflake Python connector API for details on these cursor methods.

Once you have a DataFrame, use normal pandas workflows — group, filter, describe:

conn = get_snowflake_connection()
df = pd.read_sql(
    "SELECT table_name, column_name FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE table_schema = 'PUBLIC' LIMIT 10000",
    conn,
)
by_table = df.groupby("TABLE_NAME").size().sort_values(ascending=False)
print(by_table.head(10))
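Filtering and describing work the same way. A self-contained sketch on a small frame standing in for the query result above (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical sample mirroring the shape of the INFORMATION_SCHEMA result.
df = pd.DataFrame({
    "TABLE_NAME": ["PATIENTS", "PATIENTS", "CLAIMS", "CLAIMS", "CLAIMS"],
    "COLUMN_NAME": ["ID", "DOB", "ID", "CODE", "DATE"],
})

# Column count per table, then filter to tables with more than two columns.
sizes = df.groupby("TABLE_NAME")["COLUMN_NAME"].count()
wide = sizes[sizes > 2]
print(wide)

# Summary statistics (count, mean, quartiles, ...) of the column counts.
print(sizes.describe())
```

The same groupby/filter/describe chain applies unchanged to the real DataFrame returned by read_sql or fetch_pandas_all.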

Pandas may warn that it prefers SQLAlchemy connectables. The Komodo connection is still DB-API 2.0 compliant; you can suppress that warning if it is noisy:

import warnings
warnings.filterwarnings(
    "ignore",
    message="pandas only supports SQLAlchemy connectable",
    category=UserWarning,
)
Export results with the usual pandas writers (to_parquet requires pyarrow or fastparquet):

df.to_csv("results.csv", index=False)
df.to_parquet("results.parquet", index=False)