VerticaPy

1.1.1 verified Fri May 01 auth: no python

VerticaPy is a Python library for data exploration, data cleaning, and machine learning in Vertica. It simplifies the integration between Python and Vertica databases, providing a pandas-like interface and ML capabilities that run directly in-database. Current version is 1.1.1, released May 2025. The project follows a monthly release cadence.

pip install verticapy
error No module named 'verticapy.core'
cause Importing from the old submodule path that was removed in 1.0.0.
fix Use `from verticapy import vDataFrame` instead of `from verticapy.core import vDataFrame`.
error AttributeError: 'vDataFrame' object has no attribute 'select'
cause The `select` method was renamed to `select_` (with a trailing underscore) to avoid a name conflict.
fix Use `vdf.select_('col1', 'col2')` instead of `vdf.select('col1', 'col2')`.
breaking Import paths changed significantly in 1.0.0. Many functions moved from submodules to top-level or were renamed.
fix Run `verticapy.upgrade()` or consult the migration guide. For example, `from verticapy import vDataFrame` instead of `from verticapy.core import vDataFrame`.
gotcha The database connection must be passed explicitly when creating a vDataFrame. It is easy to forget and assume an implicit connection.
fix Always provide a cursor or connection object: `vDataFrame('table', cursor)`.
gotcha vDataFrame methods mutate the object in place by default, unlike pandas, which returns a new object by default.
fix Be aware that operations like `drop()` modify the current vDataFrame; use `.copy()` if you need to preserve the original.
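
Code that has to run against more than one VerticaPy release can probe the import paths mentioned in the entries above instead of hard-coding one. The following is a minimal sketch, not part of VerticaPy: `resolve_attr` and `VDATAFRAME_PATHS` are hypothetical names introduced here, and the two candidate paths are simply the ones quoted above. It uses only the standard library, so it also runs (and reports a miss) when VerticaPy is not installed.

```python
import importlib


def resolve_attr(candidates):
    """Return the first attribute found among (module, name) pairs, or None."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or one of its parents) is missing: try the next candidate.
            continue
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


# Assumed candidates: the top-level path recommended above comes first,
# followed by the older submodule path that the error entry describes.
VDATAFRAME_PATHS = [
    ("verticapy", "vDataFrame"),
    ("verticapy.core", "vDataFrame"),
]

vDataFrame = resolve_attr(VDATAFRAME_PATHS)
if vDataFrame is None:
    print("vDataFrame not found: is verticapy installed?")
```

Listing the preferred path first means the fallback is only consulted on installations where the top-level import is unavailable.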

Quickstart: connect to Vertica, load a table into a vDataFrame, and explore it.

from verticapy import vDataFrame, set_option

# Optional: configure display
set_option('max_cellwidth', 50)

# Connect using parameters (replace with actual credentials)
from vertica_python import connect
conn_info = {
    'host': 'localhost',
    'port': 5433,
    'user': 'dbadmin',
    'password': '',
    'database': 'vmart',
    'ssl': False
}
# Keep a handle on the connection so it can be closed when you are done
conn = connect(**conn_info)
cur = conn.cursor()

# Create vDataFrame from a table
vdf = vDataFrame('public.my_table', cur)

# Quick exploration
print(vdf.shape())
print(vdf.describe())
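
The quickstart hard-codes credentials for brevity. One way to keep them out of source files is to build the same dictionary from environment variables. This is a sketch under assumptions: the `VERTICA_*` variable names and the `conn_info_from_env` helper are hypothetical, not something VerticaPy or vertica_python defines, and the keys and defaults simply mirror the `conn_info` dictionary in the quickstart.

```python
import os


def conn_info_from_env(env=None):
    """Build a connection dict shaped like the quickstart's conn_info."""
    env = os.environ if env is None else env
    return {
        "host": env.get("VERTICA_HOST", "localhost"),
        "port": int(env.get("VERTICA_PORT", "5433")),
        "user": env.get("VERTICA_USER", "dbadmin"),
        "password": env.get("VERTICA_PASSWORD", ""),
        "database": env.get("VERTICA_DATABASE", "vmart"),
        # Treat "1", "true" and "yes" (any case) as enabling SSL.
        "ssl": env.get("VERTICA_SSL", "false").lower() in ("1", "true", "yes"),
    }


conn_info = conn_info_from_env()
# Mask the password before printing the effective settings.
print({k: ("***" if k == "password" else v) for k, v in conn_info.items()})
```

The resulting dictionary can be passed to `connect(**conn_info)` exactly as in the quickstart. Accepting an optional mapping makes the helper easy to exercise without touching the real environment.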