Python Dataset Library (SQL Toolkit)
The Python `dataset` library (version 1.6.2) is a lightweight toolkit for database access that abstracts away much of the direct SQL interaction. It reads and writes data in SQL data stores with an API designed to feel as straightforward as working with JSON files, offering implicit table and column creation, upserts, and convenient query helpers. Built on SQLAlchemy, it works with major databases such as SQLite, PostgreSQL, and MySQL. The library sees regular releases with bug fixes and feature enhancements.
Warnings
- breaking Version 1.7.0 (released March 28, 2026, on GitHub, though not yet on PyPI at time of verification) introduces significant breaking changes. It requires Python 3.9+ and adds full support for SQLAlchemy 2.0+ (while remaining backward compatible with SQLAlchemy 1.4.0). The build system migrated to Hatchling, linting to Ruff, and testing to pytest. Review the Changelog for the full list of changes and migration steps.
- breaking As of `dataset` version 1.0, the data export features (e.g., freezing data to CSV or JSON) were extracted into a separate, standalone package named `datafreeze`. Projects relying on these export capabilities will need to install and use `datafreeze` in addition to `dataset`.
- gotcha Database-specific drivers (e.g., `psycopg2` for PostgreSQL, `mysqlclient` for MySQL) are NOT automatically installed with the `dataset` package. You must install the appropriate driver separately for the database backend you intend to use. SQLite is built into Python and does not require an additional driver.
- gotcha For configuring database connections, `dataset.connect()` can automatically use a database URL defined in the `DATABASE_URL` environment variable if no URL is explicitly passed. While convenient, ensure sensitive credentials in this environment variable are managed securely, especially in production environments.
Install
- pip install dataset
- pip install psycopg2 (for PostgreSQL) or pip install mysqlclient (for MySQL) — drivers are separate packages
Imports
- dataset
import dataset
Quickstart
import dataset
import os
# Connect to an in-memory SQLite database
db = dataset.connect('sqlite:///:memory:')
# Get a table reference; it will be created if it doesn't exist
table = db['users']
# Insert new records; columns are created automatically
table.insert(dict(name='John Doe', age=30, city='New York'))
table.insert(dict(name='Jane Smith', age=25, city='London'))
# Update an existing record
table.update(dict(name='John Doe', age=31), ['name'])
# Find all records
all_users = table.all()
print("All users:")
for user in all_users:
    print(user)
# Find one record by a specific field
john = table.find_one(name='John Doe')
print(f"\nJohn Doe's updated age: {john['age']}")
# Find records with a filter
london_users = table.find(city='London')
print("\nUsers in London:")
for user in london_users:
    print(user)
# Using environment variable for database URL (example)
# os.environ['DATABASE_URL'] = 'sqlite:///mydb.db'
# db_env = dataset.connect()
# print(f"\nConnected via env var to: {db_env.url}")