LMDB Python Binding
lmdb (py-lmdb) is a universal Python binding for the LMDB 'Lightning' Database, a fast, memory-efficient, embedded key-value store. It exposes an ordered map interface and multi-version concurrency control (MVCC) with reader/writer transactions, and it uses memory-mapped files for zero-copy reads. The library is actively maintained, with frequent releases; the current version is 2.2.0.
Warnings
- gotcha Keys and values in LMDB must be byte strings. Python `str` objects must be explicitly encoded (e.g., `s.encode('utf-8')`) before being stored, and the returned bytes decoded (`b.decode('utf-8')`) after retrieval.
- gotcha The `map_size` parameter in `lmdb.open()` sets the maximum size of the database in bytes. Writes that would grow the database beyond it fail with `lmdb.MapFullError`, so set this value sufficiently large upfront, especially if the database is opened by multiple processes. Modifying `map_size` from multiple processes concurrently can lead to catastrophic data loss.
- gotcha When using named databases (sub-databases), the `max_dbs` parameter must be set to a nonzero value in `lmdb.open()`. The default of 0 means the environment holds only a single unnamed database. The value cannot be changed once the environment is open, so every process that opens the environment should pass a value large enough for all the named databases it will use.
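- gotcha Write transactions (`env.begin(write=True)`) take effect only once committed. Used as a context manager (`with env.begin(write=True) as txn:`), the transaction commits automatically when the block exits normally and aborts, rolling back all changes, if an exception escapes the block. A transaction created without `with` must be finished explicitly with `txn.commit()` or `txn.abort()`.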
- breaking Version 1.4.1 and newer of `py-lmdb` dropped support for Python 2.x and older Python 3.x versions. Specifically, Python 3.9 or newer is now required.
Install
pip install lmdb
Imports
- lmdb
import lmdb
Quickstart
import lmdb
import os
import shutil
db_path = './my_lmdb_data'
# Ensure cleanup for demonstration
if os.path.exists(db_path):
    shutil.rmtree(db_path)
# 1. Open an LMDB environment
# map_size: Maximum size of the database. Crucial to set correctly.
# max_dbs: Max number of named databases (sub-databases).
env = lmdb.open(db_path, map_size=10*1024*1024, max_dbs=10, subdir=True)
# 2. Write data to the database
with env.begin(write=True) as txn:
    txn.put(b'my_key_1', b'my_value_1')
    txn.put(b'my_key_2', b'my_value_2')
    txn.put(b'another_key', b'another_value')
print("Wrote data to LMDB.")
# 3. Read data from the database
with env.begin() as txn:
    value1 = txn.get(b'my_key_1')
    value_non_existent = txn.get(b'non_existent_key')
    print(f"Value for my_key_1: {value1.decode('utf-8') if value1 else None}")
    print(f"Value for non_existent_key: {value_non_existent}")
    # Iterate through all key-value pairs
    print("\nAll items in DB:")
    for key, value in txn.cursor():
        print(f" {key.decode('utf-8')}: {value.decode('utf-8')}")
# 4. Close the environment
env.close()
print("\nLMDB environment closed.")
# Cleanup (optional, for examples)
if os.path.exists(db_path):
    shutil.rmtree(db_path)
    print(f"Cleaned up database at {db_path}")