RocksDict: Python On-Disk Key-Value Store
RocksDict is a Python binding for RocksDB that provides efficient on-disk key-value storage. It lets you store, query, and delete large numbers of key-value pairs that may not fit into RAM. In its default mode the library stores arbitrary Python objects (serialized with pickle); in raw mode it stores raw bytes. It can also be used to inspect RocksDB databases created from other languages. The current version is 0.3.29; releases are frequent and typically add support for new Python versions and new features.
Warnings
- breaking The methods `Options.set_ignore_range_deletions` and `Options.set_skip_checking_sst_file_sizes_on_db_open` have been removed. Code that still calls them fails with `AttributeError`, so drop those calls before upgrading.
- gotcha Since v0.3.24b2, `rocksdict` automatically falls back to 'Raw Mode' when it opens an existing RocksDB database created by another language (C++, Java, etc.). In Raw Mode, keys and values must be bytes. Storing any other Python object fails unless you serialize it to bytes yourself.
- gotcha RocksDict currently has limited support for `merge` operations and custom comparators. RocksDB itself provides both, but the Python interface does not fully expose them or make them easy to use.
- gotcha When working with large datasets, especially if processes crash, you might encounter 'Corruption: Corrupt or unsupported format_version' or 'IO error: lock hold by current process' errors. The first points to corrupted on-disk data; the second means the process already holds the database lock, for example because an earlier `Rdict` handle on the same path was never closed.
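For the Raw Mode gotcha above, one way to handle Python objects manually is to serialize them to bytes before writing and decode them after reading. The sketch below is an illustration, not part of the library: the helper names (`encode_key`, `encode_value`, `decode_value`, `raw_mode_demo`) and the `user:1` key are hypothetical, and JSON is an assumed choice made here because, unlike pickle, it stays readable from the C++ or Java side. Only `Rdict(path, options=Options(raw_mode=True))` comes from rocksdict itself.

```python
import json


def encode_key(key: str) -> bytes:
    """Turn a text key into the bytes that Raw Mode requires."""
    return key.encode("utf-8")


def encode_value(obj) -> bytes:
    """Serialize a JSON-compatible object to UTF-8 bytes (assumed format)."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def decode_value(raw: bytes):
    """Inverse of encode_value."""
    return json.loads(raw.decode("utf-8"))


def raw_mode_demo(path: str):
    """Write and read back one record in Raw Mode (needs rocksdict installed)."""
    # Imported lazily so the helpers above work without rocksdict.
    from rocksdict import Rdict, Options

    db = Rdict(path, options=Options(raw_mode=True))
    try:
        key = encode_key("user:1")
        db[key] = encode_value({"name": "Ada", "visits": 3})
        return decode_value(db[key])
    finally:
        # Always release the database lock, even if the write fails.
        db.close()
```

Values that JSON cannot represent (sets, custom classes, and so on) would need a different encoding; if the database is only ever read from Python, pickling to bytes is an alternative.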
Install
pip install rocksdict
Imports
- Rdict
from rocksdict import Rdict
- Options
from rocksdict import Options
Quickstart
import os
from rocksdict import Rdict, Options
path = "./my_test_db"
# Ensure a clean start for the example.
# destroy() is a static method that takes the database path.
if os.path.exists(path):
    Rdict.destroy(path)
# Create an Rdict with default options
db = Rdict(path)
# Store various Python objects
db[1] = "value_one"
db["key_two"] = 25
db[b"binary_key"] = b"binary_value"
db["list_key"] = [1, 2, 3]
db["dict_key"] = {"a": 1, "b": 2}
print(f"Value for key_two: {db['key_two']}")
# Reopen Rdict from disk after closing
db.close()
db = Rdict(path)
print(f"Value for list_key after reopen: {db['list_key']}")
# Iterate through items
print("\nItems in the database:")
for k, v in db.items():
    print(f"{k} -> {v}")
# Delete an item
del db[1]
assert 1 not in db
# Close the handle, then destroy the database (clean up)
db.close()
Rdict.destroy(path)
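An unclosed handle is one way to run into the lock error listed under Warnings, so it can help to guarantee that `close()` runs even when a write raises. The following is a minimal sketch using the standard library's `contextlib.closing`. The names `put_many` and `open_db` are hypothetical, and the code assumes only that the opened object supports item assignment and `close()`, as `Rdict` does in the Quickstart.

```python
from contextlib import closing


def put_many(open_db, items):
    """Open a store, write every (key, value) pair, and always close it.

    `open_db` is a hypothetical zero-argument callable that returns an
    object with item assignment and close(), for example
    `lambda: Rdict("./my_test_db")`.
    """
    written = 0
    # closing() calls db.close() on exit, whether or not an error occurred.
    with closing(open_db()) as db:
        for key, value in items:
            db[key] = value
            written += 1
    return written
```

With rocksdict installed, a call might look like `put_many(lambda: Rdict("./my_test_db"), [("a", 1), ("b", 2)])`.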