Nano VectorDB
Nano VectorDB is a simple, easy-to-hack, in-memory, disk-persisted vector database implementation. It's designed for rapid prototyping, educational purposes, and small-scale applications, offering a lightweight alternative to more complex solutions. The current version is 0.0.4.3, with an active development cadence featuring frequent minor releases.
Common errors
- TypeError: 'list' object cannot be interpreted as an array
  cause: Input vectors passed to the `add` or `search` methods are Python lists instead of NumPy arrays.
  fix: Convert your list of floats into a NumPy array before passing it to NanoVectorDB. Example: `np.array([1.0, 2.0, 3.0])`.
- ValueError: Vector dimension mismatch. Expected {expected_dim}, got {actual_dim}.
  cause: The dimension of the vector being added or queried does not match the `dim` specified during `NanoVectorDB` initialization.
  fix: Ensure all vectors added to the database and all query vectors have exactly the dimension (`dim`) defined when the `NanoVectorDB` instance was created. If loading from disk, the `dim` parameter must match that of the originally saved database.
- FileNotFoundError: [Errno 2] No such file or directory: '{db_path}/index.npy'
  cause: The database being loaded has not been saved yet, the specified `db_path` is incorrect, or the directory was deleted or moved.
  fix: Ensure `db.save()` was called previously. Verify that the `db_path` provided to `NanoVectorDB` when loading is identical to the path used when saving, and that the directory and its contents still exist.
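The first two errors above can be caught before a vector ever reaches the database. The following is a minimal sketch using only NumPy; `as_vector` is a hypothetical helper introduced here for illustration, not part of NanoVectorDB, and the choice of `float64` as the dtype is an assumption.

```python
import numpy as np


def as_vector(values, dim):
    """Convert a sequence of floats to a 1-D NumPy array of length `dim`.

    Hypothetical helper for illustration; not part of NanoVectorDB.
    """
    # np.asarray accepts lists, tuples, and existing arrays alike.
    vec = np.asarray(values, dtype=np.float64)  # dtype is an assumption
    if vec.ndim != 1:
        raise ValueError(f"Expected a 1-D vector, got shape {vec.shape}")
    if vec.shape[0] != dim:
        raise ValueError(
            f"Vector dimension mismatch. Expected {dim}, got {vec.shape[0]}."
        )
    return vec


# A plain Python list becomes a validated NumPy array.
vec = as_vector([1.0, 2.0, 3.0, 4.0], dim=4)
print(type(vec).__name__, vec.shape)
```

Running every input through one such function keeps the conversion and the dimension check in a single place, so a mismatch surfaces at the call site rather than inside the library.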
Warnings
- breaking As a pre-1.0 library (currently 0.0.x), NanoVectorDB's API is not stable. Method signatures, class names, or data structures returned by functions like `search` may change in minor or patch releases without explicit 'breaking change' warnings, requiring code adjustments.
- gotcha NanoVectorDB expects pre-computed embeddings as NumPy arrays. It does not provide functionality to generate embeddings from text or other data types itself. Users must use an external embedding model (e.g., from Hugging Face, OpenAI) to convert their data into vectors before adding them to the database.
- gotcha NanoVectorDB is designed for lightweight, in-memory, or small-scale disk-persisted use cases. It is not built for large-scale, distributed, or high-throughput production environments and lacks features like sharding, replication, or advanced indexing strategies found in enterprise-grade vector databases.
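To make the scale limitation concrete, here is a sketch of brute-force similarity search in plain NumPy. It is not NanoVectorDB's implementation: the function name `top_k_cosine` and the use of cosine similarity as the metric are assumptions made for illustration. The point is that a search without an index touches every stored vector, so cost grows linearly with the number of items.

```python
import numpy as np


def top_k_cosine(matrix, query, k):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query`.

    Illustrative linear scan; cosine similarity is an assumed metric.
    """
    # Normalize rows and query so that a dot product equals cosine similarity.
    row_norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit_rows = matrix / np.clip(row_norms, 1e-12, None)
    unit_query = query / max(np.linalg.norm(query), 1e-12)
    # One dot product per stored vector: O(n * dim) work for each query.
    scores = unit_rows @ unit_query
    # Sort by descending score and keep the first k positions.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


stored = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
])
indices, scores = top_k_cosine(stored, np.array([1.0, 0.0, 0.0, 0.0]), k=2)
print(indices, scores)
```

The vectors here are hand-written stand-ins; in practice they would come from an external embedding model, as the second warning notes.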
Install
- pip install nano-vectordb
Imports
- NanoVectorDB
  from nano_vectordb.core import NanoVectorDB
  from nano_vectordb import NanoVectorDB (the form used in the Quickstart)
Quickstart
import numpy as np
from nano_vectordb import NanoVectorDB
import os
import shutil
# Initialize the database
db_path = "my_nano_vectordb"
db = NanoVectorDB(db_path, dim=4)
# Add vectors with metadata and IDs
db.add(np.array([1.0, 2.0, 3.0, 4.0]), {"text": "The quick brown fox."}, "doc1")
db.add(np.array([1.1, 2.1, 3.1, 4.1]), {"text": "Jumps over the lazy dog."}, "doc2")
db.add(np.array([0.9, 1.9, 2.9, 3.9]), {"text": "Another relevant document."}, "doc3")
# Perform a similarity search
query_vector = np.array([1.0, 2.0, 3.0, 4.0])
k_results = 2
results = db.search(query_vector, k=k_results)
print(f"\nSearch Results for top {k_results} documents:")
for vector, metadata, vector_id, score in results:
    print(f" ID: {vector_id}, Metadata: {metadata}, Score: {score:.4f}")
# Save the database to disk
db.save()
print(f"\nDatabase saved to '{db_path}'")
# Load the database from disk
loaded_db = NanoVectorDB(db_path, dim=4) # Re-initialize with path and dim
loaded_db.load()
print(f"Database loaded from '{db_path}'. Number of items: {len(loaded_db.store)}")
# Clean up database files (optional)
if os.path.exists(db_path):
    shutil.rmtree(db_path)
    print(f"Cleaned up database directory: {db_path}")
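
The load step in the Quickstart fails with `FileNotFoundError` if nothing was saved at `db_path`. A small pre-check can avoid that. This is a sketch using only the standard library: `has_saved_index` is a hypothetical helper, and the file name `index.npy` is taken from the error message listed under Common errors, so whether a given version uses that on-disk layout is an assumption.

```python
import os
import tempfile


def has_saved_index(db_path, index_name="index.npy"):
    """Return True if `db_path` contains a saved index file.

    Hypothetical helper; `index_name` is assumed from the documented
    FileNotFoundError message, not confirmed against the library.
    """
    return os.path.isfile(os.path.join(db_path, index_name))


# Demonstrate on a temporary directory instead of a real database.
with tempfile.TemporaryDirectory() as tmp:
    before = has_saved_index(tmp)          # nothing saved yet
    open(os.path.join(tmp, "index.npy"), "wb").close()  # empty placeholder file
    after = has_saved_index(tmp)           # index file now present
print(before, after)
```

In application code, a check like this could decide between loading an existing database and starting with an empty one.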