PyMongo Schema
PyMongo Schema is a Python library that analyzes MongoDB collections and databases and infers their schema, so you can see which document shapes exist in a MongoDB instance. As of version 0.4.2, it generates schema definitions but does not enforce them. Releases are infrequent: the library is stable, but updates are rare.
Common errors
- ModuleNotFoundError: No module named 'pymongo_schema'
  cause: The `pymongo-schema` library has not been installed in your Python environment.
  fix: Run `pip install pymongo-schema` to install the library.
- TypeError: __init__() missing 1 required positional argument: 'collection'
  cause: You are trying to instantiate `pymongo_schema.Schema` without passing a `pymongo.collection.Collection` object.
  fix: Pass a valid collection object: `schema = Schema(my_collection)`.
- TypeError: __init__() missing 1 required positional argument: 'db'
  cause: You are trying to instantiate `pymongo_schema.db.DBSchema` without passing a `pymongo.database.Database` object.
  fix: Pass a valid database object: `db_schema = DBSchema(my_database)`.
- pymongo.errors.ConnectionFailure: Cannot connect to database
  cause: The Python client could not establish a connection to your MongoDB server. This usually means MongoDB is not running, is listening on a different host or port, or is blocked by network restrictions. Because `pymongo.MongoClient` connects lazily, the error typically appears on the first operation rather than at construction, often as `ServerSelectionTimeoutError`, a subclass of `ConnectionFailure`.
  fix: Ensure your MongoDB server is running and reachable from the machine running your Python script. Verify the connection URI, host, and port. If authentication is required, include credentials in your `pymongo.MongoClient` URI or parameters.
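When credentials go into the connection URI, reserved characters such as `@` or `:` in the username or password have to be percent-escaped, or the URI will be parsed incorrectly. The following is a minimal sketch using only the standard library; the helper name `build_mongo_uri` and the environment variable names `MONGO_USER` and `MONGO_PASSWORD` are assumptions for illustration, not part of PyMongo or PyMongo Schema.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, user=None, password=None):
    """Return a mongodb:// URI, percent-escaping credentials if both are given."""
    if user and password:
        # Characters like '@', ':' and '/' are reserved in URIs and must be escaped.
        credentials = f"{quote_plus(user)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}/"


# MONGO_USER and MONGO_PASSWORD are hypothetical variable names.
uri = build_mongo_uri(
    user=os.environ.get("MONGO_USER"),
    password=os.environ.get("MONGO_PASSWORD"),
)
```

The resulting string can be passed to `pymongo.MongoClient`. Adding `serverSelectionTimeoutMS=5000` to that call makes an unreachable server fail after about five seconds instead of the default thirty.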
Warnings
- gotcha PyMongo Schema infers and describes the schema of your data; it does NOT validate or enforce schema rules at runtime. It's a reporting tool, not a validation engine.
- gotcha Generating a schema for very large collections or databases can be memory-intensive and slow, as it may need to sample or process a significant portion of the documents.
- gotcha There's a distinction between `pymongo_schema.Schema` and `pymongo_schema.db.DBSchema`. `Schema` expects a `pymongo.collection.Collection` object, while `DBSchema` expects a `pymongo.database.Database` object.
- gotcha The library's development activity is low. While stable, it may not immediately support very recent `pymongo` versions or new MongoDB features, potentially leading to compatibility issues in the future.
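To make the first two warnings concrete, here is a rough sketch of what schema inference means in general. It is not PyMongo Schema's implementation: `infer_field_types` is a hypothetical helper written for this example, and it only looks at top-level fields. It reports which types were observed for each field without rejecting anything, and its `limit` parameter shows one simple way to cap the work on a large collection.

```python
from collections import Counter, defaultdict
from itertools import islice


def infer_field_types(documents, limit=None):
    """Count the Python type names observed for each top-level field.

    `limit` caps how many documents are read; None reads them all.
    """
    seen = defaultdict(Counter)
    for doc in islice(documents, limit):
        for field, value in doc.items():
            seen[field][type(value).__name__] += 1
    return {field: dict(counts) for field, counts in seen.items()}


docs = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": "25"},  # inconsistent type: reported, not rejected
    {"name": "Charlie", "city": "London"},
]
report = infer_field_types(docs)
print(report["age"])  # both observed types appear in the report
```

Because `documents` can be any iterable, a PyMongo cursor such as `collection.find()` could be passed in place of the list, with `limit` keeping memory use bounded.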
Install
- pip install pymongo-schema
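Note that the package is installed as `pymongo-schema` (hyphen) but imported as `pymongo_schema` (underscore). A quick way to confirm that the install landed in the interpreter you are actually running is to look the module up with the standard library, as in this small sketch; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("pymongo_schema"):
    print("pymongo_schema not found; run: pip install pymongo-schema")
```

If the check fails right after a successful install, `pip` is probably tied to a different interpreter or virtual environment; running `python -m pip install pymongo-schema` with the same `python` avoids that mismatch.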
Imports
- Schema
from pymongo_schema import Schema
- DBSchema
from pymongo_schema.db import DBSchema
Quickstart
import os
import pymongo
from pymongo_schema import Schema
from pymongo_schema.db import DBSchema
# Ensure MongoDB is running on localhost:27017
# For authentication, use os.environ.get('MONGO_USER') etc.
MONGO_URI = os.environ.get('MONGO_URI', 'mongodb://localhost:27017/')
DB_NAME = 'pymongo_schema_test_db'
COLLECTION_NAME = 'my_test_collection'

client = None
connected = False  # set once an operation against the server has succeeded
try:
    client = pymongo.MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]

    # Insert some dummy data for schema inference
    collection.insert_many([
        {"name": "Alice", "age": 30, "city": "New York"},
        {"name": "Bob", "age": 25, "hobbies": ["reading", "coding"]},
        {"name": "Charlie", "age": 35, "city": "London", "is_active": True},
        {"name": "David", "country": "Canada", "age": 40}
    ])
    connected = True

    print(f"--- Schema for collection '{COLLECTION_NAME}' ---")
    collection_schema = Schema(collection)
    schema_result = collection_schema.create_schema()
    # print(schema_result)  # Uncomment to see full schema
    print(f"Keys in collection schema: {list(schema_result.keys())}")
    print(f"Name type: {schema_result.get('name', {}).get('type')}")

    print(f"\n--- Schema for database '{DB_NAME}' ---")
    db_schema = DBSchema(db)
    db_schema_result = db_schema.create_schema()
    # print(db_schema_result)  # Uncomment to see full DB schema
    print(f"Collections in DB schema: {list(db_schema_result.keys())}")
except pymongo.errors.ConnectionFailure as e:
    print(f"Error: Could not connect to MongoDB at {MONGO_URI}. Please ensure MongoDB is running. Details: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
finally:
    # Clean up the test database, but only if the server was reachable;
    # otherwise the cleanup call would block and raise the same error again.
    if client is not None:
        if connected:
            client.drop_database(DB_NAME)
            print(f"\nCleaned up database '{DB_NAME}'.")
        client.close()