{"id":6397,"library":"maggma","title":"Maggma Data Pipeline Framework","description":"Maggma is a framework for building scientific data processing pipelines from data stored in databases, Azure Blobs, or local files, all the way to a REST API. It provides two core abstractions, `Store` and `Builder`, for modular ETL-like operations. The `Store` interface mirrors much of PyMongo's syntax, giving consistent data access across different backends. Actively developed by the Materials Project, it is currently at version 0.72.1 and requires Python 3.9+.","status":"active","version":"0.72.1","language":"en","source_language":"en","source_url":"https://github.com/materialsproject/maggma","tags":["data pipeline","database","mongodb","etl","materials science","api","data processing"],"install":[{"cmd":"pip install maggma","lang":"bash","label":"Install latest stable version"}],"dependencies":[{"reason":"Data validation and settings management.","package":"pydantic","optional":false},{"reason":"Primary MongoDB interaction, often used as a backend for 'Store' classes.","package":"pymongo","optional":false},{"reason":"General-purpose Python utilities (serialization, I/O helpers), a common dependency in the Materials Project ecosystem.","package":"monty","optional":false},{"reason":"Data manipulation and analysis, used in some 'Store' and 'Builder' implementations.","package":"pandas","optional":false},{"reason":"Numerical operations, a fundamental data science library.","package":"numpy","optional":false},{"reason":"AWS SDK for Python, enabling `S3Store` functionality.","package":"boto3","optional":true},{"reason":"SSH tunneling capabilities, often used for secure database connections.","package":"sshtunnel","optional":true}],"imports":[{"symbol":"MemoryStore","correct":"from maggma.stores import MemoryStore"},{"symbol":"MongoStore","correct":"from maggma.stores import MongoStore"},{"symbol":"Builder","correct":"from maggma.core import Builder"},{"symbol":"Store","correct":"from 
maggma.core import Store"}],"quickstart":{"code":"from maggma.stores import MemoryStore\n\n# Sample data\nturtles = [\n    {\"name\": \"Leonardo\", \"color\": \"blue\", \"tool\": \"sword\"},\n    {\"name\": \"Donatello\", \"color\": \"purple\", \"tool\": \"staff\"},\n    {\"name\": \"Michelangelo\", \"color\": \"orange\", \"tool\": \"nunchuks\"},\n    {\"name\": \"Raphael\", \"color\": \"red\", \"tool\": \"sai\"}\n]\n\n# Create a MemoryStore (in-memory, data not persistent)\n# 'key' argument specifies the unique identifier for documents\nstore = MemoryStore(key=\"name\")\n\n# Connect to the store (for MemoryStore, this just initializes it)\nstore.connect()\n\n# Add data to the store using update\n# Documents are matched on the store's key: existing ones are replaced, new ones inserted\nstore.update(turtles)\n\n# Query the store\nprint(f\"Total documents: {store.count()}\")\n# query() returns an iterator; query_one() returns the first match (or None)\nprint(f\"Blue turtle: {store.query_one(criteria={'color': 'blue'})}\")\n\n# Find distinct values\nprint(f\"Distinct colors: {list(store.distinct(field='color'))}\")\n\n# Close the store connection (important for persistent stores)\nstore.close()\n\n# Example of using a persistent store (e.g., MongoStore)\n# Requires a MongoDB instance running and pymongo installed.\n# import os\n# from maggma.stores import MongoStore\n# uri = os.environ.get('MONGO_URI', 'mongodb://localhost:27017/test_db')\n# mongo_store = MongoStore(database='test_db', collection_name='my_collection', host=uri, key='name')\n# try:\n#     mongo_store.connect()\n#     mongo_store.update(turtles)\n#     print(f\"MongoStore count: {mongo_store.count()}\")\n# finally:\n#     mongo_store.close()\n","lang":"python","description":"This quickstart demonstrates the core concepts of Maggma: defining data as a list of dictionaries, creating a `Store` (using `MemoryStore` for simplicity), connecting to it, adding data using the `update` method, and querying data. 
It highlights the use of a `key` field for unique document identification. A commented-out example for `MongoStore` is included to illustrate persistent storage."},"warnings":[{"fix":"Review the Changelog and documentation for `v0.72.0` for migration details. Projects should update their API implementations to align with the new recommended patterns.","message":"The `maggma.api` module has been deprecated and will be migrated. This could significantly impact projects relying on Maggma's built-in API functionalities.","severity":"breaking","affected_versions":"v0.72.0 and later"},{"fix":"Consult the specific `Store` class documentation for its supported query features. Stick to basic `query`, `count`, `distinct` operations for maximum compatibility across different `Store` types. For advanced queries, consider processing data after retrieval or using a `MongoStore`.","message":"Maggma's `Store` classes provide a unified interface that resembles PyMongo. However, not all `Store` implementations (e.g., FileStore, S3Store) support the full breadth of PyMongo's query capabilities or advanced features like aggregation pipelines. Over-reliance on PyMongo-specific syntax with non-Mongo backends can lead to unexpected behavior or unsupported operations.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For persistent storage, use a dedicated `Store` implementation like `MongoStore`, `FileStore`, `GridFSStore`, or `S3Store`. Ensure proper connection and disconnection for persistent stores.","message":"Using `MemoryStore` is suitable for testing and quick examples, but it is not persistent. Any data added to a `MemoryStore` will be lost when the Python interpreter closes or the `Store` object is garbage collected.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always ensure your data has a robust, unique identifier for the `key` field. 
When updating, remember that `update` matches documents on the store's `key` (or on the field passed as its optional `key` argument), so check for key collisions to avoid unintended overwrites.","message":"Documents added to a `Store` must have a unique identifier, specified by the `key` argument during `Store` initialization (defaulting to `task_id`). `update` matches documents on this key: in `MongoStore`-based stores, a document whose key already exists replaces the stored one, and a document with a new key is inserted.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pin your `numpy` version to `<2.0` (e.g., `numpy<2.0`) in your project's dependencies until official `maggma` compatibility with `numpy` 2.0 is confirmed and released.","message":"Compatibility issues with `numpy` 2.0 have been reported for Maggma, particularly in components such as `OpenDataStore`. These can lead to unexpected errors or broken functionality.","severity":"breaking","affected_versions":"Reported with `numpy` 2.0 (maggma v0.72.1)."}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}