{"id":5772,"library":"icechunk","title":"Icechunk","description":"Icechunk is an open-source (Apache 2.0), transactional storage engine for tensor / ND-array data, designed for use on cloud object storage. It augments the Zarr core data model with features that enhance performance, collaboration, and safety in a multi-user cloud-computing context. The library is currently at version 2.0.1. Its major version tracks the on-disk format version, so breaking API changes can ship in minor releases.","status":"active","version":"2.0.1","language":"en","source_language":"en","source_url":"https://github.com/earth-mover/icechunk","tags":["data storage","zarr","cloud storage","transactions","arrays","tensors","version control","geospatial"],"install":[{"cmd":"pip install icechunk","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Icechunk works with the Zarr V3 specification and requires Zarr Python 3 to read and write array data.","package":"zarr","optional":false}],"imports":[{"note":"Import public classes from the top-level `icechunk` package, as the official docs do. `icechunk.repository` is an internal submodule: importing from it may work, but it is not the documented path and its layout can change between releases.","wrong":"from icechunk.repository import Repository","symbol":"Repository","correct":"from icechunk import Repository"},{"note":"Storage factory functions such as `s3_storage` and `local_filesystem_storage` are exported from the top-level `icechunk` package; prefer that path over the internal `icechunk.storage` submodule.","wrong":"from icechunk.storage import s3_storage","symbol":"s3_storage","correct":"from icechunk import s3_storage"}],"quickstart":{"code":"import icechunk as ic\nimport zarr\nimport numpy as np\nimport tempfile\nimport os\n\n# Create a temporary directory for the local repository\ntemp_dir = tempfile.TemporaryDirectory()\nrepo_path = os.path.join(temp_dir.name, \"my_icechunk_repo\")\n\ntry:\n    # 1. 
Create a new Icechunk repository on the local filesystem\n    storage = ic.local_filesystem_storage(repo_path)\n    repo = ic.Repository.create(storage)\n    print(f\"Repository created at: {repo_path}\")\n\n    # 2. Create a writable session on the 'main' branch\n    session = repo.writable_session(\"main\")\n\n    # 3. Access the Zarr store from the session\n    store = session.store  # IcechunkStore implementing the Zarr store interface\n\n    # 4. Use Zarr to create a group and an array\n    root = zarr.group(store=store)\n    data = np.arange(1000).reshape(10, 10, 10)\n    zarr_array = root.create_array(\n        'my_data',\n        shape=data.shape,\n        dtype=data.dtype,\n        chunks=(5, 5, 5)\n    )\n    zarr_array[:] = data\n\n    # 5. Commit the changes; commit() returns the new snapshot ID\n    snapshot_id = session.commit(\"Initial data commit\")\n    print(f\"First commit successful with snapshot ID: {snapshot_id}\")\n\n    # 6. The committed session is now read-only, so open a new one for further writes\n    session_2 = repo.writable_session(\"main\")\n    store_2 = session_2.store\n    # In Zarr Python 3, `path` is a keyword-only argument\n    zarr_array_2 = zarr.open_array(store_2, path='my_data', mode='r+')\n    zarr_array_2[:5, :5, :5] = 999  # Overwrite a subset\n    snapshot_id_2 = session_2.commit(\"Overwrite some values\")\n    print(f\"Second commit successful with snapshot ID: {snapshot_id_2}\")\n\n    # 7. Explore version history of the 'main' branch\n    print(\"\\nRepository history:\")\n    for snapshot in repo.ancestry(branch=\"main\"):\n        print(f\"  ID: {snapshot.id}, Message: {snapshot.message}\")\n\nfinally:\n    # Clean up the temporary directory\n    temp_dir.cleanup()\n    print(f\"\\nCleaned up temporary directory: {temp_dir.name}\")","lang":"python","description":"This quickstart demonstrates how to create a local Icechunk repository, interact with it using Zarr through a writable session, commit changes, and then make further modifications requiring a new session. 
It also shows how to review the repository's commit history."},"warnings":[{"fix":"Upgrade your Python environment to 3.12 or a newer compatible version.","message":"Icechunk 2.0.0 and later require Python 3.12 or higher; support for Python 3.11 was dropped.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Use `ic.upgrade_icechunk_repository(repo, dry_run=False)` to migrate your 1.x repository to the 2.0 format. Ensure no other processes are accessing the repository during migration.","message":"The on-disk storage format changed with Icechunk 2.0.0. Existing repositories created with Icechunk 1.x must be migrated using the `upgrade_icechunk_repository()` function. This is an administrative operation and must be executed in isolation (no other readers/writers).","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Update your code to use the new `snake_case` enum variant names.","message":"Enums like `ChunkType` had their variants renamed from `UPPER_CASE` to `snake_case` (e.g., `ChunkType.INLINE` became `ChunkType.inline`).","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Call `repo.writable_session()` again for each new set of changes you intend to commit.","message":"Once `commit()` succeeds on a writable session, that session becomes read-only. To make and commit further changes, open a new session with `repo.writable_session()`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Create the repository from a single process only, with no concurrent creation attempt at the same location. 
Once created, repositories can be opened concurrently.","message":"Concurrent creation of an Icechunk repository in the same location from multiple processes is not safe.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Read the changelog or release notes before upgrading to any new minor version of Icechunk, and consider pinning the minor version (e.g. `icechunk>=2.0,<2.1`) so upgrades happen deliberately.","message":"Icechunk's version policy allows breaking API changes in minor releases (e.g., `2.0.0` to `2.1.0`), not just major ones, because the major version is reserved to track the on-disk format.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}