{"id":8586,"library":"rechunker","title":"Rechunker","description":"Rechunker is a Python package designed for efficient and scalable manipulation of the chunk structure of chunked array formats, such as Zarr and TileDB. It takes an input array (or group of arrays) from persistent storage and writes out a new array with the same data but a different chunking scheme, often utilizing an intermediate temporary store. It is currently at version 0.5.4 and is actively maintained by the Pangeo community, with regular releases addressing compatibility and bug fixes.","status":"active","version":"0.5.4","language":"en","source_language":"en","source_url":"https://github.com/pangeo-data/rechunker","tags":["data science","arrays","dask","zarr","chunking","parallel processing","cloud storage"],"install":[{"cmd":"pip install rechunker","lang":"bash","label":"Latest PyPI release"}],"dependencies":[{"reason":"Rechunker is designed to be used within a parallel execution framework such as Dask.","package":"dask","optional":false},{"reason":"Common storage format for chunked arrays that Rechunker manipulates.","package":"zarr","optional":false},{"reason":"Often used for creating and manipulating labeled multi-dimensional arrays, which can then be rechunked by Rechunker. Not a direct dependency, but common in workflows.","package":"xarray","optional":true}],"imports":[{"symbol":"rechunk","correct":"from rechunker import rechunk"}],"quickstart":{"code":"import zarr\nfrom rechunker import rechunk\nimport os\n\n# Create a source Zarr array\nsource_store = 'source.zarr'\nif not os.path.exists(source_store):\n    zarr.ones((10, 10, 10), chunks=(2, 2, 2), store=source_store, overwrite=True)\nsource = zarr.open(source_store, mode='r')\n\n# Define target and intermediate stores\nintermediate_store = 'intermediate.zarr'\ntarget_store = 'target.zarr'\n\n# Define the target chunking scheme (e.g., contiguous along the first dimension)\ntarget_chunks = (10, 5, 5)\n\n# Define the maximum memory available to each worker\nmax_mem = '256MB'\n\n# Create the rechunking plan; the intermediate store must be passed\n# via the temp_store keyword argument (the fifth positional argument\n# of rechunk() is target_options, not the temporary store)\nrechunked_plan = rechunk(\n    source,\n    target_chunks,\n    max_mem,\n    target_store,\n    temp_store=intermediate_store,\n)\n\n# Execute the plan; this returns the target Zarr array\nresult = rechunked_plan.execute()\n\nprint(f\"Source array chunks: {source.chunks}\")\nprint(f\"Target array chunks: {result.chunks}\")\n\n# Clean up example files\nimport shutil\nshutil.rmtree(source_store)\nshutil.rmtree(intermediate_store)\nshutil.rmtree(target_store)","lang":"python","description":"This quickstart demonstrates how to rechunk a Zarr array from an initial chunking scheme to a new one. It defines source, intermediate, and target Zarr stores, specifies the target chunk size and the maximum memory per worker, creates a rechunking plan (passing the intermediate store via the `temp_store` keyword argument), and then executes it."},"warnings":[{"fix":"Always use the latest version of `rechunker` to ensure compatibility with recent `xarray` and `dask` versions, or refer to the release notes for specific version requirements.","message":"Breaking changes for `xarray` and `dask` compatibility have occurred in some minor versions. For example, `v0.5.4` includes a fix for `xarray>=2025.03.1`, and `v0.5.3` for `dask>=2024.12.0` and `xarray>=2024.10.0`.","severity":"breaking","affected_versions":"<0.5.4"},{"fix":"Understand that `rechunker` is designed for disk-to-disk rechunking and explicitly manages intermediate storage to avoid the out-of-memory errors that Dask's in-memory `rechunk` can encounter on large datasets. Allocate sufficient `max_mem` and provide a `temp_store` for intermediate data.","message":"The `rechunk` function in Dask can run out of memory for 'full rechunk' operations, where every source chunk maps to every target chunk. Rechunker specifically addresses this by leveraging persistent intermediate storage, but users often confuse it with Dask's in-memory `rechunk`.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure input arrays have uniform chunk sizes before passing them to `rechunker`. If using Dask arrays with non-uniform chunks, explicitly rechunk them to a uniform size using `dask.array.rechunk` beforehand, or otherwise ensure uniformity.","message":"Rechunker currently assumes uniform chunks for input arrays (except for the last chunk). This can cause issues with Dask arrays that have been filtered, or with concatenated Zarr arrays, either of which may have non-uniform chunk sizes.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Update `rechunker`, `xarray`, and `zarr` to their latest compatible versions. If applicable, verify that `consolidated=True` is used when writing to Zarr, and ensure all attributes are properly copied (addressed in `v0.5.1`).","cause":"Compatibility issues between `rechunker` and specific `xarray` or `zarr` versions, or incorrect metadata handling during the rechunking process.","error":"cannot open result of rechunker with xarray"},{"fix":"After creating the rechunking plan, call its `.execute()` method to perform the rechunking operation: `rechunked_plan = rechunk(...); result_array = rechunked_plan.execute()`.","cause":"Misunderstanding of the `Rechunked` object's API. The `Rechunked` object returned by `rechunk()` is a plan that must be explicitly executed; it is not a Dask array and has no `persist` method.","error":"rechunker object has no attribute 'persist'"},{"fix":"Review the `target_chunks` and `max_mem` parameters to ensure they are valid and sensible for the input array's dimensions and data type. Check GitHub issues for similar reported problems and potential workarounds, especially for edge cases with very small or very large dimensions.","cause":"This error likely indicates an issue with internal calculations related to chunk sizes or memory allocation, potentially occurring when `rechunker` tries to determine the optimal number of chunks or operations.","error":"ZeroDivisionError in L70 of api.py"}]}