Jupyter Cache
Jupyter Cache provides a defined interface for working with a cache of Jupyter notebooks. It enables execution and caching of notebooks, re-executing them only when code cells or related metadata have changed, rather than for every minor edit. The library offers both a Command-Line Interface (CLI) and a Python API for managing project notebooks, executing them, and retrieving detailed execution reports including timing statistics and exception tracebacks. It is used by projects like Jupyter Book to accelerate document builds by avoiding re-execution of unchanged notebook content. The current version is 1.0.1.
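The selective re-execution idea can be illustrated with a toy, stdlib-only sketch. This is not jupyter-cache's actual hashing scheme, just the underlying principle: derive a hash from code-cell sources only, so edits to markdown cells or outputs leave the hash, and hence the cache, valid.

```python
import hashlib
import json

def code_cells_hash(nb: dict) -> str:
    # Toy sketch: hash only the code-cell sources, ignoring markdown
    # cells and outputs, so cosmetic edits don't invalidate the cache.
    sources = [c["source"] for c in nb["cells"] if c["cell_type"] == "code"]
    return hashlib.sha256(json.dumps(sources).encode("utf8")).hexdigest()

nb = {"cells": [
    {"cell_type": "code", "source": "print(1 + 1)"},
    {"cell_type": "markdown", "source": "Some prose"},
]}
h1 = code_cells_hash(nb)

nb["cells"][1]["source"] = "Edited prose"      # markdown-only edit
assert code_cells_hash(nb) == h1               # cache still valid

nb["cells"][0]["source"] = "print(2 + 2)"      # code edit
assert code_cells_hash(nb) != h1               # would trigger re-execution
```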
Warnings
- breaking Python 3.7 support was dropped in v0.6.0. Projects using Python 3.7 must upgrade their Python version before updating to jupyter-cache v0.6.0 or later.
- breaking A significant API/CLI re-write occurred in v0.5.0. Commands and Python API calls related to 'staging' notebooks were renamed to use 'notebook' or 'project' terminology. For instance, `jcache stage add` became `jcache notebook add`, and the Python API methods changed accordingly.
- gotcha For jupyter-cache to be effective, notebooks must exhibit deterministic execution outputs. This means they should run in a consistent environment, avoid non-deterministic code (e.g., random number generation without seeding), and not rely on external, changing resources.
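The determinism gotcha above can be sketched with a stdlib-only example of the seeding advice (hypothetical notebook-cell code, not part of jupyter-cache itself): seeding the random number generator at the top of a notebook makes its outputs identical on every execution.

```python
import random

# With a fixed seed, the cell produces the same output on every run,
# so cached outputs stay meaningful and diffs remain reproducible.
random.seed(42)
a = [random.randint(0, 9) for _ in range(5)]

# Re-seeding and re-running yields the identical sequence.
random.seed(42)
b = [random.randint(0, 9) for _ in range(5)]
assert a == b
```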
Install
-
pip install jupyter-cache
Imports
- get_cache
from jupyter_cache import get_cache
- CacheBundleIn
from jupyter_cache.base import CacheBundleIn
- load_executor
from jupyter_cache.executors import load_executor
Quickstart
import pathlib
import nbformat as nbf
from jupyter_cache import get_cache
from jupyter_cache.executors import load_executor
# Define the cache path (the jcache CLI can also read this from the JUPYTERCACHE env var)
cache_path = pathlib.Path('./.my_notebook_cache')
# Create a dummy notebook file
nb_content = nbf.v4.new_notebook()
nb_content.cells.append(nbf.v4.new_code_cell("a = 1\nb = 2\nprint(a + b)"))
notebook_path = pathlib.Path('./example.ipynb')
with open(notebook_path, 'w', encoding='utf8') as f:
    nbf.write(nb_content, f)
try:
    # Initialize the cache (the directory is created on first use)
    cache = get_cache(cache_path)
    print(f"Cache initialized at: {cache.path}")
    # Clear the cache for a clean start (optional)
    cache.clear_cache()
    # Add the notebook to the project
    # Note: 'notebook'/'project' is the current terminology; 'stage' was used before v0.5
    cache.add_nb_to_project(str(notebook_path))
    print(f"Notebook '{notebook_path.name}' added to project.")
    # Execute the project's outdated notebooks
    # 'local-serial' is one of the built-in executors
    executor = load_executor('local-serial', cache)
    result = executor.run_and_cache()
    print(f"Succeeded: {result.succeeded}, excepted: {result.excepted}")
    # List project records to see what is tracked
    print("\nProject records:")
    for record in cache.list_project_records():
        print(f"  ID: {record.pk}, URI: {record.uri}")
    # Merge the cached outputs back into the source notebook
    nb = nbf.read(str(notebook_path), nbf.NO_CONVERT)
    pk, merged_nb = cache.merge_match_into_notebook(nb)
    print(f"\nMerged outputs from cache record {pk}, cells: {len(merged_nb.cells)}")
finally:
    # Clean up the generated notebook file
    notebook_path.unlink(missing_ok=True)
    # For full cleanup (e.g. in tests), remove the cache directory with
    # shutil.rmtree(cache_path) -- but be careful in production environments.