Jupyter Cache

1.0.1 · active · verified Sun Apr 12

Jupyter Cache provides a defined interface for working with a cache of Jupyter notebooks. It executes and caches notebooks, re-executing them only when code cells or related metadata have changed rather than after every minor edit. The library offers both a command-line interface (CLI) and a Python API for managing project notebooks, executing them, and retrieving execution reports that include timing statistics and exception tracebacks. Projects such as Jupyter Book use it to speed up document builds by skipping re-execution of unchanged notebook content. The current version is 1.0.1; releases are driven by feature enhancements and dependency updates.
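
The idea of re-executing only when code changes can be illustrated with a small standalone sketch. It is not jupyter-cache's actual hashing logic: the `code_fingerprint` helper and the choice of SHA-256 over the code-cell sources plus the kernel name are assumptions made for illustration. It uses only the standard library and plain dictionaries shaped like notebook JSON.

```python
import hashlib
import json


def code_fingerprint(notebook: dict) -> str:
    """Hash only the code-cell sources and kernel name (hypothetical helper)."""
    code_sources = [
        cell["source"]
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    kernel = notebook.get("metadata", {}).get("kernelspec", {}).get("name", "")
    payload = json.dumps({"kernel": kernel, "code": code_sources}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf8")).hexdigest()


original = {
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "print(1 + 2)"},
    ],
}

# Same code, different markdown text: the fingerprint is unchanged.
edited_markdown = {
    "metadata": original["metadata"],
    "cells": [
        {"cell_type": "markdown", "source": "# A better title"},
        {"cell_type": "code", "source": "print(1 + 2)"},
    ],
}

# Different code: the fingerprint changes, so re-execution would be needed.
edited_code = {
    "metadata": original["metadata"],
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "print(1 + 3)"},
    ],
}

markdown_edit_hits_cache = code_fingerprint(original) == code_fingerprint(edited_markdown)
code_edit_hits_cache = code_fingerprint(original) == code_fingerprint(edited_code)
print(markdown_edit_hits_cache, code_edit_hits_cache)
```

A cache keyed on such a fingerprint would treat the markdown-only edit as a hit and the code edit as a miss, which is the behavior the description above attributes to the library.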

Quickstart

This quickstart demonstrates how to programmatically initialize a Jupyter Cache, add a notebook to a project, execute it using a local serial executor, and retrieve the executed notebook with its outputs. It first creates a dummy notebook file, then uses the `jupyter_cache` API to manage its lifecycle within the cache.

import pathlib
import nbformat as nbf
from jupyter_cache import get_cache
from jupyter_cache.executors import load_executor

# Define cache path (can be set via JUPYTERCACHE env var too)
cache_path = pathlib.Path('./.my_notebook_cache')

# Create a dummy notebook file
nb_content = nbf.v4.new_notebook()
nb_content.cells.append(nbf.v4.new_code_cell("a = 1\nb = 2\nprint(a + b)"))
notebook_path = pathlib.Path('./example.ipynb')
with open(notebook_path, 'w', encoding='utf8') as f:
    nbf.write(nb_content, f)

try:
    # Initialize the cache
    cache = get_cache(cache_path)
    print(f"Cache initialized at: {cache.path}")

    # Clear cache for a clean start (optional)
    cache.clear_cache()

    # Add the notebook to the project
    # Note: add_nb_to_project is the current name; older versions "staged" notebooks
    cache.add_nb_to_project(notebook_path)
    print(f"Notebook '{notebook_path.name}' added to project.")

    # Execute the project notebooks that have no matching cache entry
    # 'local-serial' is one of the default executors
    executor = load_executor('local-serial', cache=cache)
    result = executor.run_and_cache()
    print(f"Succeeded: {result.succeeded}")
    print(f"Excepted: {result.excepted}, errored: {result.errored}")

    # List project records
    print("\nProject Records:")
    for record in cache.list_project_records():
        print(f"  ID: {record.pk}, URI: {record.uri}")

    # Merge the cached outputs into the source notebook; returns the
    # primary key of the matching cache record and the merged notebook
    cache_pk, merged_nb = cache.merge_match_into_file(notebook_path)
    print(f"\nRetrieved executed notebook for cache PK {cache_pk}, cells: {len(merged_nb.cells)}")
    # The cache directory can be cleared or deleted manually if needed
    # cache.clear_cache()
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the dummy notebook file whether or not an error occurred
    notebook_path.unlink(missing_ok=True)
    # Consider adding shutil.rmtree(cache_path) for full cleanup in tests,
    # but be careful with production environments.
