Scrapbook
Scrapbook is a Python library for recording and reading data in Jupyter and nteract Notebooks. It allows users to persist data values and generated visual content (referred to as 'scraps') directly within the notebook file's output. These recorded scraps can then be recalled, read, or summarized programmatically for later use or for building robust notebook workflows. It aims to replace existing record functionality in libraries like Papermill. The library is actively maintained by the nteract team.
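To make the storage model concrete, the sketch below scans a notebook structure using plain Python and lists the scrap payloads found in cell outputs. It assumes scraps are stored under output MIME keys that begin with `application/scrapbook.scrap` and that each payload carries `name` and `data` fields; both points are assumptions about the on-disk format, not a documented contract. `find_scrap_payloads` is a hypothetical helper, not part of Scrapbook.

```python
# Assumed MIME prefix for scrap payloads; verify against your own notebooks.
SCRAP_MIME_PREFIX = "application/scrapbook.scrap"


def find_scrap_payloads(notebook_dict):
    """Return {scrap name: stored data} for every scrap-like output payload."""
    found = {}
    for cell in notebook_dict.get("cells", []):
        # Markdown cells have no "outputs" key, hence the default.
        for output in cell.get("outputs", []):
            for mime, payload in output.get("data", {}).items():
                if mime.startswith(SCRAP_MIME_PREFIX) and isinstance(payload, dict):
                    found[payload.get("name")] = payload.get("data")
    return found


# A tiny in-memory notebook with one glued value, for illustration only.
example_notebook = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {
                        SCRAP_MIME_PREFIX + ".json+json": {
                            "name": "my_number",
                            "data": 12345,
                            "encoder": "json",
                        }
                    },
                }
            ],
        },
        {"cell_type": "markdown", "source": ["no outputs here"]},
    ]
}

scraps_found = find_scrap_payloads(example_notebook)
print(scraps_found)
```

For a real file, the dictionary passed in would come from `json.load` on the `.ipynb` file. In practice `sb.read_notebook` is the supported way to do this; the sketch only shows where the data lives.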
Warnings
- breaking: The `scrapbook` package on PyPI was formerly published under the name `nteract-scrapbook`. With version 0.5.0, the package name changed to `scrapbook`. If you were installing `nteract-scrapbook`, update your dependency to `scrapbook`.
- deprecated: Scrapbook replaces `papermill`'s direct record functionality. Some backward compatibility exists (e.g., `nb.papermill_dataframe`), but you should move to Scrapbook's `glue` and `read_notebook` API for recording and retrieving data.
- gotcha: When `sb.glue()` stores a pandas DataFrame, `scrapbook` uses `pyarrow` to convert the DataFrame to a base64-encoded Parquet file. The conversion can fail if the DataFrame contains certain complex nested objects (e.g., columns holding dictionaries or sets), raising a `pyarrow` exception.
- breaking: Python 2.7 support was dropped after 2020. Versions 0.4.0 and newer are Python 3 only (3.5+), and the documentation lists Python 3.6+ as supported.
- gotcha: Calling `sb.glue()` outside an active Jupyter/nteract kernel (e.g., in a plain Python script) may not persist the data correctly and may raise warnings. `scrapbook` relies on the kernel's display machinery to store scraps in the notebook's output.
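One way to guard against the last gotcha is to check for an active kernel before gluing. The following is a minimal sketch, assuming IPython's `get_ipython()` returns `None` in plain Python and that kernel-backed shells expose a `kernel` attribute. `running_in_ipython_kernel` and `glue_if_possible` are hypothetical helpers, not part of Scrapbook.

```python
def running_in_ipython_kernel():
    """Best-effort check for an active IPython kernel (Jupyter, nteract)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so no kernel can be active
    shell = get_ipython()
    # get_ipython() returns None in plain Python; kernel-backed shells
    # expose a `kernel` attribute, while terminal IPython does not.
    return shell is not None and getattr(shell, "kernel", None) is not None


def glue_if_possible(name, value):
    """Hypothetical wrapper: glue inside a kernel, otherwise skip the call."""
    if not running_in_ipython_kernel():
        return False
    import scrapbook as sb  # imported lazily so plain scripts do not need it

    sb.glue(name, value)
    return True


in_kernel = running_in_ipython_kernel()
print(f"Active kernel detected: {in_kernel}")
```

Returning a boolean instead of raising lets the same module run both as a notebook cell and as a script, at the cost of skipping the record silently in the script case.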
Install
- pip install scrapbook
- pip install scrapbook[all]
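Because the distribution was renamed, it can help to check which name is installed in the current environment. The sketch below uses only the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Check both the current and the former distribution name.
report = {
    name: installed_version(name)
    for name in ("scrapbook", "nteract-scrapbook")
}
for name, found in report.items():
    print(f"{name}: {found or 'not installed'}")
```

If both names report a version, removing the older `nteract-scrapbook` distribution is probably the safer choice, since both install the same `scrapbook` import package.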
Imports
- scrapbook
  import scrapbook as sb
- glue
  sb.glue('my_data', {'key': 'value'})
- read_notebook
  notebook = sb.read_notebook('path/to/output.ipynb')
- read_notebooks
  scrapbook_collection = sb.read_notebooks('path/to/directory')
- Scrapbook
  from scrapbook.models import Scrapbook
Quickstart
import scrapbook as sb
import os
# --- Part 1: Write data to a dummy notebook (simulating execution) ---
# This part would typically run inside a Jupyter/nteract notebook cell.
# For demonstration, we'll create a dummy output file.
# In a real notebook, you'd just call sb.glue directly.
# Here, we simulate it by writing to a temporary file.
# Raw string, so the \n inside "source" stays a JSON escape rather than a literal newline.
# Assumption: scrap payloads are keyed by the MIME type
# application/scrapbook.scrap.json+json and carry name/data/encoder/version fields,
# mirroring what sb.glue is expected to emit for JSON-encoded scraps.
notebook_content_template = r'''{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "application/scrapbook.scrap.json+json": {
              "data": {},
              "encoder": "json",
              "name": "my_string",
              "version": 1
            }
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": ["import scrapbook as sb\n", "sb.glue('my_string', 'Hello Scrapbook!')"]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "application/scrapbook.scrap.json+json": {
              "data": 12345,
              "encoder": "json",
              "name": "my_number",
              "version": 1
            }
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": ["sb.glue('my_number', 12345)"]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.7"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}'''
# Manually inject the data for the example since we're not running a live kernel
# In a real scenario, these outputs would be generated by `sb.glue` calls
import json
nb_dict = json.loads(notebook_content_template)
# Update the 'my_string' scrap
nb_dict['cells'][0]['outputs'][0]['data']['application/scrapbook.scrap.json+json']['data'] = 'Hello Scrapbook!'
# Update the 'my_number' scrap
nb_dict['cells'][1]['outputs'][0]['data']['application/scrapbook.scrap.json+json']['data'] = 12345
output_notebook_path = 'output_test_notebook.ipynb'
with open(output_notebook_path, 'w') as f:
    json.dump(nb_dict, f, indent=4)
print(f"Created dummy notebook: {output_notebook_path}")
# --- Part 2: Read data from the notebook ---
# This part can run in a separate script or notebook.
# Read the notebook containing the 'scraps'
nb = sb.read_notebook(output_notebook_path)
# Access a specific scrap by name (nb.scraps is a dict-like mapping keyed by scrap name)
my_string_scrap = nb.scraps['my_string']
my_number_scrap = nb.scraps['my_number']
print(f"\nRetrieved string scrap: {my_string_scrap.data}")
print(f"Retrieved number scrap: {my_number_scrap.data}")
# You can also collect every scrap's data into a plain dictionary
all_scraps = {name: scrap.data for name, scrap in nb.scraps.items()}
print(f"\nAll scraps: {all_scraps}")
# Clean up the dummy file
os.remove(output_notebook_path)
print(f"Cleaned up {output_notebook_path}")
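Once notebooks can be read back, a common next step is to compare the same scrap across several runs. The sketch below shows one way to do that. It relies only on each notebook object exposing a `.scraps` mapping whose values have a `.data` attribute, as the quickstart uses; `scrap_values` and `compare_runs` are hypothetical helpers, and the stand-in objects exist only so the example runs without a kernel or real notebook files.

```python
from types import SimpleNamespace


def scrap_values(notebook):
    """Map each scrap name to its data for a notebook-like object."""
    return {name: scrap.data for name, scrap in notebook.scraps.items()}


def compare_runs(notebooks_by_label, scrap_name):
    """Collect one scrap's value across several labelled notebooks."""
    return {
        label: scrap_values(nb).get(scrap_name)
        for label, nb in notebooks_by_label.items()
    }


# Stand-ins that mimic only the `.scraps` mapping and the `.data` attribute.
run_a = SimpleNamespace(scraps={"my_number": SimpleNamespace(data=12345)})
run_b = SimpleNamespace(scraps={"my_number": SimpleNamespace(data=678)})

comparison = compare_runs({"a": run_a, "b": run_b}, "my_number")
print(comparison)
```

With real files, the stand-ins would be replaced by objects returned from `sb.read_notebook(path)`, one per executed notebook.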