# Token Merging for Stable Diffusion (tomesd)
`tomesd` is a Python library that implements Token Merging (ToMe) for Stable Diffusion models. It accelerates inference by merging redundant tokens, reducing the computational load on the transformer blocks without requiring model retraining. The library is pure Python on top of PyTorch, currently at version 0.1.3, with releases focused on performance and compatibility improvements.
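ToMe's actual algorithm uses bipartite soft matching inside the attention blocks, but the core idea — average the most redundant (most similar) token vectors so later layers process fewer of them — can be shown with a toy sketch. This is illustrative only, not tomesd's implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def merge_most_similar(tokens):
    """Merge (average) the single most similar pair of token vectors,
    shrinking the token list by one."""
    best = None
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            s = cosine(tokens[i], tokens[j])
            if best is None or s > best[0]:
                best = (s, i, j)
    _, i, j = best
    merged = [(a + b) / 2 for a, b in zip(tokens[i], tokens[j])]
    return [t for k, t in enumerate(tokens) if k not in (i, j)] + [merged]

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(len(merge_most_similar(tokens)))  # 3 tokens -> 2 after one merge
```

Repeating this until a target fraction of tokens is merged corresponds roughly to tomesd's `ratio` parameter: a higher ratio merges more tokens, trading quality for speed.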
## Common errors
- `ModuleNotFoundError: No module named 'tomesd'`
  - **Cause:** The `tomesd` package is not installed in the Python environment currently used by your Stable Diffusion application or script.
  - **Fix:** Install `tomesd` in the active environment: `pip install tomesd`. If you use a virtual environment (e.g. `venv`, `conda`), activate it before installing.
- `Failed to apply ToMe patch, continuing as normal: module 'tomesd' has no attribute 'apply_patch'`
  - **Cause:** Usually an incomplete or corrupted installation, or an incorrect import: the `tomesd` module was found, but `apply_patch` is missing or inaccessible. This can happen if a partial or stale `tomesd` directory shadows the real package on the Python path.
  - **Fix:** Reinstall with `pip uninstall tomesd` followed by `pip install tomesd`. If installing from source, make sure `python setup.py build develop` completed successfully. Double-check that the import is `import tomesd` before calling `tomesd.apply_patch()`.
- `RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension Z`
  - **Cause:** This often occurs when `tomesd` is applied to models generating at non-standard resolutions that produce tensor dimension mismatches during token merging. Problems with specific resolutions (e.g. 1920x1080) were reported early on.
  - **Fix:** Upgrade `tomesd` to the latest version; compatibility with more resolutions was added in v0.1.1. If the problem persists, adjust image dimensions to multiples of 16, or use common Stable Diffusion resolutions such as 512x512 or 768x768.
- Lower than expected speedup, or even a slowdown, when using `tomesd`.
  - **Cause:** This can happen at smaller image resolutions, on certain GPU architectures, or when other strong optimizations (such as `xformers` or `torch.compile` with SDPA) already dominate the pipeline; the overhead of token merging can then outweigh its benefit.
  - **Fix:** Verify that `tomesd` is actually being invoked (e.g. add print statements). Test at larger resolutions (e.g. 1024x1024), where `tomesd` typically provides more significant gains. Temporarily disable other transformer optimizations to isolate the effect of `tomesd`, then re-evaluate.
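For the two installation-related errors above, a quick diagnostic is to check, from the exact interpreter your application uses, whether `tomesd` is importable and whether a local file or directory is shadowing the installed package. The helper names here are illustrative:

```python
import importlib.util
import sys
from pathlib import Path

def tomesd_importable() -> bool:
    """True if `tomesd` can be found by this interpreter."""
    return importlib.util.find_spec("tomesd") is not None

def locally_shadowed(pkg: str = "tomesd") -> bool:
    """True if a file/dir in the working directory would shadow the
    installed package (a common cause of 'has no attribute' errors)."""
    cwd = Path.cwd()
    return (cwd / pkg).is_dir() or (cwd / f"{pkg}.py").is_file()

print(sys.executable)  # confirm which environment is actually active
print(tomesd_importable(), locally_shadowed())
```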
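For the resolution mismatch, a small helper can snap requested dimensions to the nearest multiple of 16, matching the advice above (the helper name is made up, not part of tomesd):

```python
def snap_to_multiple(x: int, base: int = 16) -> int:
    """Round a dimension to the nearest multiple of `base`, never below it."""
    return max(base, round(x / base) * base)

print(snap_to_multiple(1080), snap_to_multiple(1920))  # 1088 1920
```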
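To check whether the patch actually helps on your setup, time the pipeline with and without it. A minimal, generic timing helper (not part of tomesd):

```python
import time

def avg_seconds(fn, warmup: int = 1, iters: int = 3) -> float:
    """Average wall-clock seconds per call to `fn`, after warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Usage sketch: compare avg_seconds(lambda: pipeline(prompt)) before and
# after tomesd.apply_patch(pipeline, ratio=0.5).
```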
## Warnings
- **Gotcha:** Prior to v0.1.3, `tomesd`'s random perturbations could affect the global torch seed, potentially leading to inconsistencies in image generation if not explicitly managed.
- **Gotcha:** When `use_rand` is enabled, odd batch sizes (where prompted and unprompted images are not in the same batch) could lead to artifacting (v0.1.2).
- **Gotcha:** `tomesd` is lossy: applying it subtly changes the generated image. It is designed to minimize quality loss, but aggressive merging (a higher `ratio`) can degrade image quality.
- **Gotcha:** Expected speedups vary significantly with image resolution, batch size, and the underlying Stable Diffusion implementation (e.g. `diffusers` vs. the original `runwayml` repo). Smaller images, or pipelines with existing optimizations like `xformers` or `torch.compile`, may show less dramatic gains.
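The pre-v0.1.3 seed gotcha is an instance of shared global RNG state; the general remedy is to draw unrelated randomness from an isolated generator. The principle is shown here with Python's `random` module for portability; in torch the analogous tool is a dedicated `torch.Generator`:

```python
import random

random.seed(1234)
expected = random.random()  # what the global stream "should" produce

random.seed(1234)
local = random.Random(999)               # independent generator
_ = [local.random() for _ in range(10)]  # consuming it leaves global state alone
assert random.random() == expected       # global stream is undisturbed
```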
## Install

```bash
pip install tomesd
```
## Imports

```python
import tomesd
tomesd.apply_patch(model, ratio=0.5)

# or import the patch function directly:
from tomesd import apply_patch
apply_patch(model, ratio=0.5)
```
## Quickstart

```python
import os

import torch
import tomesd
from diffusers import StableDiffusionPipeline

# A Hugging Face token is only needed for gated/private models; public
# models such as SD v1.5 usually download without one, but setting it
# up is good practice.
hf_token = os.environ.get("HF_TOKEN", "")

# 1. Load a Stable Diffusion pipeline
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    # use_auth_token=hf_token if hf_token else None,  # only for private models
).to("cuda")

# 2. Apply ToMe with a 50% merging ratio.
# `ratio` controls the amount of token merging: a higher ratio means more
# speedup, potentially at lower quality.
tomesd.apply_patch(pipeline, ratio=0.5)

# 3. Generate an image
prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt).images[0]

# 4. Save the image
image.save("astronaut.png")
print("Image generated and saved as astronaut.png")
```