K-Diffusion
K-Diffusion is a PyTorch library by Katherine Crowson implementing the diffusion model improvements from Karras et al. (2022), "Elucidating the Design Space of Diffusion-Based Generative Models". It provides a collection of samplers (e.g., Euler, Heun, and the DPM-Solver++ family), noise schedules, and wrapper classes for adapting external models such as Stable Diffusion UNets. The current version is 0.1.1.post1; releases are community-driven and focus on stability and integration with other generative AI projects.
Common errors
- ModuleNotFoundError: No module named 'k_diffusion'
  cause: The k-diffusion library is not installed in the active Python environment.
  fix: Run `pip install k-diffusion` (the package name uses a hyphen; the import uses an underscore).
- TypeError: forward() got an unexpected keyword argument 'sigma'
  cause: A `k_diffusion` sampler passes `sigma` to the model, but the underlying PyTorch model's `forward` method does not accept it.
  fix: Wrap the model with `k_diffusion.external.CompVisDenoiser` (or a similar `external` wrapper) that adapts the `k-diffusion` `(x, sigma)` API to the model's native `forward` signature.
- RuntimeError: CUDA out of memory. Tried to allocate X GiB (GPU N; X GiB total capacity; Y GiB already allocated; Z GiB free; P MiB reserved in total by PyTorch)
  cause: The GPU does not have enough memory to run the model or sampling process with the current settings.
  fix: Reduce the batch size, use a smaller image resolution, or offload parts of the model to CPU if the architecture allows (e.g., with specific `diffusers` pipelines); otherwise use a GPU with more VRAM.
- AttributeError: module 'k_diffusion.sampling' has no attribute 'sample_dpmpp_2m_sde_v2'
  cause: The sampler function name does not exist, or was renamed, in the installed version of `k-diffusion`.
  fix: List the available samplers with `dir(k_diffusion.sampling)`, or consult the official GitHub repository for the correct names in your version.
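The `sigma` TypeError above arises because k-diffusion samplers speak in continuous noise levels while most UNets expect discrete timesteps; the `external` wrappers bridge this by interpolating sigma back to a (fractional) timestep over the model's stored noise schedule. A minimal pure-Python sketch of that conversion (the helper `sigma_to_t` and the toy schedule are illustrative, not the library's exact code):

```python
import math

def sigma_to_t(sigma, schedule_sigmas):
    """Map a continuous sigma to a (fractional) timestep index by linear
    interpolation in log-sigma space, mirroring what the wrappers do."""
    log_sigma = math.log(sigma)
    log_sigmas = [math.log(s) for s in schedule_sigmas]  # increasing in t
    for i in range(len(log_sigmas) - 1):
        lo, hi = log_sigmas[i], log_sigmas[i + 1]
        if lo <= log_sigma <= hi:
            w = (log_sigma - lo) / (hi - lo)
            return i + w
    # Clamp sigmas that fall outside the schedule's range.
    return 0.0 if log_sigma < log_sigmas[0] else float(len(log_sigmas) - 1)

schedule = [0.1, 1.0, 10.0, 100.0]  # toy 4-step schedule
print(sigma_to_t(1.0, schedule))    # 1.0: sigma 1.0 sits exactly at timestep 1
```

The fractional result is what lets a continuous-sigma sampler drive a model trained on a fixed grid of timesteps.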
Warnings
- breaking Sampler function names and signatures changed across early (pre-0.1.0) releases; a sampler such as `sample_dpmpp_2m_sde` may be missing, renamed, or take slightly different arguments depending on the installed version.
- gotcha K-Diffusion expects models to conform to a specific API where the forward pass takes `(x, sigma)`, with conditioning passed via `extra_args`. If you're wrapping an external UNet, use the appropriate wrapper, e.g. `external.CompVisDenoiser` for eps-prediction models or `external.CompVisVDenoiser` for v-prediction models, or adapt your custom model's forward method yourself.
- gotcha Tensor shapes and value ranges are crucial. `k-diffusion` typically operates on latents of shape `(B, C, H, W)`, and its samplers expect continuous `sigma` values, not raw integer `timestep` indices. The wrapped model must return the denoised sample (or the predicted noise, depending on the wrapper) in the same shape as its input.
- gotcha CUDA out-of-memory errors are common with large models or high batch sizes. A related `RuntimeError` about tensors on different devices means the model and its inputs are split across CPU and GPU, which is a placement bug rather than a memory problem.
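The sigma-not-timestep gotcha is easier to see with the schedule itself. Below is a pure-Python sketch of the rho-spaced Karras schedule that `sampling.get_sigmas_karras` produces (`karras_sigmas` is an illustrative reimplementation, not the library's code; rho=7.0 is the default from the paper):

```python
def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Karras et al. (2022) schedule: n sigmas spaced uniformly in
    sigma**(1/rho), from sigma_max down to sigma_min, plus a final 0.0."""
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    sigmas = [(max_inv + t * (min_inv - max_inv)) ** rho for t in ramp]
    return sigmas + [0.0]  # samplers expect the trailing zero

sigmas = karras_sigmas(n=10, sigma_min=0.1, sigma_max=8.0)
print(len(sigmas), sigmas[0], sigmas[-2], sigmas[-1])  # 11 values: from ~8.0 down to ~0.1, then 0.0
```

The decreasing values are noise standard deviations, which is why passing integer timesteps (e.g., 0..999) where sigmas are expected silently breaks sampling.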
Install
- `pip install k-diffusion`
Imports
- sample_dpmpp_2m
from k_diffusion.sampling import sample_dpmpp_2m
from k_diffusion import sampling # ... sampling.sample_dpmpp_2m(...)
- CompVisDenoiser
from k_diffusion.external import CompVisDenoiser
from k_diffusion import external # ... external.CompVisDenoiser(...)
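Since sampler names vary across versions (see the AttributeError in Common errors), a `getattr`-based lookup avoids hard-coding a single name. The sketch below uses a stand-in namespace so it runs without k-diffusion installed; swap in the real `k_diffusion.sampling` module in practice:

```python
from types import SimpleNamespace

def pick_sampler(sampling_module, preferred):
    """Return the first (name, function) pair from `preferred` that exists."""
    for name in preferred:
        fn = getattr(sampling_module, name, None)
        if fn is not None:
            return name, fn
    raise AttributeError(f"none of {preferred} found in module")

# Stand-in for `k_diffusion.sampling`; use the real module in practice.
fake_sampling = SimpleNamespace(sample_dpmpp_2m=lambda *a, **kw: "sampled")
name, fn = pick_sampler(fake_sampling, ["sample_dpmpp_2m_sde_v2", "sample_dpmpp_2m"])
print(name)  # sample_dpmpp_2m
```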
Quickstart
import torch
from k_diffusion import sampling, external

# 1. Define a dummy UNet-like model (replace with your actual pre-trained UNet).
# This mock simulates a model expecting (latent, timestep, conditioning) input.
# Note: CompVisDenoiser reads `alphas_cumprod` from the inner model to build its
# noise schedule, so the dummy registers a simple decreasing schedule.
class DummyUNet(torch.nn.Module):
    def __init__(self, in_channels=4, out_channels=4, num_timesteps=1000):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.relu = torch.nn.ReLU()
        self.register_buffer('alphas_cumprod', torch.linspace(0.999, 0.001, num_timesteps))

    def forward(self, x, timesteps, context=None):
        # In a real UNet, timesteps and context would condition the prediction.
        return self.relu(self.conv(x))

inner_model = DummyUNet()

# 2. Wrap the UNet with k-diffusion's external denoiser (for eps-prediction models
# such as the Stable Diffusion UNet). The wrapper adapts the UNet's (x, timestep)
# API to k-diffusion's expected (x, sigma) signature.
model_wrap = external.CompVisDenoiser(inner_model)
model_wrap.eval()  # runs on CPU here for quickstart simplicity

# 3. Prepare the sigma schedule and initial noisy latents.
batch_size = 1
channels = 4            # channel count of the Stable Diffusion latent space
height, width = 64, 64  # latent resolution (e.g., 512x512 image -> 64x64 latent)
sigma_max = 8.0
sigmas = sampling.get_sigmas_karras(n=40, sigma_min=0.1, sigma_max=sigma_max, device='cpu')
# Sampling starts from pure noise scaled by the first (largest) sigma.
initial_noise = torch.randn(batch_size, channels, height, width) * sigma_max

# 4. Run the sampling process using the DPM++ 2M sampler.
with torch.no_grad():
    print("Starting K-Diffusion sampling (DPM++ 2M)...")
    denoised_latents = sampling.sample_dpmpp_2m(
        model_wrap,     # the wrapped model callable
        initial_noise,  # initial noisy latents
        sigmas,         # sigma schedule
        # extra_args={'context': text_embeddings} would forward conditioning
    )
print(f"Sampling complete. Denoised latents shape: {denoised_latents.shape}")
# In a real pipeline, `denoised_latents` would then be decoded to an image by the VAE.
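To make the sampling loop less of a black box, here is a toy Karras-style Euler update in plain Python (the same update rule as the library's Euler sampler, not its actual code) on a 1-D problem where the ideal denoiser is known: if the data distribution is a point mass at x0, the optimal denoiser always returns x0, and the loop lands on it once sigma reaches zero.

```python
import random

def sample_euler(denoise, x, sigmas):
    """Karras Euler step: d = (x - denoised) / sigma; x += d * (sigma_next - sigma)."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma  # direction toward the denoised estimate
        x = x + d * (sigma_next - sigma)     # step down to the next noise level
    return x

x0 = 3.5                                     # data: a point mass at 3.5
denoise = lambda x, sigma: x0                # ideal denoiser for that data
sigmas = [8.0, 4.0, 2.0, 1.0, 0.5, 0.0]      # toy decreasing schedule ending at 0
x_init = random.gauss(0.0, 1.0) * sigmas[0]  # start from noise * sigma_max
print(sample_euler(denoise, x_init, sigmas))  # ~3.5
```

Higher-order samplers like DPM++ 2M refine the same loop with multistep corrections, but the structure, a model call per sigma followed by an update toward lower noise, is identical.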