Diffusers
Hugging Face library for state-of-the-art diffusion models: Stable Diffusion, FLUX, SDXL, video generation, and more. Current version is 0.37.1. Core API: DiffusionPipeline.from_pretrained(). Always set torch_dtype=torch.float16 or bfloat16 — default float32 causes OOM on most GPUs.
Warnings
- breaking Not setting torch_dtype=torch.float16 loads the model in float32, typically requiring 14GB+ VRAM for SD 1.5 and 40GB+ for SDXL. Causes immediate CUDA OOM on most consumer GPUs. The most common LLM-generated diffusers bug.
- breaking callback and callback_steps parameters are deprecated across all pipelines; use callback_on_step_end instead. Passing the old parameters raises FutureWarning now and will raise TypeError in a future release.
- breaking from_single_file() model config args (num_in_channels, scheduler_type, image_size, upcast_attention) deprecated since 0.28. These were SD-specific anti-patterns not supported in from_pretrained().
- breaking enable_model_cpu_offload() and enable_sequential_cpu_offload() require accelerate to be installed. Calling them without accelerate raises ImportError.
- gotcha Model hub IDs change over time. 'CompVis/stable-diffusion-v1-4' and 'runwayml/stable-diffusion-v1-5' are outdated hub IDs from early tutorials. The current canonical SD 1.5 repo is 'stable-diffusion-v1-5/stable-diffusion-v1-5'.
- gotcha Pipeline output is always a dataclass, not a tensor. pipe(...).images returns a list of PIL Images, not a tensor. Accessing .images[0] gives the first PIL Image.
- gotcha Upgrading diffusers without matching transformers version can silently degrade output quality or cause errors. diffusers and transformers are tightly coupled — each diffusers release targets specific transformers versions.
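The version-coupling gotcha above can be surfaced at startup rather than discovered as silent quality degradation. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and which transformers version each diffusers release targets must still be checked against the diffusers release notes:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# diffusers and transformers are tightly coupled; print both versions early
# so a mismatched upgrade is visible before any pipeline is loaded
for pkg in ('diffusers', 'transformers', 'accelerate'):
    v = installed_version(pkg)
    print(f'{pkg}: {v or "NOT INSTALLED"}')
```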
Install
- pip install diffusers (core library only)
- pip install diffusers[torch] (with torch extras)
- pip install diffusers transformers accelerate (recommended; accelerate is required for CPU offload)
Imports
- DiffusionPipeline
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    torch_dtype=torch.float16  # REQUIRED — omitting uses float32 and will OOM on most GPUs
)
pipe = pipe.to('cuda')
image = pipe('An astronaut on Mars').images[0]
- callback
def step_callback(pipe, step, timestep, callback_kwargs):
    # process latents here
    return callback_kwargs
image = pipe(
    prompt,
    callback_on_step_end=step_callback,
    callback_on_step_end_tensor_inputs=['latents']
).images[0]
Quickstart
from diffusers import DiffusionPipeline
import torch
# Text-to-image
pipe = DiffusionPipeline.from_pretrained(
'stable-diffusion-v1-5/stable-diffusion-v1-5',
torch_dtype=torch.float16
).to('cuda')
image = pipe('A cat wearing a hat').images[0]
image.save('output.png')
# Memory-efficient: CPU offload (requires accelerate)
# Call instead of .to('cuda') — the offload hooks manage device placement
pipe.enable_model_cpu_offload()
# FLUX (latest high-quality model)
flux_pipe = DiffusionPipeline.from_pretrained(
'black-forest-labs/FLUX.1-schnell',
torch_dtype=torch.bfloat16
).to('cuda')
image = flux_pipe(
'An astronaut riding a horse on Mars',
guidance_scale=0.,      # schnell is guidance-distilled; CFG is not used
num_inference_steps=4   # schnell is a few-step distilled model
).images[0]
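Choosing between the two offload modes from the warnings section can be reduced to a rule of thumb. A sketch with assumed VRAM thresholds, not official guidance; `offload_strategy` is a hypothetical helper, and both offload methods require accelerate to be installed:

```python
def offload_strategy(vram_gb: float) -> str:
    """Pick a diffusers memory strategy by available VRAM (rough thresholds)."""
    if vram_gb >= 16:
        return 'none'                        # model fits; keep everything on GPU
    if vram_gb >= 8:
        return 'enable_model_cpu_offload'    # moves whole submodels; modest slowdown
    return 'enable_sequential_cpu_offload'   # lowest peak VRAM; much slower

# e.g. on a 6 GB card:
print(offload_strategy(6.0))
```

On CUDA the actual free memory can be read with torch.cuda.mem_get_info(); when an offload method is chosen, call it on the pipeline instead of .to('cuda').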