{"id":5473,"library":"segmentation-models-pytorch","title":"Segmentation Models PyTorch","description":"Segmentation Models PyTorch (SMP) is a Python library offering a high-level API for various neural network architectures, pre-trained backbones, losses, and metrics for image semantic segmentation. It supports 12 encoder-decoder architectures and over 800 pre-trained convolutional and transformer-based encoders, leveraging `timm` for a vast selection. The library focuses on simplicity, fast convergence, and compatibility with PyTorch's `torch.jit.script`, `torch.compile`, and `torch.export` features. It is currently at version 0.5.0 and maintains an active release cadence with frequent updates and new model integrations.","status":"active","version":"0.5.0","language":"en","source_language":"en","source_url":"https://github.com/qubvel-org/segmentation_models.pytorch","tags":["pytorch","segmentation","deep-learning","computer-vision","models","unet","fpn","deeplabv3","transformers","image-processing"],"install":[{"cmd":"pip install segmentation-models-pytorch","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Core PyTorch dependency for model definition and execution.","package":"torch","optional":false},{"reason":"Provides a large collection of pre-trained image models used as encoders (backbones).","package":"timm","optional":false},{"reason":"Fundamental package for numerical computing.","package":"numpy","optional":false},{"reason":"Image processing library, often used in data loading.","package":"pillow","optional":false},{"reason":"Provides datasets, models, and image transformations for computer vision.","package":"torchvision","optional":true},{"reason":"Popular library for image augmentation, frequently used with SMP for training.","package":"albumentations","optional":true},{"reason":"For saving, loading, and sharing models with the Hugging Face Hub (added in v0.3.4).","package":"huggingface-hub","optional":true}],"imports":[{"symbol":"smp","correct":"import segmentation_models_pytorch as smp"},{"note":"Architectures are directly accessible from the top-level `smp` module for convenience.","wrong":"from segmentation_models_pytorch.models import Unet","symbol":"Unet","correct":"model = smp.Unet(...)"},{"symbol":"get_preprocessing_fn","correct":"from segmentation_models_pytorch.encoders import get_preprocessing_fn"}],"quickstart":{"code":"import numpy as np\nimport torch\nimport segmentation_models_pytorch as smp\nfrom segmentation_models_pytorch.encoders import get_preprocessing_fn\n\n# 1. Create segmentation model\nmodel = smp.Unet(\n    encoder_name=\"resnet34\",         # choose encoder backbone\n    encoder_weights=\"imagenet\",      # use `imagenet` pre-trained weights\n    in_channels=3,                   # model input channels (3 for RGB)\n    classes=1,                       # model output channels (number of classes)\n    activation='sigmoid'             # activation function for binary segmentation\n)\n\n# 2. Configure data preprocessing (important for pre-trained encoders)\npreprocess_input = get_preprocessing_fn('resnet34', pretrained='imagenet')\n\n# Example usage:\n# Dummy input image as a channels-last numpy array (height=256, width=256, channels=3),\n# the layout produced by most image-loading libraries\nimage = np.random.randint(0, 255, size=(256, 256, 3)).astype('float32')\n\n# Apply preprocessing (normalization with the encoder's ImageNet statistics);\n# the returned function expects channels-last numpy arrays, not CHW tensors\nimage = preprocess_input(image)\n\n# Convert to a (batch, channels, height, width) tensor for the model\ninput_tensor = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float()\n\n# Forward pass\nmodel.eval()\nwith torch.no_grad():\n    predicted_mask = model(input_tensor)\n\nprint(f\"Model output shape: {predicted_mask.shape}\")  # torch.Size([1, 1, 256, 256])","lang":"python","description":"This quickstart demonstrates how to instantiate a U-Net model with a pre-trained ResNet34 encoder, configure input and output channels, and set an activation function. It also shows how to obtain and apply the correct preprocessing function (which operates on channels-last numpy images) required for models with pre-trained backbones to ensure optimal performance."},"warnings":[{"fix":"Re-train UperNet models with SMP v0.5.0 or pin SMP to the version used for training. For fine-tuning, load the new `smp-hub` checkpoints.","message":"The `UperNet` model architecture underwent significant changes in v0.5.0, making model weights trained with v0.4.0 incompatible with v0.5.0. Existing UperNet models will need to be re-trained or adapted.","severity":"breaking","affected_versions":">=0.5.0"},{"fix":"Update `encoder_name` arguments to use the `tu-` prefix for all `timm` encoders.","message":"Encoders from the `timm` library previously accessed with a `timm-` prefix (e.g., `timm-resnet34`) are deprecated in v0.5.0. The recommended way to use `timm` encoders is now the `tu-` prefix (e.g., `tu-resnet34`).","severity":"deprecated","affected_versions":">=0.5.0"},{"fix":"Update import paths and usage from `smp.utils.losses` to `smp.losses`.","message":"The `smp.utils.losses` module was deprecated in v0.2.0. All loss functions have been moved to the `smp.losses` module.","severity":"deprecated","affected_versions":">=0.2.0"},{"fix":"Ensure your environment uses Python 3.7 or newer.","message":"The minimum Python version requirement was increased from 3.6 to 3.7.","severity":"breaking","affected_versions":">=0.3.2"},{"fix":"Review the release notes for v0.3.4 and the `albumentations` documentation for affected function names if custom integrations break. Upgrade `albumentations` if issues persist.","message":"To ensure compatibility with `albumentations` versions >= 1.4.0, some internal function names that interact with `albumentations` may have changed, requiring updates if you directly extended or modified SMP's data processing pipelines.","severity":"gotcha","affected_versions":">=0.3.4"},{"fix":"Always integrate `get_preprocessing_fn` into your data loading pipeline to preprocess images before feeding them to the model.","message":"For optimal performance, especially when using pre-trained encoders, it is crucial to apply the correct preprocessing steps (e.g., normalization, resizing) to your input data, matching how the encoder's weights were pre-trained. Use `smp.encoders.get_preprocessing_fn` for this.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use input heights and widths divisible by 32 (e.g., 256x256, 512x512), matching the encoders' total downsampling factor. Refer to specific model documentation for exact requirements.","message":"Most encoders downsample the input by a factor of 32, so input height and width generally must be divisible by 32. Incompatible sizes raise a runtime error or degrade performance.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}