Lightly

1.5.23 · active · verified Thu Apr 16

Lightly is a Python computer vision library for self-supervised learning, built on top of PyTorch and PyTorch Lightning. It trains deep learning models without manual labels, with a focus on understanding and filtering raw image data for active learning and data curation pipelines. The current version is 1.5.23, and the project is actively developed and released.

Quickstart

This quickstart sets up and trains a self-supervised MoCo model with Lightly, PyTorch, and PyTorch Lightning. It covers dataset preparation, data augmentations, the model definition, the loss function, and the training loop. If './path_to_your_dataset' does not exist, the script creates it with placeholder image data so the example can run; for real use, replace the path with your actual image directory.
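
One thing worth checking before running the quickstart: with `drop_last=True`, a PyTorch DataLoader discards the final incomplete batch, so a folder holding fewer images than `batch_size` yields no training batches at all. The helper below is a hypothetical pre-flight check using only the standard library. The names `count_images` and `batches_per_epoch` and the extension list are illustrative assumptions, not part of Lightly.

```python
import math
import os

# Illustrative extension list; an assumption, not Lightly's own file filter.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def count_images(input_dir: str) -> int:
    """Count files under input_dir whose extension looks like an image."""
    total = 0
    for _root, _dirs, files in os.walk(input_dir):
        total += sum(1 for name in files if name.lower().endswith(IMAGE_EXTENSIONS))
    return total


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch for a map-style dataset of num_samples items."""
    if drop_last:
        # Incomplete final batch is discarded.
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# A missing or empty directory simply counts as zero images.
n = count_images("./path_to_your_dataset")
print(f"{n} images -> {batches_per_epoch(n, 256, drop_last=True)} batches per epoch")
```

If the reported batch count is zero, either lower `batch_size` or add more images before training.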

import copy
import os

import pytorch_lightning as pl
import torch
import torchvision
from PIL import Image
from torch import nn

from lightly.data import LightlyDataset
from lightly.loss import NTXentLoss
from lightly.models.modules import MoCoProjectionHead
from lightly.models.utils import deactivate_requires_grad, update_momentum
from lightly.transforms.moco_transform import MoCoV2Transform

DATA_DIR = "./path_to_your_dataset"

# 0. Create placeholder images so the quickstart runs without user data.
# This must happen before the dataset is built; it is skipped when DATA_DIR exists.
if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)
    for i in range(8):
        Image.new("RGB", (32, 32), color=(30 * i, 0, 0)).save(
            os.path.join(DATA_DIR, f"dummy_image_{i}.png")
        )

# 1. Define the data augmentations (two augmented views per image)
transform = MoCoV2Transform(input_size=32)

# 2. Define the input dataset; the transform is applied when samples are loaded
dataset = LightlyDataset(input_dir=DATA_DIR, transform=transform)

# 3. Create the PyTorch DataLoader
# drop_last=True discards incomplete batches, so cap batch_size at the dataset size.
dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=min(256, len(dataset)),
    shuffle=True,
    drop_last=True,
    num_workers=4,
)

# 4. Define the self-supervised model, loss function, and training step
class MoCo(pl.LightningModule):
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet18()
        # Drop the classification layer; the backbone outputs 512-dim features.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.projection_head = MoCoProjectionHead(512, 512, 128)
        # Momentum (key) encoder: a frozen copy updated as a moving average.
        self.backbone_momentum = copy.deepcopy(self.backbone)
        self.projection_head_momentum = copy.deepcopy(self.projection_head)
        deactivate_requires_grad(self.backbone_momentum)
        deactivate_requires_grad(self.projection_head_momentum)
        # Contrastive loss with a memory bank of 4096 negatives
        self.criterion = NTXentLoss(memory_bank_size=4096)

    def forward(self, x):
        return self.projection_head(self.backbone(x).flatten(start_dim=1))

    def forward_momentum(self, x):
        features = self.backbone_momentum(x).flatten(start_dim=1)
        return self.projection_head_momentum(features).detach()

    def training_step(self, batch, batch_idx):
        update_momentum(self.backbone, self.backbone_momentum, m=0.99)
        update_momentum(self.projection_head, self.projection_head_momentum, m=0.99)
        x_query, x_key = batch[0]  # batch is (views, targets, filenames)
        loss = self.criterion(self.forward(x_query), self.forward_momentum(x_key))
        self.log("train_loss_ssl", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.06)

# 5. Train the model
# accelerator="auto" uses a GPU when one is available and falls back to the CPU otherwise
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)

print("Starting Lightly self-supervised training...")
trainer.fit(MoCo(), dataloader)
print("Training finished. Check the logs for 'train_loss_ssl'.")
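
MoCo, the method trained above, relies on two mechanisms that the quickstart does not spell out: a key encoder whose weights follow the query encoder as an exponential moving average, and a fixed-size queue of past key embeddings used as negatives (the memory bank, 4096 entries above). The sketch below illustrates both on plain Python lists. It is a toy illustration rather than Lightly's implementation; `ema_update`, `MemoryBank`, and the momentum values are names and numbers assumed for the example.

```python
from collections import deque
from typing import Deque, List


def ema_update(query_weights: List[float], key_weights: List[float], m: float = 0.99) -> List[float]:
    """Move key weights toward query weights: k <- m * k + (1 - m) * q."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_weights, key_weights)]


class MemoryBank:
    """Fixed-size FIFO queue of key embeddings used as negatives."""

    def __init__(self, size: int) -> None:
        self._queue: Deque[List[float]] = deque(maxlen=size)

    def enqueue(self, keys: List[List[float]]) -> None:
        # Once the queue is full, appending evicts the oldest entries.
        self._queue.extend(keys)

    def negatives(self) -> List[List[float]]:
        return list(self._queue)

    def __len__(self) -> int:
        return len(self._queue)


# Toy usage: one momentum step, then two batches pushed into a bank of size 3.
query = [1.0, 2.0]
key = [0.0, 0.0]
key = ema_update(query, key, m=0.5)

bank = MemoryBank(size=3)
bank.enqueue([[0.1], [0.2]])
bank.enqueue([[0.3], [0.4]])
print(key, bank.negatives())
```

A momentum close to 1 keeps the key encoder changing slowly, so embeddings stored in the queue several steps ago remain comparable with fresh ones.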
