{"id":2624,"library":"open-clip-torch","title":"OpenCLIP","description":"OpenCLIP is an open-source implementation of OpenAI's Contrastive Language-Image Pre-training (CLIP) and related models. It enables training CLIP models at scale, leveraging state-of-the-art pretrained weights, and performing zero-shot image classification and retrieval. The current version is 3.3.0, with active development and regular releases.","status":"active","version":"3.3.0","language":"en","source_language":"en","source_url":"https://github.com/mlfoundations/open_clip","tags":["CLIP","vision-language","deep learning","pytorch","embeddings","zero-shot","multimodal"],"install":[{"cmd":"pip install open_clip_torch","lang":"bash","label":"Base Installation"},{"cmd":"pip install open_clip_torch[training]","lang":"bash","label":"With Training Dependencies"},{"cmd":"pip install -U timm","lang":"bash","label":"Update timm (Recommended for ConvNeXt, SigLIP, EVA encoders)"},{"cmd":"pip install transformers","lang":"bash","label":"Install transformers (If using transformer-based tokenizers)"}],"dependencies":[{"reason":"Core deep learning framework dependency.","package":"torch"},{"reason":"Required for image preprocessing transforms.","package":"torchvision"},{"reason":"Used for various image encoders (e.g., ConvNeXt, SigLIP, EVA).","package":"timm","optional":true},{"reason":"Required for certain transformer-based tokenizers.","package":"transformers","optional":true},{"reason":"Commonly used for image loading and manipulation (e.g., PIL.Image).","package":"Pillow"},{"reason":"Used for loading models from Hugging Face Hub.","package":"huggingface-hub","optional":true}],"imports":[{"symbol":"open_clip","correct":"import open_clip"},{"symbol":"create_model_and_transforms","correct":"model, _, preprocess = open_clip.create_model_and_transforms(...)"},{"note":"While 'from open_clip import tokenizer' works, the recommended pattern is `open_clip.get_tokenizer()` to retrieve the correct tokenizer 
instance for a given model.","wrong":"from open_clip import tokenizer","symbol":"get_tokenizer","correct":"tokenizer = open_clip.get_tokenizer(...)"}],"quickstart":{"code":"import torch\nfrom PIL import Image\nimport open_clip\nimport io\nimport base64\n\n# Create a dummy image (in a real scenario, load from file or URL)\ndummy_image_data = \"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=\"\nimage = Image.open(io.BytesIO(base64.b64decode(dummy_image_data))).convert('RGB')\n\n# 1. Load model and preprocessing transforms\nmodel, _, preprocess = open_clip.create_model_and_transforms(\n    'ViT-B-32',\n    pretrained='laion2b_s34b_b79k'\n)\nmodel.eval() # Set model to evaluation mode\n\n# 2. Get tokenizer\ntokenizer = open_clip.get_tokenizer('ViT-B-32')\n\n# 3. Prepare inputs\nimage_input = preprocess(image).unsqueeze(0) # Add batch dimension\ntext_input = tokenizer([\"a diagram\", \"a dog\", \"a cat\"])\n\n# 4. Run inference\nwith torch.no_grad(): # Disable gradient computation for inference\n    image_features = model.encode_image(image_input)\n    text_features = model.encode_text(text_input)\n\n    # Normalize features\n    image_features /= image_features.norm(dim=-1, keepdim=True)\n    text_features /= text_features.norm(dim=-1, keepdim=True)\n\n    # Compute similarity scores\n    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)\n\nprint(\"Label probabilities:\", text_probs)\n\n# Optional: Interpret results\nlabels = [\"a diagram\", \"a dog\", \"a cat\"]\ntop_prob, top_idx = text_probs[0].max(dim=0)\nprint(f\"Predicted: {labels[top_idx]} ({top_prob.item():.1%} confidence)\")","lang":"python","description":"This quickstart demonstrates how to load a pre-trained OpenCLIP model, preprocess a dummy image and text, then compute the zero-shot similarity probabilities between the image and the given text labels. 
It covers loading the model and tokenizer and running inference with feature normalization."},"warnings":[{"fix":"pip install -U timm","message":"When using `timm`-based image encoders (e.g., ConvNeXt, SigLIP, EVA), ensure you have the latest `timm` library installed. Older versions may result in 'Unknown model' errors.","severity":"gotcha","affected_versions":"<= 3.x.x"},{"fix":"Specify model definitions with a `-quickgelu` postfix when loading pretrained weights that were trained with QuickGELU (e.g., `open_clip.create_model_and_transforms('ViT-B-32-quickgelu', ...)`).","message":"OpenCLIP's default model definitions use `torch.nn.GELU`, whereas the original OpenAI models and many existing checkpoints were trained with `QuickGELU`. This is a change in OpenCLIP's model defaults, not in PyTorch. For pretrained weights trained with QuickGELU, use a model definition with a `-quickgelu` postfix (e.g., 'ViT-B-32-quickgelu') to match the original training; loading them with a non-quickgelu definition causes an accuracy drop, which matters most for zero-shot inference and short fine-tuning runs.","severity":"breaking","affected_versions":"All versions, due to OpenCLIP model-definition defaults"},{"fix":"Verify `torch` and `open-clip-torch` compatibility. Downgrade `open_clip_torch` or upgrade `torch` as needed. Consult OpenCLIP's GitHub issues for known compatibility pairs.","message":"Mismatch between installed `torch` and `open-clip-torch` versions can lead to `ModuleNotFoundError` or other runtime issues. Ensure compatible versions are installed, often by following PyTorch's installation instructions for your CUDA version before installing OpenCLIP.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Wrap inference calls with `with torch.no_grad(), torch.autocast('cuda'):` for GPU inference.","message":"For optimal performance and consistency with original CLIP, OpenCLIP is designed to be used within a mixed-precision context (e.g., `torch.autocast('cuda')`) as OpenAI's original models utilized mixed-precision. 
Without it, there might be slight numerical differences in embeddings or reduced performance on GPU.","severity":"gotcha","affected_versions":"All versions"},{"fix":"pip install transformers","message":"If you are using models that rely on transformer tokenizers (e.g., certain text encoders), the `transformers` library must be installed separately, as it is an optional dependency for `open-clip-torch`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}