{"id":9880,"library":"laion-clap","title":"LAION CLAP","description":"LAION CLAP (Contrastive Language-Audio Pretraining) is a Python library that provides a pre-trained multimodal model capable of understanding and embedding both text and audio inputs into a shared latent space. This allows for tasks like audio-text retrieval, zero-shot audio classification, and text-to-audio search. The current version is 1.1.7, and releases are typically made to incorporate new model weights, bug fixes, or minor feature enhancements.","status":"active","version":"1.1.7","language":"en","source_language":"en","source_url":"https://github.com/LAION-AI/CLAP","tags":["audio","nlp","multimodal","machine-learning","pytorch","embedding","contrastive-learning"],"install":[{"cmd":"pip install laion-clap","lang":"bash","label":"Basic installation"},{"cmd":"pip install laion-clap[full]","lang":"bash","label":"Full installation with all audio dependencies"}],"dependencies":[{"reason":"Required for loading audio files from disk (e.g., .wav, .mp3). Not strictly required if you provide audio as pre-loaded PyTorch tensors.","package":"soundfile","optional":true},{"reason":"Useful for advanced audio processing and loading, though `soundfile` is often sufficient for basic loading. Part of the `[full]` extra.","package":"librosa","optional":true},{"reason":"PyTorch's audio library, often used for loading and preprocessing. 
Part of the `[full]` extra.","package":"torchaudio","optional":true}],"imports":[{"note":"When installed via `pip install laion-clap`, the user-facing wrapper class `CLAP_Module` is exposed directly from the top-level `laion_clap` package. The `CLAP` class in the internal `clap_module.model` module is the underlying network and is not the public entry point.","wrong":"from clap_module.model import CLAP","symbol":"CLAP_Module","correct":"from laion_clap import CLAP_Module"}],"quickstart":{"code":"import numpy as np\nimport torch\nimport laion_clap\n\n# Determine device (CUDA if available, otherwise CPU)\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using device: {device}\")\n\n# Initialize the wrapper and load the default pre-trained checkpoint.\n# load_ckpt() downloads the weights on first run (a large file).\nmodel = laion_clap.CLAP_Module(enable_fusion=False, device=device)\nmodel.load_ckpt()\n\n# --- Text Embedding Example ---\ntext_data = [\n    \"A clear audio recording of a dog barking.\",\n    \"The sound of waves crashing on the shore.\"\n]\ntext_embeddings = model.get_text_embedding(text_data)  # returns a NumPy array\nprint(f\"Text embeddings shape: {text_embeddings.shape}\")\n\n# --- Audio Embedding Example ---\n# For a runnable quickstart without needing actual audio files,\n# we generate dummy audio data. In a real scenario, you'd load files.\n# CLAP expects audio at 48kHz sampling rate, mono channel, shaped (N, T).\n\nsample_rate = 48000\nduration_seconds = 5\n# Generate a batch of 2 mono clips (2 x 5 seconds at 48kHz), kept within [-1, 1]\nrng = np.random.default_rng(0)\ndummy_audio = np.clip(0.1 * rng.standard_normal((2, sample_rate * duration_seconds)), -1.0, 1.0).astype(np.float32)\n\n# Get audio embeddings from in-memory data. The data must already be 48kHz mono;\n# use get_audio_embedding_from_filelist(x=[...]) to embed files on disk instead.\naudio_embeddings = model.get_audio_embedding_from_data(x=dummy_audio, use_tensor=False)\nprint(f\"Audio embeddings shape: {audio_embeddings.shape}\")\n\n# --- Similarity Calculation ---\n# Normalize embeddings for cosine similarity\ntext_embeddings_norm = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)\naudio_embeddings_norm = audio_embeddings / np.linalg.norm(audio_embeddings, axis=-1, keepdims=True)\n\nsimilarity = text_embeddings_norm @ audio_embeddings_norm.T\nprint(f\"\\nSimilarity scores (text x audio):\\n{similarity}\")\n# The dummy audio is random noise, so these scores are not meaningful.\n# With real recordings that match the captions, the diagonal should score highest.","lang":"python","description":"This quickstart demonstrates how to initialize the CLAP model, load the default checkpoint, generate embeddings for both text and (dummy) audio inputs, and calculate the similarity between them. The model weights are downloaded automatically on the first call to `load_ckpt()`. For actual audio files, either pass the paths to `get_audio_embedding_from_filelist`, or load them as 48 kHz mono arrays with `soundfile`, `torchaudio`, or `librosa` and pass them to `get_audio_embedding_from_data`."},"warnings":[{"fix":"Ensure a stable internet connection and sufficient disk space before the first run. To avoid the automatic download, fetch a checkpoint ahead of time and pass its path to `load_ckpt(ckpt=...)`.","message":"Calling `load_ckpt()` without a path downloads the default pre-trained checkpoint on first use. The file is large, so the download can be slow and requires an active internet connection.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pass file paths to `get_audio_embedding_from_filelist` so the library loads the audio itself, or manually resample and mix down audio to 48 kHz mono before passing it to `get_audio_embedding_from_data`. 
Libraries like `torchaudio` or `librosa` are useful for this.","message":"CLAP expects 48 kHz mono audio. Passing audio with a different sample rate or channel count without converting it first can lead to poor embeddings or errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure PyTorch is installed with CUDA support (for example, `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` for CUDA 11.8). Verify CUDA availability with `torch.cuda.is_available()` and pass `device='cuda'` when constructing `CLAP_Module`.","message":"Running the CLAP model on CPU can be significantly slower than on a GPU (CUDA). For larger batches or real-time applications, a CUDA-enabled GPU is highly recommended.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use `from laion_clap import CLAP_Module` (or `import laion_clap` and `laion_clap.CLAP_Module`) after installing the `laion-clap` PyPI package.","message":"Older examples or internal code might refer to `clap_module.model.CLAP`. That class is the internal network definition, not the public entry point of the `laion-clap` PyPI package.","severity":"deprecated","affected_versions":"<=1.1.7 (and likely future versions)"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Use the correct import path: `from laion_clap import CLAP_Module`.","cause":"Attempting to import `CLAP` from `clap_module.model`, which is an internal module of the `laion-clap` package rather than a top-level package.","error":"ModuleNotFoundError: No module named 'clap_module'"},{"fix":"Reduce the batch size for text or audio inputs. If the model still does not fit, use a smaller audio encoder and matching checkpoint, run on CPU, or use a GPU with more memory.","cause":"The input audio/text batch size or model size exceeds the available GPU memory.","error":"RuntimeError: CUDA out of memory. 
Tried to allocate XXX MiB (GPU XXX; XXX MiB total capacity; XXX MiB already allocated; XXX MiB free; XXX MiB reserved in total by PyTorch)"},{"fix":"Double-check the file path. Ensure the file exists and is readable. Verify that any relative path is correct for the current working directory.","cause":"The specified audio file path is incorrect or the file does not exist. This error typically occurs when `soundfile` attempts to load a non-existent file.","error":"soundfile.LibsndfileError: Error opening 'path/to/audio.wav': File not found."},{"fix":"`laion_clap.CLAP_Module` does not take a version name such as 'CLAP_512'. Load the default checkpoint with `load_ckpt()` or a specific one with `load_ckpt(ckpt='path/to/checkpoint.pt')`, and set `enable_fusion` and `amodel` to match that checkpoint as described in the `laion-clap` GitHub repository.","cause":"Attempting to load a CLAP model by a version name that the library does not define or that is misspelled.","error":"KeyError: 'CLAP_512' (or similar model name)"}]}