{"id":7735,"library":"spacy-transformers","title":"spaCy Transformers: Integrate Hugging Face Models","description":"The `spacy-transformers` library provides spaCy components and architectures to seamlessly integrate pre-trained transformer models from Hugging Face's `transformers` library into spaCy pipelines. It enables convenient access to state-of-the-art architectures like BERT, GPT-2, and XLNet for various NLP tasks, leveraging spaCy v3's powerful and extensible configuration system for multi-task learning. The current version is 1.4.0, and releases are generally aligned with spaCy's major version updates and `transformers` library advancements.","status":"active","version":"1.4.0","language":"en","source_language":"en","source_url":"https://github.com/explosion/spacy-transformers","tags":["spaCy","transformers","NLP","BERT","embedding","machine learning","Hugging Face"],"install":[{"cmd":"pip install 'spacy[transformers]'\npython -m spacy download en_core_web_trf","lang":"bash","label":"CPU Installation and Model Download"},{"cmd":"pip install 'spacy[transformers,cudaXX]' # Replace XX with your CUDA version (e.g., cuda113 for CUDA 11.3)\npython -m spacy download en_core_web_trf","lang":"bash","label":"GPU Installation (PyTorch with CUDA) and Model Download"}],"dependencies":[{"reason":"Core NLP library; spacy-transformers requires spaCy v3.0+ and has specific minor-version compatibility.","package":"spacy","optional":false},{"reason":"The underlying library providing access to pre-trained transformer models from Hugging Face.","package":"transformers","optional":false},{"reason":"Backend deep learning framework for transformer models. PyTorch is typically installed automatically, but specific CUDA builds may require manual installation.","package":"torch","optional":false},{"reason":"Provides GPU support for spaCy's Thinc (and thus spacy-transformers) when using CUDA; installed via spaCy's `[cudaXX]` extras.","package":"cupy","optional":true}],"imports":[{"note":"Used for programmatic configuration or building custom pipelines. For typical usage, load a transformer-backed spaCy model directly.","symbol":"Transformer","correct":"from spacy_transformers import Transformer"}],"quickstart":{"code":"import spacy\n\n# Ensure you have downloaded a transformer-backed model, e.g., using:\n# python -m spacy download en_core_web_trf\n\nnlp = spacy.load(\"en_core_web_trf\")\ntext = \"Apple is acquiring a London-based AI startup for $200M.\"\ndoc = nlp(text)\n\nprint(f\"Text: {text}\")\nprint(f\"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}\")\n\n# Raw transformer outputs are stored in the doc._.trf_data extension.\n# tensors[0] holds the last hidden state for the wordpiece tokens.\nprint(f\"Last hidden state shape: {doc._.trf_data.tensors[0].shape}\")","lang":"python","description":"This quickstart demonstrates loading a pre-trained, transformer-backed spaCy model (like `en_core_web_trf`) and processing text to extract entities, showcasing the integration. It also shows how to access the raw transformer outputs stored in `doc._.trf_data`, which power subsequent spaCy components."},"warnings":[{"fix":"Upgrade your spaCy installation to v3.0+ and then install `spacy-transformers` v1.x. Retrain any custom pipelines or download compatible `_trf` models for spaCy v3.","message":"`spacy-transformers` underwent a significant refactoring for spaCy v3.0+. Versions 0.6.x and earlier (for spaCy v2.x) are incompatible with v1.x and later (for spaCy v3.x). Pipelines trained with v0.x will not work with v1.x.","severity":"breaking","affected_versions":"<1.0.0"},{"fix":"Always check the `spacy-transformers` documentation or `pyproject.toml` for the exact `spaCy` version requirements. Use `pip install 'spacy[transformers]'` to let pip resolve compatible versions. Run `python -m spacy validate` to check installed package compatibility.","message":"Strict version compatibility exists between `spacy-transformers` and `spaCy`. For example, `spacy-transformers` v1.2.x requires `spaCy` v3.5.0+. Installing incompatible versions can lead to unexpected errors or warnings about pipeline incompatibility.","severity":"gotcha","affected_versions":"All versions"},{"fix":"To use task-specific heads, either train separate spaCy components (like `textcat` or `ner`) that consume the transformer features, or consider `spacy-huggingface-pipelines` for direct integration of task-specific Hugging Face models.","message":"The `Transformer` component in `spacy-transformers` acts as a feature extractor, providing contextual embeddings to downstream spaCy components. It does not natively expose task-specific heads (e.g., for text classification or token classification) from the Hugging Face model for direct inference or training.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For GPU, ensure `spacy[transformers,cudaXX]` is installed with the correct PyTorch CUDA build. Reduce `batch_size` and `max_length` in your config. Consider smaller transformer models (e.g., DistilBERT). For very long documents, `spacy-transformers` handles span splitting internally, but excessive length can still be an issue. Use `nlp.pipe(texts, batch_size=...)` for efficient batch processing.","message":"Transformer models are computationally intensive and memory-hungry. Training and inference, especially with larger models or long documents, are significantly slower on CPU and often require a GPU (with CUDA) for practical performance. Memory issues ('CUDA out of memory') are common.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure `spacy` v3.0+ is installed: `pip install -U 'spacy>=3.0.0'`. Then install `spacy-transformers`: `pip install 'spacy[transformers]'`. If using GPU, ensure the correct CUDA extras are specified.","cause":"The `spacy-transformers` package is not installed, or the installed `spaCy` version is too old (e.g., v2.x) for the `spacy-transformers` version. This error indicates that spaCy cannot find the 'trf' language entry point that is registered by `spacy-transformers`.","error":"No module named 'spacy.lang.trf'"},{"fix":"Reduce the `batch_size` during `nlp.pipe()` calls or in your training configuration. Consider using a smaller transformer model (e.g., DistilBERT). If training, implement gradient accumulation. If possible, upgrade to a GPU with more memory. You can also add a `doc_cleaner` component to remove `doc._.trf_data` after processing to reduce memory during subsequent steps.","cause":"The transformer model, input batch size, or document length exceeds the available GPU memory. Transformer models, especially large ones like BERT-large or RoBERTa-large, consume substantial GPU resources.","error":"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate XX GiB (GPU 0; XX GiB total capacity; YY GiB already allocated; ZZ GiB free; AA MiB reserved in total by PyTorch)"},{"fix":"Run `python -m spacy download your_model_name` to update to the latest compatible version of the pre-trained model. If it is a custom-trained model, consider retraining it with the current `spacy-transformers` version, or downgrade `spacy-transformers` to the version it was trained with. Always run `python -m spacy validate` after dependency changes.","cause":"You are attempting to load a `_trf` model (or a pipeline containing the transformer component) that was saved with an older version of `spacy-transformers` into an environment with a newer (potentially incompatible) version of `spacy-transformers` or `spaCy`.","error":"UserWarning: It looks like you're loading a transformer pipeline package trained with spacy-transformers vX.X.X after upgrading to spacy-transformers vY.Y.Y. This pipeline may be incompatible."},{"fix":"Ensure `spacy-transformers` is correctly installed in your environment: `pip install 'spacy[transformers]'`. Restart your Python environment or IDE if the installation was recent. Verify that `python -m spacy validate` shows no issues.","cause":"The `spacy-transformers` library, which registers the 'transformer' factory, is either not installed or not correctly loaded by spaCy. This typically happens when trying to add the 'transformer' component to a pipeline without the necessary package.","error":"ValueError: [E002] Can't find 'transformer' in spaCy factories. Did you register it? If you're using a class, don't forget the `@Language.factory` decorator."}]}