{"id":4767,"library":"spacy-curated-transformers","title":"spaCy Curated Transformers","description":"spacy-curated-transformers provides efficient, curated transformer models designed for integration into spaCy processing pipelines. It wraps the `curated-transformers` library, offering specialized components and utilities for tasks like wordpiece tokenization and transformer-based embeddings within spaCy's `Doc` and `Span` objects. The library is actively maintained by Explosion, with a focus on compatibility with the latest spaCy and Thinc versions, and releases often align with improvements in underlying transformer architectures.","status":"active","version":"2.1.2","language":"en","source_language":"en","source_url":"https://github.com/explosion/spacy-curated-transformers","tags":["spacy","nlp","transformers","thinc","machine learning","embeddings"],"install":[{"cmd":"pip install spacy-curated-transformers","lang":"bash","label":"Install package"}],"dependencies":[{"reason":"Core NLP framework. It is not declared as a direct PyPI dependency of spacy-curated-transformers (to keep version constraints flexible), but it is required in practice to use the library.","package":"spacy","optional":false},{"reason":"Deep learning library that powers spaCy and this library.","package":"thinc","optional":false},{"reason":"Provides the underlying transformer model implementations.","package":"curated-transformers","optional":false}],"imports":[{"note":"The pipeline component class is located under `pipeline`, not `components`.","wrong":"from spacy_curated_transformers.components import CuratedTransformer","symbol":"CuratedTransformer","correct":"from spacy_curated_transformers.pipeline import CuratedTransformer"},{"note":"`DocTransformerOutput` is rarely imported directly; instances are almost always accessed via `doc._.trf_data` after processing a Doc with a transformer pipe.","wrong":"from spacy_curated_transformers.data_classes import 
DocTransformerOutput","symbol":"DocTransformerOutput","correct":"doc._.trf_data"}],"quickstart":{"code":"import spacy\n\n# To use spacy-curated-transformers, you typically load a spaCy model\n# that includes a 'transformer' component.\n# First, ensure you have a compatible model downloaded:\n# python -m spacy download en_core_web_trf\n\ntry:\n    # Load a pre-trained spaCy model that utilizes spacy-curated-transformers\n    nlp = spacy.load(\"en_core_web_trf\")\n    doc = nlp(\"Hello, world! This is a test sentence.\")\n\n    print(f\"Processed doc with {len(doc)} tokens.\")\n\n    # Access transformer data via the custom Doc extension.\n    # `has_extension` is a classmethod on Doc; it is not available on `doc._`.\n    if doc.has_extension(\"trf_data\") and doc._.trf_data is not None:\n        # `trf_data` is a DocTransformerOutput. Its last hidden layer is a\n        # Ragged array; `dataXd` holds one row per wordpiece.\n        last_layer = doc._.trf_data.last_hidden_layer_state\n        print(f\"Last hidden layer shape: {last_layer.dataXd.shape}\")\n        print(f\"Number of stored layer outputs: {len(doc._.trf_data.all_outputs)}\")\n    else:\n        print(\"No transformer data found. Ensure a transformer pipe is in the pipeline.\")\n\nexcept Exception as e:\n    print(f\"Error loading or processing model: {e}\")\n    print(\"Please ensure 'en_core_web_trf' is downloaded using: python -m spacy download en_core_web_trf\")","lang":"python","description":"This quickstart demonstrates how to use a spaCy model that internally leverages `spacy-curated-transformers` to process text and access the transformer's output data. Users typically interact with the library through a pre-trained spaCy transformer pipeline."},"warnings":[{"fix":"Update your spaCy config or `nlp.add_pipe()` calls to use the factory name `curated_transformer` (or import `CuratedTransformer` from `spacy_curated_transformers.pipeline`).","message":"The main transformer pipe component was renamed from its original name to `CuratedTransformer` in `v0.2.0`. 
If you were manually adding the pipe, you must update the component name in your configuration.","severity":"breaking","affected_versions":">=0.2.0"},{"fix":"Adjust custom code to handle `(0, n)` shape for whitespace token transformer outputs, or use spaCy's built-in pooling operations which are designed to correctly handle this.","message":"Handling of whitespace tokens changed in `v0.3.1`. When accessing `doc._.trf_data[i]` for a whitespace token, the resulting array now has a shape of `(0, n)` (where `n` is the output dimension) instead of a zeroed row. This might affect custom processing logic that assumes a fixed output shape for all tokens.","severity":"breaking","affected_versions":">=0.3.1"},{"fix":"Review release notes for `curated-transformers` 2.0 and `spacy-curated-transformers` 2.0.0. Ensure compatibility with your existing configurations and custom components.","message":"Version `2.0.0` rebased on `curated-transformers` 2.0. This brought significant internal changes and new features (like discriminative learning rates). Direct interaction with underlying `curated-transformers` objects via `spacy-curated-transformers` might require adjustments.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Remove any code or configuration related to quantization. Monitor future releases for its re-introduction once the API is stable.","message":"Quantization support was explicitly removed in `v0.2.0` until the serialization API for it could be stabilized. If your workflow relied on this feature, it's no longer available.","severity":"deprecated","affected_versions":">=0.2.0"},{"fix":"Always install `spacy` alongside `spacy-curated-transformers` from scratch in a clean environment, or use `pip install -U spacy-curated-transformers` and then `pip install -U spacy` (or vice-versa) to let pip resolve dependencies. 
Check `pyproject.toml` or `setup.py` files for specific version requirements for crucial dependencies like `thinc` and `curated-transformers`.","message":"Dependency management across `spacy`, `thinc`, `curated-transformers`, and `numpy` can be complex. Recent releases (e.g., `v2.1.1`, `v2.1.2`, `v0.3.0`) highlight efforts to relax pins and avoid direct `spaCy` dependency to enhance model forward compatibility. However, users must ensure compatible versions are installed to avoid runtime errors (e.g., Thinc 9.1.0 for NumPy v2 compatibility).","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}