spaCy Curated Transformers

2.1.2 · active · verified Sun Apr 12

spacy-curated-transformers provides efficient, curated transformer models designed for integration into spaCy processing pipelines. It wraps the `curated-transformers` library and offers components and utilities for tasks such as wordpiece tokenization and transformer-based embeddings for spaCy's `Doc` and `Span` objects. The library is actively maintained by Explosion, with a focus on compatibility with the latest spaCy and Thinc versions, and releases often align with improvements in the underlying transformer architectures.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use a spaCy model that internally leverages `spacy-curated-transformers` to process text and access the transformer's output data. Users typically interact with the library through a pre-trained spaCy transformer pipeline.

import spacy

# To use spacy-curated-transformers, you typically load a spaCy model
# that includes a 'transformer' component.
# First, ensure you have a compatible model downloaded:
# python -m spacy download en_core_web_trf

try:
    # Load a pre-trained spaCy model that utilizes spacy-curated-transformers
    nlp = spacy.load("en_core_web_trf")
    doc = nlp("Hello, world! This is a test sentence.")

    print(f"Processed doc with {len(doc)} tokens.")

    # Access transformer data via the custom Doc extension.
    # has_extension is a method on Doc itself, not on the `doc._` namespace.
    if doc.has_extension("trf_data") and doc._.trf_data is not None:
        # With spacy-curated-transformers, `trf_data` is a DocTransformerOutput.
        # Its last hidden layer is a ragged array with one row per wordpiece;
        # `lengths` is expected to hold the number of pieces for each token.
        last_layer = doc._.trf_data.last_hidden_layer_state
        print(f"Last hidden layer shape (pieces, width): {last_layer.dataXd.shape}")
        print(f"Pieces per token: {last_layer.lengths}")
    else:
        print("No transformer data found. Ensure a transformer pipe is in the pipeline.")

except Exception as e:
    print(f"Error loading or processing model: {e}")
    print("Please ensure 'en_core_web_trf' is downloaded using: python -m spacy download en_core_web_trf")
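
A transformer produces one vector per wordpiece rather than one per spaCy token, so per-token features require pooling the piece vectors back onto tokens. The following is a minimal sketch of that idea using plain Python lists, so it runs without a downloaded model. The function `mean_pool_pieces` and the toy data are hypothetical and are not part of the spacy-curated-transformers API; mean pooling is an assumption here, one of several possible reductions.

```python
from typing import List, Sequence


def mean_pool_pieces(
    data: Sequence[Sequence[float]], lengths: Sequence[int]
) -> List[List[float]]:
    """Average consecutive piece vectors into one vector per token.

    `data` is a flat list of piece vectors; `lengths[i]` is the number of
    pieces belonging to token i (a ragged layout). Hypothetical helper.
    """
    if sum(lengths) != len(data):
        raise ValueError("lengths must sum to the number of piece rows")

    pooled: List[List[float]] = []
    start = 0
    for n in lengths:
        rows = data[start:start + n]
        if n == 0:
            # A token with no pieces gets a zero vector of matching width.
            width = len(data[0]) if data else 0
            pooled.append([0.0] * width)
        else:
            width = len(rows[0])
            pooled.append([sum(r[i] for r in rows) / n for i in range(width)])
        start += n
    return pooled


# Toy data: 3 tokens split into 1, 2 and 1 pieces, hidden width 2.
piece_rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pieces_per_token = [1, 2, 1]
token_vectors = mean_pool_pieces(piece_rows, pieces_per_token)
print(token_vectors)  # [[1.0, 2.0], [4.0, 5.0], [7.0, 8.0]]
```

In a real pipeline the rows and lengths would come from the transformer output stored on the `Doc` instead of hand-written lists.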
