spaCy Transformers: Integrate Hugging Face Models

1.4.0 · active · verified Thu Apr 16

The `spacy-transformers` library provides spaCy components and architectures to seamlessly integrate pre-trained transformer models from Hugging Face's `transformers` library into spaCy pipelines. It enables convenient access to state-of-the-art architectures like BERT, GPT-2, and XLNet for various NLP tasks, leveraging spaCy v3's powerful and extensible configuration system for multi-task learning. The current version is 1.4.0, and releases are generally aligned with spaCy's major version updates and `transformers` library advancements.
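For instance, the transformer can be declared directly in a spaCy v3 training config. The fragment below is a minimal sketch using the registered `transformer` factory and the `spacy-transformers.TransformerModel.v3` architecture; the model name `bert-base-uncased` and the window/stride values are illustrative and can be swapped for any Hugging Face checkpoint.

```ini
[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "bert-base-uncased"

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
```

Downstream components (tagger, parser, NER) then listen to this component's output via a `TransformerListener` layer, which is how a single transformer can back several tasks at once.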


Quickstart

This quickstart demonstrates loading a pre-trained, transformer-backed spaCy model (like `en_core_web_trf`) and processing text to extract entities, showcasing the integration. It also briefly touches on accessing the raw transformer outputs stored on the `Doc`, which power subsequent spaCy components.

import spacy

# Ensure you have downloaded a transformer-backed model, e.g., using:
# python -m spacy download en_core_web_trf

nlp = spacy.load("en_core_web_trf")
text = "Apple is acquiring a London-based AI startup for $200M."
doc = nlp(text)

print(f"Text: {text}")
print(f"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}")

# Accessing transformer output
# Note: Raw transformer outputs are stored in the custom extension attribute
# doc._.trf_data (not in token.vector); its exact structure depends on the
# spacy-transformers version, so check that the extension exists before use.
if doc.has_extension("trf_data") and doc._.trf_data is not None:
    print(f"Transformer output type: {type(doc._.trf_data).__name__}")
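Long texts are split into overlapping token windows before being passed to the transformer, so the model never sees an unbounded input. The following is a minimal plain-Python sketch of that strided-windowing idea (it approximates span getters like `spacy-transformers.strided_spans.v1`; the exact library implementation differs in details, and the token list and window/stride values here are illustrative):

```python
# A sketch of strided windowing: cut a token sequence into overlapping
# windows of length `window`, advancing by `stride` tokens each step.
def strided_windows(tokens, window, stride):
    spans = []
    start = 0
    while start < len(tokens):
        spans.append(tokens[start : start + window])  # slicing clamps at the end
        if start + window >= len(tokens):
            break  # the final window reaches the end of the sequence
        start += stride
    return spans

tokens = "one two three four five six seven eight nine ten".split()
print(strided_windows(tokens, window=4, stride=3))
# Overlap of window - stride = 1 token between consecutive windows
```

Because `stride` is smaller than `window`, consecutive windows overlap, which lets the library average the transformer's predictions for tokens that appear in more than one window.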
