spaCy Legacy
spacy-legacy is a Python package that provides outdated registered functions and architectures for spaCy v3.x, ensuring backwards compatibility for projects that rely on older component implementations. It is automatically installed as a dependency of the main spaCy library. The package's releases are typically tied to major spaCy updates where core components undergo backwards-incompatible changes, allowing older configurations and trained models to continue functioning. The current version is 3.0.12, released in January 2023.
Warnings
- gotcha spacy-legacy is designed for backward compatibility, not for new development. Avoid referencing legacy functions in new configurations (e.g., `spacy.Tok2Vec.v1`, which spaCy resolves to the registered name `spacy-legacy.Tok2Vec.v1`) unless you specifically need the exact older behavior. Prefer the latest core spaCy implementations (e.g., `spacy.Tok2Vec.v2` or later) for current performance and features.
- breaking Older spaCy models or configurations that explicitly reference superseded architectures (like `TextCatBOW.v1` or `MaxoutWindowEncoder.v1`) may expect different input/output types than the current versions of those architectures. `spacy-legacy` supplies the old implementation, but mixing legacy and current layers can still fail at runtime. For example, `spacy.Tok2Vec.v1` expects an `encode` layer of type `Model[Floats2d, Floats2d]` such as `spacy.MaxoutWindowEncoder.v1`, not one of the list-based v2 encoders.
- gotcha spaCy v3 may issue a warning when loading a pipeline package that was trained with a different spaCy v3.x version. This is a general compatibility warning that can appear even when `spacy-legacy` covers the underlying component differences, and it signals potential subtle incompatibilities.
- gotcha The `spacy.StaticVectors.v1` architecture, available via `spacy-legacy`, contained a bug where tokens without vectors were mapped to the final row in the vectors table. This could cause model predictions to change unexpectedly if new vectors were added to an existing table.
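One way to act on these warnings is to audit a config for architecture names that would be served by spacy-legacy. The following is a minimal sketch, not part of either package: the helper names `is_legacy_architecture` and `find_legacy_architectures` and the sample config are hypothetical, and it assumes `registry.has` reports names registered through spacy-legacy's entry points.

```python
from spacy import registry
from spacy.util import load_config_from_str

LEGACY_PREFIX = "spacy-legacy."


def is_legacy_architecture(name: str) -> bool:
    """True if `name` would be served by spacy-legacy rather than core spaCy."""
    if name.startswith(LEGACY_PREFIX):
        return True
    if not name.startswith("spacy."):
        return False  # third-party or custom function, not our concern here
    legacy_name = name.replace("spacy.", LEGACY_PREFIX, 1)
    return registry.has("architectures", legacy_name)


def find_legacy_architectures(section, path=""):
    """Recursively collect (config_path, name) pairs for legacy architectures."""
    found = []
    for key, value in section.items():
        if key == "@architectures" and is_legacy_architecture(value):
            found.append((path, value))
        elif isinstance(value, dict):
            child = f"{path}.{key}" if path else key
            found.extend(find_legacy_architectures(value, child))
    return found


# Hypothetical config fragment mixing legacy (v1) and current (v2) layers
sample_config = load_config_from_str("""
[components]

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v1"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
attrs = ["NORM"]
rows = [2000]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
width = 96
window_size = 1
maxout_pieces = 3
depth = 4
""")

legacy_refs = find_legacy_architectures(sample_config)
for config_path, name in legacy_refs:
    print(f"{config_path}: {name}")
```

The config is only parsed here, never resolved, so the check runs without building any models.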
Install
pip install spacy-legacy
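Because spacy-legacy is normally pulled in as a dependency of spaCy, it can be useful to confirm what is already installed before running pip. A small sketch using only the standard library follows; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("spacy", "spacy-legacy"):
    print(package, installed_version(package) or "not installed")
```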
Imports
- spacy.registry.architectures
# Functions are typically resolved via spaCy's registry, not imported directly from spacy_legacy
from spacy import registry
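As a rough illustration of that registry lookup, the sketch below fetches the same legacy architecture under both its `spacy.` name and its `spacy-legacy.` name. It assumes spaCy v3 with spacy-legacy installed, and the variable names are illustrative only.

```python
from spacy import registry

# The "spacy." name is not registered in core spaCy v3, so the registry
# falls back to the matching "spacy-legacy." entry.
core_style = registry.get("architectures", "spacy.Tok2Vec.v1")

# The same function can also be requested under its registered legacy name.
legacy_style = registry.get("architectures", "spacy-legacy.Tok2Vec.v1")

print(core_style.__module__)       # module that actually provides the function
print(core_style is legacy_style)  # whether both lookups return the same object
```

In config files the `spacy.` form is the usual choice, since it keeps older configs readable and leaves the fallback to spaCy.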
Quickstart
import spacy
from spacy.util import load_config_from_str, load_model_from_config

# This config uses 'spacy.Tok2Vec.v1', an architecture moved to spacy-legacy.
# Tok2Vec.v1 takes two sublayers, 'embed' and 'encode', rather than flat
# width/embed_size settings, and expects a v1 encoder such as MaxoutWindowEncoder.v1.
config_content = """
[nlp]
lang = "en"
pipeline = ["tok2vec"]

[components]

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v1"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
width = 96
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [2000, 1000, 1000, 1000]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
width = 96
window_size = 1
maxout_pieces = 3
depth = 4
"""

try:
    # spacy.load() expects a package name or a pipeline directory, not a config
    # file, so the pipeline is built from the parsed config instead.
    config = load_config_from_str(config_content)
    # auto_fill=True fills in defaults for any settings the config leaves out.
    # spaCy resolves the 'spacy.*.v1' names to spacy-legacy's implementations.
    nlp = load_model_from_config(config, auto_fill=True)
    # Allocate the (untrained) model weights so the pipeline can be run
    nlp.initialize()
    print("Pipeline built successfully, leveraging spacy-legacy for 'Tok2Vec.v1'.")
    doc = nlp("This is a demonstration of spacy-legacy in action.")
    print(f"Processed text: {doc.text}")
    print(f"Number of tokens: {len(doc)}")
except Exception as e:
    print(f"An error occurred while building the pipeline: {e}")
    print("Ensure spacy and spacy-legacy are installed and that the sublayer")
    print("settings match the signatures of the v1 architectures.")