{"id":8663,"library":"spacy-alignments","title":"spaCy Alignments","description":"spacy-alignments is a Python library that provides efficient tokenization alignment capabilities, particularly useful for integrating different NLP tools such as spaCy and transformer models. It offers Python bindings for Yohei Tamura's highly performant Rust `tokenizations` library. The current version is 0.9.2, with releases primarily focused on supporting new Python versions and underlying PyO3 updates.","status":"active","version":"0.9.2","language":"en","source_language":"en","source_url":"https://github.com/explosion/spacy-alignments","tags":["spacy","nlp","tokenization","alignment","rust-bindings","transformers"],"install":[{"cmd":"pip install spacy-alignments","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Designed to align tokenizations for spaCy and transformers, implicitly requiring spaCy for most use cases.","package":"spacy","optional":true},{"reason":"Required to build the package from source if no pre-compiled binary wheels are available for your platform and Python version.","package":"rust","optional":true}],"imports":[{"symbol":"get_alignments","correct":"import spacy_alignments as tokenizations\na2b, b2a = tokenizations.get_alignments(tokens_a, tokens_b)"}],"quickstart":{"code":"import spacy_alignments as tokenizations\n\n# Example from the spacy-alignments README/PyPI\ntokens_a = [\"å\", \"BC\"]\ntokens_b = [\"abc\"]  # the accent is dropped (å -> a) and the letters are lowercased (BC -> bc)\n\n# Get bidirectional alignment mappings for the two tokenizations\na2b, b2a = tokenizations.get_alignments(tokens_a, tokens_b)\n\nprint(f\"Alignment from tokens_a to tokens_b: {a2b}\")  # [[0], [0]] -- both tokens map to \"abc\"\nprint(f\"Alignment from tokens_b to tokens_a: {b2a}\")  # [[0, 1]] -- \"abc\" maps back to both tokens","lang":"python","description":"The `get_alignments` function is the core of the library, providing a bidirectional mapping between two sequences of tokens that may have undergone different tokenization or normalization processes."},"warnings":[{"fix":"Upgrade to Python 3.9 or newer. The current version (0.9.2) supports Python >=3.9, <3.14.","message":"Version 0.9.0 dropped support for Python 3.6.","severity":"breaking","affected_versions":">=0.9.0"},{"fix":"If `pip install spacy-alignments` fails with build errors, install Rust by following the instructions at `rustup.rs` and ensure `cargo` is on your system's PATH.","message":"Installation may require a Rust compiler if pre-built binary wheels are not available for your platform and Python version.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware of potential subtle changes in token alignments when upgrading `spacy-transformers`. Verify your downstream tasks if they rely on specific alignment behavior. This primarily impacts transformer pipelines within spaCy rather than direct `spacy-alignments` usage.","message":"When using `spacy-transformers` v1.2 or newer, the alignment between spaCy tokens and transformer tokens for 'fast tokenizers' may differ from previous versions. This is because `spacy-transformers` now uses exact alignment information from the underlying tokenizers instead of `spacy-alignments`'s heuristic method.","severity":"gotcha","affected_versions":"spacy-transformers >=1.2"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Inspect your training data's character offsets and original text carefully. Use the suggested `spacy.training.offsets_to_biluo_tags` (for spaCy v3+) or `spacy.gold.biluo_tags_from_offsets` (for spaCy v2.3+) to debug specific misalignments. Adjust entity start/end indices to precisely match token boundaries, or modify spaCy's tokenizer exceptions for better alignment.","cause":"Character-based entity offsets in the training data do not map exactly onto the token boundaries produced by spaCy's tokenizer, a common issue when combining different tokenization strategies or handling noisy text.","error":"UserWarning: [W030] Some entities could not be aligned in the text \"...\" with entities \"[...]\". Use `spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training."},{"fix":"Install the package using `pip install spacy-alignments`. If using virtual environments, ensure the correct environment is activated.","cause":"The `spacy-alignments` package is not installed or not accessible in the current Python environment.","error":"ModuleNotFoundError: No module named 'spacy_alignments'"},{"fix":"Install Rust by following the instructions at `https://rustup.rs/`. Ensure that the `cargo` command is available on your system's PATH. You may also try upgrading `pip` and `setuptools` (`pip install -U pip setuptools`) before retrying the installation.","cause":"This error, or similar Rust compilation errors (e.g., `command 'rustc' not found`), indicates that a pre-built binary wheel for `spacy-alignments` was not available for your system and the build from source failed because the Rust toolchain is missing or misconfigured.","error":"error: can't find crate `tokenizations`"}]}