backrefs

6.2 · active · verified Thu Apr 09

backrefs is a Python library that wraps the standard `re` module and the third-party `regex` module to add back references they lack. In search patterns these include Unicode properties (`\p{...}`, for `re`), POSIX-style character classes (`[[:alnum:]]`), `\Q...\E` quoting, and shorthands such as `\R` for generic line breaks. In replacement templates, `\c`, `\l`, `\C...\E`, and `\L...\E` change the case of the inserted text. The library is actively maintained, with regular minor releases for new features, bug fixes, and Python version compatibility.

Warnings

breaking In version 6.0, the behavior of POSIX character classes (e.g., `[[:alnum:]]`, `[[:digit:]]`) was changed to always use POSIX compatibility rules instead of Unicode standard rules where applicable. This might break existing patterns that relied on the previous Unicode standard behavior for these classes.
Fix: To explicitly use standard Unicode rules for affected properties, use their Unicode property form instead (e.g., `[\p{Alnum}]` instead of `[[:alnum:]]`). Review and test existing regex patterns carefully after upgrading.
breaking Python 3.8 support was officially dropped in version 5.8. Users on Python 3.8 or older will need to upgrade their Python version or stay on backrefs < 5.8.
Fix: Upgrade your Python environment to 3.9 or newer to use backrefs 5.8+.
gotcha backrefs provides two main interfaces: `bre` and `bregex`. `bre` wraps Python's built-in `re` module, while `bregex` wraps the third-party `regex` module. `bregex` offers more advanced regex features (inherited from `regex`) but requires `pip install regex`.
Fix: Choose the appropriate module (`bre` or `bregex`) based on your needs and installed dependencies. If you need features from the `regex` library (e.g., fuzzy matching, recursion), use `bregex` and ensure `regex` is installed.
gotcha A regression in version 6.0 created an ASCII binary property that would override an ASCII block property, leading to incorrect matching behavior in specific scenarios.
Fix: This issue was fixed in version 6.0.1. Users on 6.0 should upgrade to 6.0.1 or newer to avoid this specific regression.
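
The version-related warnings above can be turned into a small environment check. The sketch below is only an illustration: `parse_version` and `upgrade_warnings` are hypothetical helper names, not part of backrefs, and the version thresholds are taken directly from the warnings in this section. The only library call used is the standard `importlib.metadata.version`.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a dotted version such as '6.0.1' into a tuple of ints, e.g. (6, 0, 1)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at pre-release or local suffixes; good enough for a sketch
        parts.append(int(piece))
    return tuple(parts)


def upgrade_warnings(backrefs_version, python_version=sys.version_info[:2]):
    """Return the warnings from this section that apply to the given versions."""
    # Pad to three components so that '6.0' and '6.0.0' compare equal.
    padded = (parse_version(backrefs_version) + (0, 0, 0))[:3]
    found = []
    if padded >= (5, 8, 0) and tuple(python_version) < (3, 9):
        found.append("backrefs 5.8+ requires Python 3.9 or newer")
    if padded == (6, 0, 0):
        found.append("backrefs 6.0 has the ASCII property regression; upgrade to 6.0.1+")
    if padded >= (6, 0, 0):
        found.append("POSIX classes such as [[:alnum:]] now follow POSIX compatibility rules")
    return found


if __name__ == "__main__":
    try:
        installed = metadata.version("backrefs")
    except metadata.PackageNotFoundError:
        print("backrefs is not installed")
    else:
        for line in upgrade_warnings(installed):
            print(line)
```

Passing the versions in as arguments keeps the check easy to run against a planned upgrade, for example `upgrade_warnings("6.0", (3, 12))`, before anything is installed.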

Install

`pip install backrefs` installs the stable version.

Imports

bre
```
from backrefs import bre
```
`import backrefs.bre as bre` also works, but `from backrefs import bre` is the idiomatic form used in most examples.
bregex
```
from backrefs import bregex
```
Similar to `bre`, direct import is idiomatic. Remember that `bregex` requires the `regex` package to be installed.
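
Because `bregex` depends on the optional `regex` package, code that should run in both situations can pick the interface at import time. The following is a minimal sketch, not something backrefs provides: `preferred_backend` and `load_backend` are hypothetical helper names, and the check relies only on the standard `importlib` machinery.

```python
import importlib
import importlib.util


def preferred_backend():
    """Name the interface to use: 'bregex' if `regex` is importable, else 'bre'."""
    return "bregex" if importlib.util.find_spec("regex") is not None else "bre"


def load_backend():
    """Import and return backrefs.bre or backrefs.bregex (backrefs must be installed)."""
    return importlib.import_module(f"backrefs.{preferred_backend()}")


if __name__ == "__main__":
    print("selected interface:", preferred_backend())
```

The two interfaces are not interchangeable in every detail, since `regex` accepts syntax that `re` does not, so a pattern intended for both should be tested against both.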

Quickstart

This quickstart uses `backrefs.bre` to find and collapse repeated consecutive words. The repeat itself is matched with an ordinary numbered back-reference (`\1`), which the standard `re` module already supports. What `bre` adds here is the Unicode property `\p{L}` in the search pattern and the `\C...\E` case-conversion reference in the replacement template.

from backrefs import bre

text = "apple banana banana orange"
# A word made of Unicode letters (\p{L}, a backrefs addition), one whitespace
# character, then the same word again via the standard back-reference \1.
pattern_dupe = r'\b(\p{L}+)\s\1\b'

# bre.search mirrors re.search, but handles the backrefs syntax in the pattern
match = bre.search(pattern_dupe, text)

if match:
    print(f"Matched: '{match.group(0)}'")
    print(f"First word captured: '{match.group(1)}'")
else:
    print("No match found.")

# Replace "word word" with just "word"
text_dupe = "hello hello world world test"
result = bre.sub(pattern_dupe, r'\1', text_dupe)
print(f"Original: '{text_dupe}'")
print(f"After replacing duplicates: '{result}'")

# Replacement templates can also change case: \C...\E upper-cases the enclosed text
shouted = bre.sub(pattern_dupe, r'\C\1\E', text_dupe)
print(f"Duplicates collapsed and upper-cased: '{shouted}'")
