backrefs
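backrefs is a Python library that wraps the standard `re` module and the third-party `regex` module and adds back references (escape sequences) that those engines do not support on their own. In search patterns it adds features such as Unicode property classes (`\p{...}` and `\P{...}`), `\Q...\E` literal quoting, and, for `re`, POSIX-style classes like `[[:alnum:]]`. In replacement templates it adds case-conversion escapes: `\c` and `\l` uppercase or lowercase the next character, while `\C...\E` and `\L...\E` uppercase or lowercase a whole span. The exact set of additions depends on the engine, because `regex` already supports some of these natively. The project is actively maintained, with regular minor releases for new features, bug fixes, and Python version compatibility.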
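The gap backrefs fills is easy to observe with the standard library alone. Since Python 3.7, `re` raises `re.error` for unknown escapes made of an ASCII letter, so a Unicode property class such as `\p{Lu}` cannot be compiled by plain `re`. The sketch below checks this; `re_supports_unicode_properties` is a hypothetical helper, not part of backrefs.

```python
import re


def re_supports_unicode_properties() -> bool:
    """Hypothetical helper: report whether plain re accepts Unicode property escapes."""
    try:
        re.compile(r"\p{Lu}")  # Unicode property class: uppercase letters
    except re.error:
        # re rejects the pattern with "bad escape \p"
        return False
    return True


print(re_supports_unicode_properties())  # False
```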
Warnings
- breaking In version 6.0, the behavior of POSIX character classes (e.g., `[[:alnum:]]`, `[[:digit:]]`) was changed to always use POSIX compatibility rules instead of Unicode standard rules where applicable. This might break existing patterns that relied on the previous Unicode standard behavior for these classes.
- breaking Python 3.8 support was officially dropped in version 5.8. Users on Python 3.8 or older will need to upgrade their Python version or stay on backrefs < 5.8.
- gotcha backrefs provides two main interfaces: `bre` and `bregex`. `bre` wraps Python's built-in `re` module, while `bregex` wraps the third-party `regex` module. `bregex` offers more advanced regex features (inherited from `regex`) but requires `pip install regex`.
- gotcha A regression in version 6.0 created an ASCII binary property that would override an ASCII block property, leading to incorrect matching behavior in specific scenarios.
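When one codebase must install on both old and new interpreters, the Python 3.8 constraint can be computed instead of remembered. This is a minimal sketch that assumes only what the warning states, namely that backrefs 5.8 is the first release without Python 3.8 support; `backrefs_requirement` is a hypothetical helper, not part of backrefs.

```python
import sys


def backrefs_requirement(version_info=sys.version_info) -> str:
    """Hypothetical helper: return a pip requirement string for backrefs.

    Assumes backrefs 5.8 dropped Python 3.8, so interpreters older than 3.9
    are pinned below 5.8.
    """
    if tuple(version_info[:2]) < (3, 9):
        return "backrefs<5.8"
    return "backrefs"


print(backrefs_requirement())
```

In a requirements file the same constraint is more commonly written as a PEP 508 environment marker: `backrefs<5.8; python_version < "3.9"`.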
Install
pip install backrefs
Imports
- bre
from backrefs import bre
- bregex
from backrefs import bregex
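Because `bregex` only works when the third-party `regex` package is installed, code that should run either way can choose the interface at import time. This sketch relies only on `importlib` and on the `backrefs.bre` and `backrefs.bregex` module paths used in the imports above; both helper names are hypothetical.

```python
import importlib
import importlib.util
from types import ModuleType


def pick_backrefs_engine() -> str:
    """Hypothetical helper: prefer bregex when the regex package is importable."""
    if importlib.util.find_spec("regex") is not None:
        return "bregex"
    return "bre"


def load_backrefs_engine() -> ModuleType:
    """Hypothetical helper: import backrefs.bre or backrefs.bregex as chosen above."""
    return importlib.import_module(f"backrefs.{pick_backrefs_engine()}")


print(pick_backrefs_engine())
```

Patterns that depend on features only `regex` provides will still fail under `bre`, so this fallback is only sensible for patterns both engines accept.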
Quickstart
from backrefs import bre
# Search example: Unicode property classes (\p{...}).
# The standard 're' module rejects \p{...}; bre translates it into a pattern re can compile.
text = "zebra Ägypten café Zürich"
# Find the first word that starts with an uppercase letter (General Category Lu)
# and capture that first letter.
pattern_upper = r'\b(\p{Lu})\w*'
match = bre.search(pattern_upper, text)
if match:
    print(f"Matched: '{match.group(0)}'")
    print(f"First letter captured: '{match.group(1)}'")
else:
    print("No match found.")
# Replace example: collapse duplicate consecutive words and uppercase the survivor.
text_dupe = "hello hello world world test"
# An ordinary \1 back-reference detects the repeated word in the search pattern.
pattern_dupe = r'\b(\w+)\s+\1\b'
# In the replacement template, backrefs' \C...\E uppercases everything in between.
result = bre.sub(pattern_dupe, r'\C\1\E', text_dupe)
print(f"Original: '{text_dupe}'")
print(f"After replacing duplicates: '{result}'")
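backrefs' replacement escapes such as `\C...\E` (uppercase a span) remove the need for a replacement function. For comparison, the standard `re` module has no case-conversion escapes, so collapsing duplicate words and uppercasing the result requires a callable; this sketch uses only the standard library.

```python
import re

text_dupe = "hello hello world world test"
# \1 in the search pattern matches an immediate repeat of the captured word
pattern_dupe = r"\b(\w+)\s+\1\b"
# re cannot change case in a template, so a function builds each replacement
result = re.sub(pattern_dupe, lambda m: m.group(1).upper(), text_dupe)
print(result)  # HELLO WORLD test
```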