CommonRegex
CommonRegex is a Python library that bundles a collection of commonly used regular expressions behind a straightforward API. It extracts patterns such as dates, times, emails, phone numbers, links, IP addresses, prices, and street addresses from text strings. The current version, 1.5.4, was released in 2014, so the library is effectively in maintenance-only mode with no active development.
Common errors
-
AttributeError: 'CommonRegex' object has no attribute 'dates()'
cause When `CommonRegex` is initialized with text (e.g., `parsed_text = CommonRegex(text)`), the results for that text are exposed as attributes (e.g., `parsed_text.dates`), not methods. Calling them as methods fails.
fix Access the extracted data as an attribute, e.g., `parsed_text.dates`. To process *new* text with a reusable instance, create it without text (`parser = CommonRegex()`) and call the methods, e.g., `parser.dates('new text here')`.
-
ImportError: cannot import name 'commonregex' from 'commonregex' (unknown location)
cause The import treats `commonregex` as a class or function inside the module, following the `from some_lib import some_lib` pattern. The module does not define a name called `commonregex`.
fix To import the main parsing class, use `from commonregex import CommonRegex`. To import an individual regex pattern, use `from commonregex import date` (or `email`, `time`, etc.).
-
No matches found or incorrect matches for non-English text.
cause The regular expressions in `commonregex` are written for English-language text and US formatting conventions. Applied to other languages or regional formats, they miss matches or return incorrect ones.
fix Do not rely on this library for non-English or non-US content unless you modify its internal regex patterns. Use a library designed for internationalization, or write custom regexes for your target language or locale.
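As a rough illustration of the custom-regex route, the sketch below matches day-first numeric dates such as `09.01.2012` using only the standard library. The pattern `DAY_FIRST_DATE` and the helper `find_day_first_dates` are hypothetical names invented for this example and are not part of `commonregex`. The date format is an assumption about the target locale.

```python
import re
from datetime import datetime

# Hypothetical pattern for day-first numeric dates (DD.MM.YYYY).
# It is an assumption for this sketch, not something commonregex provides.
DAY_FIRST_DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")


def find_day_first_dates(text):
    """Return day-first dates in text that are also real calendar dates."""
    found = []
    for match in DAY_FIRST_DATE.finditer(text):
        candidate = match.group(0)
        try:
            # strptime rejects impossible dates such as 31.02.2012
            datetime.strptime(candidate, "%d.%m.%Y")
        except ValueError:
            continue
        found.append(candidate)
    return found


print(find_day_first_dates("Termin am 09.01.2012, nicht am 31.02.2012."))
```

Pairing the regex with `datetime.strptime` keeps the pattern simple: the regex finds candidates, and the parser decides whether they are real dates.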
Warnings
- gotcha The module is explicitly noted as "English/US specific". Its regular expressions are tailored for English-language patterns and US formats (e.g., phone numbers, dates), and may not work as expected for other languages or regional formats.
- gotcha Performance may be a concern for large texts or high-volume processing. Community-maintained forks like 'commonregex-improved' highlight that the original library's API calls can be slow due to how regular expressions are compiled and executed internally.
- gotcha Some individual regex patterns, when used directly with `re.findall`, might produce partial matches or require additional surrounding conditions for strict validation (e.g., `ip` regex matching parts of other numbers).
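One way to guard against partial or invalid matches is to treat regex hits as candidates and validate them in a second step. The sketch below does this for IPv4 addresses with the standard-library `ipaddress` module. `LOOSE_IPV4` is a stand-in pattern written for this example, not the `ip` pattern shipped with `commonregex`, and `find_valid_ipv4` is a hypothetical helper name.

```python
import ipaddress
import re

# Stand-in pattern (an assumption for this sketch, not commonregex's `ip`).
# The lookarounds reject candidates embedded in longer dotted numbers,
# while still allowing a sentence-ending period after the address.
LOOSE_IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\d|\.\d)")


def find_valid_ipv4(text):
    """Return regex candidates that ipaddress accepts as IPv4 addresses."""
    valid = []
    for candidate in LOOSE_IPV4.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        valid.append(candidate)
    return valid


sample = "Check my IP 192.168.1.1. Build 1.2.3.4.5 and 999.1.1.1 are not addresses."
print(find_valid_ipv4(sample))
```

The pattern is compiled once at module level, which also avoids recompiling it on every call when processing many texts.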
Install
-
pip install commonregex
Imports
- CommonRegex
import commonregex  # not this: binds only the module, so `CommonRegex` is not in scope
from commonregex import CommonRegex  # use this
- date (or other individual regex patterns)
from commonregex.regexes import date  # not this: `regexes` is not an importable submodule
from commonregex import date  # use this
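An individually imported pattern can be passed straight to `re.findall`. The sketch below uses the `email` pattern with the import form shown above. The `except ImportError` branch defines a simplified stand-in pattern so the example still runs where the library is not installed. That stand-in and the helper name `find_emails` are assumptions made for this example, and the stand-in is not the library's own pattern.

```python
import re

try:
    from commonregex import email  # import form described in this section
except ImportError:
    # Simplified stand-in so the sketch runs without the library installed.
    # This is NOT the pattern that commonregex ships.
    email = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(text):
    """Return all email-like substrings found by the imported pattern."""
    # re.findall accepts either a pattern string or a compiled pattern
    return re.findall(email, text)


print(find_emails("Get in touch with my associate at harold.smith@gmail.com today."))
```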
Quickstart
from commonregex import CommonRegex
text = """John, please get that article on www.linkedin.com to me by 5:00PM on Jan 9th 2012.
4:00 would be ideal, actually. If you have any questions, You can reach me at (519)-236-2723x341
or get in touch with my associate at harold.smith@gmail.com. Check my IP 192.168.1.1."""
# Instantiate CommonRegex with the text
parsed_text = CommonRegex(text)
# Access extracted data via attributes
print(f"Dates: {parsed_text.dates}")
print(f"Times: {parsed_text.times}")
print(f"Links: {parsed_text.links}")
print(f"Phones with Exts: {parsed_text.phones_with_exts}")
print(f"Emails: {parsed_text.emails}")
print(f"IPs: {parsed_text.ips}")
# Alternatively, use a single instance for multiple texts
parser = CommonRegex()
print(f"Later time: {parser.times('Meet me at 7:30 AM.')}")