USaddress
USaddress is a Python library that parses unstructured United States address strings into their individual components. It uses a probabilistic model, specifically Conditional Random Fields (CRF), to assign a label to each token, so it can identify address parts even in irregular or complex inputs. The library's current version is 0.5.16, and it is actively maintained.
Warnings
- gotcha The `usaddress.tag()` method may raise a `RepeatedLabelError` if an address string contains multiple components that are assigned the same label and cannot be logically concatenated into a single value (e.g., two distinct 'StreetName' labels without an 'IntersectionSeparator'). This often indicates an ambiguous or improperly formatted input address.
- gotcha usaddress is designed specifically for 'United States address strings'. Attempting to parse international addresses will likely result in incorrect component labeling or parsing failures. The underlying model is trained on US address patterns.
- gotcha The library uses a probabilistic model to identify address components, but it 'cannot identify address components with perfect accuracy, nor can it verify that a given address is correct/valid.' It provides structured components but does not perform address validation (e.g., checking if an address is mailable or exists).
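One way to cope with the `RepeatedLabelError` warning is to fall back to the token-level `parse()` output when `tag()` fails. The following is a minimal sketch under that assumption, not the library's recommended pattern. The helper name `tag_or_parse` and its `tagger` parameter are hypothetical, added only so the fallback path can be exercised with a stand-in object; by default the helper uses the `usaddress` module itself.

```python
def tag_or_parse(address, tagger=None):
    """Return (components, address_type), falling back to parse() output.

    `tagger` is a hypothetical hook for illustration: any object exposing
    tag(), parse(), and RepeatedLabelError. It defaults to the usaddress module.
    """
    if tagger is None:
        import usaddress  # imported lazily so a stand-in can be supplied
        tagger = usaddress
    try:
        # tag() returns a (mapping of label -> value, address type) pair
        return tagger.tag(address)
    except tagger.RepeatedLabelError:
        # The same label showed up in separate places, so tag() could not
        # build one value per label. parse() still yields every
        # (token, label) pair; None signals that no address type is available.
        return tagger.parse(address), None
```

Neither branch validates the address. The result is still only the model's best labeling, so it needs checking against an authoritative source if deliverability matters.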
Install
pip install usaddress
Imports
- usaddress
import usaddress
Quickstart
import usaddress
address_string = "123 Main St. Suite 100 Chicago, IL 60601"
# The .parse() method returns a list of (value, label) tuples
parsed_address = usaddress.parse(address_string)
print("Parsed (tuples):")
print(parsed_address)
# The .tag() method returns an OrderedDict of components and an address type
tagged_address, address_type = usaddress.tag(address_string)
print("\nTagged (OrderedDict & Type):")
print(tagged_address)
print(f"Address Type: {address_type}")
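The list of tuples from `.parse()` keeps one entry per token, so a multi-word city or street name is spread over several entries. The sketch below shows one way to merge consecutive tokens that share a label. It is not the library's own implementation; the helper name `merge_consecutive` is hypothetical, and the `sample` list is hand-written in the shape `.parse()` returns rather than actual model output.

```python
from itertools import groupby

def merge_consecutive(pairs):
    """Join consecutive (token, label) pairs that carry the same label."""
    merged = []
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in group)
        # Drop separators such as the comma that trails a city name
        merged.append((label, text.strip(" ,")))
    return merged

# Illustrative input only: written by hand, not produced by the model
sample = [
    ("123", "AddressNumber"),
    ("Main", "StreetName"),
    ("St.", "StreetNamePostType"),
    ("Oak", "PlaceName"),
    ("Park,", "PlaceName"),
    ("IL", "StateName"),
    ("60302", "ZipCode"),
]

for label, value in merge_consecutive(sample):
    print(f"{label}: {value}")
```

Because only adjacent tokens are merged, a label that recurs later in the address stays as a separate entry instead of being silently combined.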