USaddress

0.5.16 · active · verified Thu Apr 09

USaddress is a Python library designed for parsing unstructured United States address strings into their individual components, employing advanced Natural Language Processing (NLP) methods. It utilizes a probabilistic model, specifically Conditional Random Fields, to make educated guesses in identifying address parts, even in complex cases. The library's current version is 0.5.16, and it is actively maintained.

Warnings

gotcha The `usaddress.tag()` method may raise a `RepeatedLabelError` if an address string contains multiple components that are assigned the same label and cannot be logically concatenated into a single value (e.g., two distinct 'StreetName' labels without an 'IntersectionSeparator'). This often indicates an ambiguous or improperly formatted input address.
Fix: Catch `RepeatedLabelError` when using `tag()` and consider using `parse()` for more granular (though less aggregated) output, or preprocess ambiguous input strings.
gotcha usaddress is designed specifically for 'United States address strings'. Attempting to parse international addresses will likely result in incorrect component labeling or parsing failures. The underlying model is trained on US address patterns.
Fix: Ensure input addresses are strictly within the United States. For international address parsing, use a library specifically designed for global addresses (e.g., libpostal Python bindings).
gotcha The library uses a probabilistic model to identify address components, but it 'cannot identify address components with perfect accuracy, nor can it verify that a given address is correct/valid.' It provides structured components but does not perform address validation (e.g., checking if an address is mailable or exists).
Fix: Do not rely on usaddress for address validation. For validation, integrate with a dedicated address verification service (e.g., USPS API, SmartyStreets, etc.) after parsing with usaddress.

Install

pip install usaddress Install stable version

Imports

usaddress
```
import usaddress
```
The library is imported directly as 'usaddress'.

Quickstart

This example demonstrates both the `parse()` method, which returns a list of (value, label) tuples, and the `tag()` method, which returns a more structured `OrderedDict` of components and an inferred address type.

import usaddress

address_string = "123 Main St. Suite 100 Chicago, IL 60601"

# The .parse() method returns a list of (value, label) tuples
parsed_address = usaddress.parse(address_string)
print("Parsed (tuples):")
print(parsed_address)

# The .tag() method returns an OrderedDict of components and an address type
tagged_address, address_type = usaddress.tag(address_string)
print("\nTagged (OrderedDict & Type):")
print(tagged_address)
print(f"Address Type: {address_type}")

view raw JSON →