US Address Scourgify
usaddress-scourgify is a Python library for cleaning and normalizing US addresses in line with USPS Publication 28 and RESO guidelines. The current version, 0.6.0, was released on December 14, 2023. The library is built on top of the `usaddress` parsing library and provides functions that standardize address strings into a consistent format. The project is active, although releases appear to be infrequent.
Warnings
- gotcha usaddress-scourgify focuses solely on cleaning and normalization; it does not perform address validation (e.g., checking if an address actually exists).
- gotcha The `get_geocoder_normalized_addr` function relies on `geocoder.google` (a request to Google's geocoding service) and requires the `GOOGLE_API_KEY` environment variable to be set. It does no additional cleaning internally, so an address containing stray or non-conforming elements may return no result.
- gotcha By default, output is uppercased and pre/post directionals, street types, and occupancy types are abbreviated (e.g., 'SW MAIN ST' rather than 'SOUTHWEST MAIN STREET'). Pass `long_hand=True` to `normalize_address_record` to get the full forms.
- breaking Custom address constants can be defined in a YAML file located in the directory named by the `ADDRESS_CONFIG_DIR` environment variable. Because new library versions may change the internal constant structure, a custom configuration can break or behave differently after an upgrade.
- gotcha Versions before 0.6.0 had trouble parsing PO Box addresses because of how ambiguous labels were handled. This was fixed in v0.6.0, but complex PO Box formats may still need carefully formatted input.
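Because `get_geocoder_normalized_addr` can come back empty (or fail outright when the key is missing or the request errors), one option is to try it first and fall back to `normalize_address_record`. The sketch below is hypothetical: `normalize_with_fallback` is not part of usaddress-scourgify, and it assumes that "no result" shows up as either a falsy return value or a raised exception. The two normalizers are passed in as arguments, so the helper can be exercised without network access.

```python
from typing import Callable, Mapping, Optional, Union

# An address may be a free-form string or a dict of address parts.
Address = Union[str, Mapping[str, str]]
# Primary normalizer: may return nothing (None or an empty dict) or raise.
MaybeNormalizer = Callable[[Address], Optional[Mapping[str, str]]]
# Fallback normalizer: expected to return a dict of parts (or raise).
Normalizer = Callable[[Address], Mapping[str, str]]


def normalize_with_fallback(
    address: Address, primary: MaybeNormalizer, fallback: Normalizer
) -> dict:
    """Hypothetical helper: try `primary`, use `fallback` if it yields nothing.

    Assumption: a failed primary lookup appears either as an exception or as a
    falsy result (None / empty dict). Errors raised by `fallback` propagate.
    """
    try:
        result = primary(address)
    except Exception:  # broad on purpose: network errors, missing key, parse errors
        result = None
    if result:
        return dict(result)
    return dict(fallback(address))
```

With the library installed, a call would look like `normalize_with_fallback(address_str, get_geocoder_normalized_addr, normalize_address_record)`.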
Install
pip install usaddress-scourgify
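Since the PO Box parsing fix landed in 0.6.0, it can be worth confirming which version is actually installed. This is a small sketch using only the standard library's `importlib.metadata`; the helper names are hypothetical, and the version parsing is deliberately simple (it keeps only the leading numeric release segment, so pre-release tags such as `rc1` are ignored).

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist: str = "usaddress-scourgify") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part: '0.6.0' -> (0, 6, 0), '0.6.0rc1' -> (0, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def has_po_box_fix(version: str, minimum: Tuple[int, ...] = (0, 6, 0)) -> bool:
    """True if `version` is at least `minimum` (default 0.6.0)."""
    release = release_tuple(version)
    # Pad both tuples so that '0.6' compares equal to (0, 6, 0), not less than it.
    width = max(len(release), len(minimum))
    release += (0,) * (width - len(release))
    padded_min = minimum + (0,) * (width - len(minimum))
    return release >= padded_min


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("usaddress-scourgify is not installed")
    elif not has_po_box_fix(version):
        print(f"usaddress-scourgify {version} predates the 0.6.0 PO Box parsing fix")
    else:
        print(f"usaddress-scourgify {version} is installed")
```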
Imports
- normalize_address_record
from scourgify import normalize_address_record
- NormalizeAddress
from scourgify import NormalizeAddress
- get_geocoder_normalized_addr
from scourgify.normalize import get_geocoder_normalized_addr
Quickstart
import os
from scourgify import normalize_address_record
address_str = '123 southwest Main street, Boring, OR 97009, UNIT 100'
# Normalize an address string
cleaned_address = normalize_address_record(address_str)
print(cleaned_address)
# To get long-hand output (e.g., 'Southwest' instead of 'SW')
long_hand_address = normalize_address_record(address_str, long_hand=True)
print(long_hand_address)
# Example with get_geocoder_normalized_addr: only runs when GOOGLE_API_KEY is
# already set, since it sends the address to Google's geocoding service
if os.environ.get('GOOGLE_API_KEY'):
    from scourgify.normalize import get_geocoder_normalized_addr
    geocoded_address = get_geocoder_normalized_addr(address_str)
    print(geocoded_address)
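The printed result is a structured record rather than a single string. Assuming `normalize_address_record` returns a dict keyed `address_line_1`, `address_line_2`, `city`, `state`, and `postal_code` (check this against the version you have installed), a hypothetical helper like `to_single_line` can turn it back into a one-line mailing address. The sample record below is hand-written for illustration, not captured library output.

```python
from typing import Mapping, Optional


def to_single_line(record: Mapping[str, Optional[str]]) -> str:
    """Join an assumed scourgify-style record into 'STREET UNIT, CITY, ST ZIP'.

    Missing or empty parts (e.g. no address_line_2) are skipped.
    """
    street = " ".join(
        p for p in (record.get("address_line_1"), record.get("address_line_2")) if p
    )
    state_zip = " ".join(
        p for p in (record.get("state"), record.get("postal_code")) if p
    )
    return ", ".join(p for p in (street, record.get("city"), state_zip) if p)


# Hand-written sample record (illustrative, not real library output).
sample = {
    "address_line_1": "123 SW MAIN ST",
    "address_line_2": "UNIT 100",
    "city": "BORING",
    "state": "OR",
    "postal_code": "97009",
}
print(to_single_line(sample))  # 123 SW MAIN ST UNIT 100, BORING, OR 97009
```

Keeping the record as a dict until the last step means individual fields (for example `postal_code`) stay available for matching or deduplication, with the one-line form produced only for display.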