Probable People
Probable People (version 0.5.6) is a Python library for parsing romanized person and company names with a conditional random field (CRF) model. Developed by DataMade, it splits a name string into its components and labels each one with a standardized field. Releases are infrequent, but the library is actively maintained.
Common errors
- ModuleNotFoundError: No module named 'probablepeople'
cause The `probablepeople` library is not installed in the current Python environment.
fix Run `pip install probablepeople` in the same environment that runs your code.
- TypeError: expected string or bytes-like object
cause A parsing function (`parse` or `tag` in the library's documented API) was called with an input that is not a string (e.g., None, int, list).
fix Ensure that the input argument to the parsing functions is always a string. Convert values that have a meaningful text form (e.g., `str(value)`) and skip missing values such as None.
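One way to avoid the `TypeError` above is to validate values before they reach the parser. The following is a minimal sketch, not part of the library: `safe_parse` and `stub_parser` are hypothetical names, and the parser is passed in as an argument so the guard makes no assumption about which probablepeople function you call.

```python
from typing import Any, Callable, Optional


def safe_parse(value: Any, parser: Callable[[str], Any]) -> Optional[Any]:
    """Call `parser` only with a non-empty string; return None otherwise.

    Hypothetical helper: `parser` is injected, so this guard does not
    depend on any specific probablepeople function name.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        # Decode bytes explicitly instead of relying on str(b"...")
        value = value.decode("utf-8", errors="replace")
    if not isinstance(value, str):
        # Skip ints, lists, etc. rather than stringifying them blindly
        return None
    value = value.strip()
    if not value:
        return None
    return parser(value)


def stub_parser(text: str) -> list:
    # Stand-in for the real parsing call: labels every token "Unknown"
    return [(token, "Unknown") for token in text.split()]


rows = ["Mr. John A. Doe Jr.", None, 42, b"Google Inc.", "   "]
results = [safe_parse(row, stub_parser) for row in rows]
```

In real use, the stub would presumably be replaced by the library's own parsing function. Rows that come back as None can then be logged or reviewed separately instead of crashing the run.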
Warnings
- gotcha The parser is probabilistic, so it can mislabel components, especially for highly ambiguous or culturally specific names, unusual company structures, and non-romanized input, which is outside the library's stated scope.
- gotcha Processing large datasets string by string can be slow, because every call runs the statistical model on its input. Deduplicating inputs before parsing reduces the work.
- gotcha Compiled dependencies such as `python-crfsuite` can complicate installation on platforms without prebuilt wheels and may cause version conflicts with other libraries in the same environment.
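Because real datasets often repeat the same names, one common way to reduce the per-string cost mentioned above is to cache results. The sketch below uses `functools.lru_cache` from the standard library. `make_cached_parser` and `stub_parser` are hypothetical names, and the stub stands in for the actual parsing call.

```python
from functools import lru_cache
from typing import Any, Callable, List


def make_cached_parser(parser: Callable[[str], Any], maxsize: int = 10_000):
    """Wrap `parser` so that identical input strings are parsed only once.

    Hypothetical helper. Cached results are shared between callers, so
    treat them as read-only.
    """

    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> Any:
        return parser(text)

    return cached


calls: List[str] = []


def stub_parser(text: str) -> tuple:
    # Stand-in for the real parser; records each string it actually processes
    calls.append(text)
    return tuple((token, "Unknown") for token in text.split())


cached_parse = make_cached_parser(stub_parser)

names = ["Google Inc.", "John Doe", "Google Inc.", "John Doe", "Google Inc."]
parsed = [cached_parse(name) for name in names]

# Only the two distinct strings reached the underlying parser
print(f"{len(names)} inputs, {len(calls)} parser calls")
```

Normalizing whitespace and case before the lookup would raise the hit rate further, at the cost of losing the original casing in the cached result.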
Install
-
pip install probablepeople
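If the import still fails after installation, the package may have been installed into a different interpreter than the one running your code. The following sketch uses only the standard library to check which interpreter is active and whether it can locate the module. `is_installed` is a hypothetical helper name.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


status = "found" if is_installed("probablepeople") else "missing"
print(f"probablepeople: {status} (interpreter: {sys.executable})")
```

When the status is "missing", running `python -m pip install probablepeople` with that same interpreter targets the correct environment.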
Imports
- parse
from probablepeople import parse
- tag
from probablepeople import tag
Quickstart
import probablepeople as pp
# tag() returns a pair: the labeled components and the detected name type
# Example for parsing a person's name
name = "Mr. John A. Doe Jr."
parsed_name, name_type = pp.tag(name)
print(f"Parsed Name: {parsed_name}\nName Type: {name_type}")
# Example for parsing a company name
company = "Google Inc."
parsed_company, company_type = pp.tag(company)
print(f"Parsed Company: {parsed_company}\nCompany Type: {company_type}")
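The upstream README also describes token-level output: a list of (token, label) pairs. The sketch below shows one way to fold such pairs into a field dictionary. `group_labels` is a hypothetical helper, and `sample_pairs` is hand-written illustrative data in that shape, not captured library output.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def group_labels(pairs: Iterable[Tuple[str, str]]) -> "OrderedDict[str, str]":
    """Join tokens that share a label, preserving first-seen label order.

    Hypothetical helper. Unlike a strict tagger, it silently merges
    repeated labels even when they are not adjacent.
    """
    grouped: "OrderedDict[str, list]" = OrderedDict()
    for token, label in pairs:
        grouped.setdefault(label, []).append(token)
    return OrderedDict((label, " ".join(tokens)) for label, tokens in grouped.items())


# Illustrative (token, label) pairs, written by hand for this example
sample_pairs = [
    ("Mr.", "PrefixMarital"),
    ("John", "GivenName"),
    ("A.", "MiddleInitial"),
    ("Doe", "Surname"),
    ("Jr.", "SuffixGenerational"),
]

fields = group_labels(sample_pairs)
print(fields["GivenName"], fields["Surname"])
```

A dictionary like this is convenient for writing results into fixed columns. If non-adjacent repeats of a label should be treated as errors instead, the merge step is the place to add that check.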