Cleanco
Cleanco is a Python library (current version 2.3) for processing company names. It strips terms that indicate organization type (such as 'Ltd.' or 'Corp.'), deduces the business entity type (e.g., 'limited liability company'), and suggests possible countries of establishment. Releases are irregular; the most recent updates came in late 2023 and early 2024.
Warnings
- breaking The old class-based API, such as `cleanco.cleanco(name).clean_name()`, was removed entirely in version 2.2. Use the function-based API (`basename()`) instead.
- breaking The function `prepare_terms()` was renamed to `prepare_default_terms()` in version 2.2.
- breaking Cleanco has been Python 3 only since version 2.0.1.
- gotcha From version 2.2 onwards, `basename()` no longer requires terms to be passed explicitly, which simplifies its usage. If you need custom terms, use `custom_basename()` instead.
- gotcha For company names with multiple suffixes, you might need to run `basename()` more than once to strip them all, since a single pass may not remove every suffix.
Install
pip install cleanco
Imports
- basename
from cleanco import basename
- custom_basename
from cleanco import custom_basename
- prepare_default_terms
from cleanco import prepare_default_terms
Quickstart
from cleanco import basename, typesources, matches, countrysources

business_name = "Some Big Pharma, LLC"

# Strip organization-type terms (such as "LLC") from the name.
cleaned_name = basename(business_name)
print(f"Cleaned name: {cleaned_name}")

# Deduce the business entity type from the terms found in the name.
classification_sources = typesources()
business_types = matches(business_name, classification_sources)
print(f"Business types: {business_types}")

# Suggest possible countries of establishment from the same terms.
country_classification_sources = countrysources()
possible_countries = matches(business_name, country_classification_sources)
print(f"Possible countries: {possible_countries}")