Acryl DataHub Classify
This library was designed to predict info types for DataHub metadata. It is explicitly deprecated: its last release, v0.0.12, marks it as such and signals the end of active development and support. Releases were infrequent throughout the project's life.
Common errors
-
ModuleNotFoundError: No module named 'acryl_datahub_classify'
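cause The module 'acryl_datahub_classify' cannot be found in the Python environment. Either the 'acryl-datahub-classify' package is not installed, or the import uses the wrong module name: the Imports section of this page imports from 'datahub_classify'.
fix Install the package using pip: 'pip install acryl-datahub-classify'. If the error persists after installation, import from 'datahub_classify' instead.
-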
ImportError: cannot import name 'predict_infotypes' from 'acryl_datahub_classify'
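cause The name 'predict_infotypes' is not exported from the module named in the import statement, typically because of an incorrect module path or a version mismatch.
fix Check the module path in the import statement. The Imports section of this page imports from the 'datahub_classify' package rather than 'acryl_datahub_classify'; verify where 'predict_infotypes' lives in the installed version (in DataHub's own ingestion code it is imported from 'datahub_classify.infotype_predictor').
-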
TypeError: predict_infotypes() missing 1 required positional argument: 'column_infos'
cause The 'predict_infotypes' function was called without the necessary 'column_infos' argument.
fix Provide the required 'column_infos' argument when calling 'predict_infotypes'.
-
ValueError: Confidence level threshold must be between 0 and 1
cause An invalid confidence level threshold was provided to the 'predict_infotypes' function.
fix Ensure the confidence level threshold is a float between 0 and 1.
-
AttributeError: module 'acryl_datahub_classify' has no attribute 'predict_infotypes'
cause The 'predict_infotypes' function is not defined in the 'acryl_datahub_classify' module, possibly due to an outdated version.
fix Update the package to the latest version using pip: 'pip install --upgrade acryl-datahub-classify'.
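The TypeError and ValueError entries above can both be caught before the library is called. The following is a minimal sketch, not part of the library: `check_predict_args` is a hypothetical helper, and it assumes that `column_infos` should be a non-empty sequence and that the threshold is a plain number.

```python
from numbers import Real
from typing import Sequence


def check_predict_args(column_infos: Sequence[object], threshold: float) -> None:
    """Hypothetical pre-flight check; not part of acryl-datahub-classify."""
    # 'column_infos' is the required argument named in the TypeError entry.
    # Assumption: an empty sequence is as unhelpful as a missing one.
    if column_infos is None or len(column_infos) == 0:
        raise ValueError("column_infos must be a non-empty sequence")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, Real):
        raise TypeError("confidence level threshold must be a number")
    # Same range the ValueError entry describes: 0 to 1 inclusive.
    if not 0 <= threshold <= 1:
        raise ValueError("Confidence level threshold must be between 0 and 1")
```

Calling `check_predict_args(column_infos, 0.6)` immediately before the prediction call would surface a bad argument with a clear message rather than an error raised deeper in the library.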
Warnings
- breaking This library (`acryl-datahub-classify`) is explicitly deprecated as of version 0.0.12 and will no longer receive updates or support from Acryl Data. It is recommended to migrate to alternative data classification solutions.
- gotcha Due to its deprecated status, the pinned or range-based dependencies (e.g., `pandas<2.0.0`, `spacy>=3.0.0`) may quickly become outdated, leading to dependency conflicts with newer libraries in your environment or compatibility issues with newer Python versions.
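To see which dependency constraints an installed copy actually declares, the standard library's `importlib.metadata` can list them. This is a small sketch; `declared_requirements` is a hypothetical helper name, and the output depends entirely on what is installed in your environment.

```python
from importlib import metadata
from typing import List


def declared_requirements(dist_name: str) -> List[str]:
    """Return the requirement strings a distribution declares, or [] if it is not installed."""
    try:
        # metadata.requires() returns None when no requirements are declared.
        return list(metadata.requires(dist_name) or [])
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Prints one requirement specifier per line; prints nothing if the package is absent.
    for requirement in declared_requirements("acryl-datahub-classify"):
        print(requirement)
```

Comparing this list against the versions already in the environment (for example with `pip check`) shows conflicts before they appear as import failures.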
Install
-
pip install acryl-datahub-classify
Imports
- DataHubClassifier
from datahub_classify.classifier.classifier import DataHubClassifier
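Because the pip distribution name (`acryl-datahub-classify`) differs from the module name used in the import above (`datahub_classify`), it can help to check both before importing. The sketch below uses only the standard library; `installed_version` and `module_available` are hypothetical helper names, not part of the package.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str) -> bool:
    """Report whether a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    print("distribution:", installed_version("acryl-datahub-classify"))
    # Check the module name used on this page and the one from the error messages.
    for name in ("datahub_classify", "acryl_datahub_classify"):
        print(name, "importable:", module_available(name))
```

If the distribution reports a version but a module name is not importable, the import statement is using the wrong name rather than pointing at a missing installation.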
Quickstart
import pandas as pd
from datahub_classify.classifier.classifier import DataHubClassifier
# Create a sample DataFrame
data = {
    'email_address': ['test1@example.com', 'test2@example.com'],
    'first_name': ['John', 'Jane'],
    'social_security_number': ['XXX-XX-1234', 'XXX-XX-5678'],
    'city': ['New York', 'Los Angeles'],
}
df = pd.DataFrame(data)
# Initialize the classifier
classifier = DataHubClassifier()
# Classify the DataFrame
classified_df = classifier.classify_dataframe(df)
# Print results
print("Original DataFrame:")
print(df)
print("\nClassified DataFrame with info types:")
print(classified_df)
# Accessing inferred info types for a specific column
# For example, 'email_address'
print("\nInferred info types for 'email_address' column:")
if 'email_address' in classified_df.columns:
    print(classified_df['email_address'].iloc[0].metadata.dataType.type.infoType)