Acryl DataHub Classify
Version 0.0.12 | verified Wed Apr 15 | auth: no | Python | deprecated
This library was designed to predict info types (for example, email addresses or Social Security numbers) for columns described in DataHub metadata. It is explicitly deprecated: its last release, v0.0.12, marks it as such and ends active development and support. Releases before that were infrequent.
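To make "predicting an info type" concrete, here is a toy, rule-based sketch. It is not this library's algorithm. It scores each column on two signals, whether the column name matches a pattern and what fraction of its values match a value pattern, and returns the best info type above a threshold. The `RULES` table, the info type labels, the equal 0.5/0.5 weighting, and `predict_info_type` are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical rules: info type -> (column-name pattern, value pattern).
# Illustrative only; these are NOT the rules shipped with acryl-datahub-classify.
RULES: Dict[str, Tuple[Pattern[str], Pattern[str]]] = {
    "Email_Address": (
        re.compile(r"e[-_]?mail", re.IGNORECASE),
        re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    ),
    "US_Social_Security_Number": (
        re.compile(r"ssn|social_security", re.IGNORECASE),
        # Accepts digits or masked "X" characters in the first two groups.
        re.compile(r"^(\d{3}|X{3})-(\d{2}|X{2})-\d{4}$"),
    ),
}


def predict_info_type(column_name: str, values: List[str],
                      threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring info type for one column, or None if below threshold."""
    best, best_score = None, 0.0
    for info_type, (name_re, value_re) in RULES.items():
        # Name signal: 1.0 if the column name matches, else 0.0.
        name_score = 1.0 if name_re.search(column_name) else 0.0
        # Value signal: fraction of values that match the value pattern.
        matched = sum(1 for v in values if value_re.match(str(v)))
        value_score = matched / len(values) if values else 0.0
        score = 0.5 * name_score + 0.5 * value_score
        if score > best_score:  # strict ">" keeps the first rule on ties
            best, best_score = info_type, score
    return best if best_score >= threshold else None
```

With the Quickstart's sample columns, `email_address` maps to `Email_Address`, `social_security_number` maps to `US_Social_Security_Number`, and `city` maps to `None`. A real classifier would use many more signals, which is what this library tried to package.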
pip install acryl-datahub-classify
Common errors
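error ModuleNotFoundError: No module named 'acryl_datahub_classify'
cause The 'acryl-datahub-classify' package is not installed in the Python environment that runs your code, or the code imports it under the wrong module name.
fix
Install the package into that environment using pip: 'pip install acryl-datahub-classify'. Note that the Imports section below imports from 'datahub_classify', not 'acryl_datahub_classify', so the importable module name may differ from the distribution name.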
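A quick way to tell "not installed" apart from "wrong module name" is to ask the import system which candidate names it can find, without importing them. This sketch uses the standard-library `importlib.util.find_spec`; the two candidate names are taken from this page (the error message and the Imports section), and the helper `importable_modules` is hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List

# Candidate top-level module names: the one in the error message above and the
# one used in this page's Imports section. Adjust if your version differs.
CANDIDATES = ("acryl_datahub_classify", "datahub_classify")


def importable_modules(names: Iterable[str] = CANDIDATES) -> List[str]:
    """Return the top-level module names that this interpreter can locate.

    find_spec returns None for a missing top-level module; it does not
    execute the module's code.
    """
    return [name for name in names if importlib.util.find_spec(name) is not None]


def report() -> str:
    """Describe which interpreter is running and which candidates it can see."""
    found = importable_modules()
    return f"interpreter={sys.executable} importable={found or 'none'}"
```

If `report()` lists no candidates, install the package with the same interpreter it prints (for example `<that path> -m pip install acryl-datahub-classify`); if it lists only `datahub_classify`, change the import instead.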
error ImportError: cannot import name 'predict_infotypes' from 'acryl_datahub_classify'
cause The function 'predict_infotypes' is not available at that import path, possibly due to a version mismatch or an incorrect import.
fix
Check the import path against the installed version rather than repeating the failing statement. The Imports section below uses the 'datahub_classify' package, so look for 'predict_infotypes' there instead of in 'acryl_datahub_classify'.
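When you are unsure which module exposes a name in your installed version, you can probe a list of candidate modules and take the first one that has it. A minimal sketch using `importlib.import_module` and `getattr`; `find_symbol` is a hypothetical helper, and the module list you pass it is your own guess (submodule paths must be listed explicitly, because only the named modules are checked).

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def find_symbol(name: str, modules: Iterable[str]) -> Optional[Tuple[str, Any]]:
    """Return (module_name, object) for the first module that exposes `name`.

    Note: importing a module runs its top-level code.
    """
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # also catches ModuleNotFoundError
            continue  # module (or a parent package) is not installed
        obj = getattr(module, name, None)
        if obj is not None:
            return module_name, obj
    return None
```

For example, `find_symbol("predict_infotypes", ["acryl_datahub_classify", "datahub_classify"])` returns `None` if neither top-level package exposes the function directly, which suggests it lives in a submodule you should add to the list.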
error TypeError: predict_infotypes() missing 1 required positional argument: 'column_infos'
cause The 'predict_infotypes' function was called without the necessary 'column_infos' argument.
fix
Provide the required 'column_infos' argument when calling 'predict_infotypes'.
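Because the required arguments can change between versions, it can help to list them from the installed function before calling it. This sketch uses the standard-library `inspect.signature`. `required_params` is a hypothetical helper, and `fake_predict_infotypes` is a hypothetical stand-in that carries only the argument named in the error plus an invented optional one; it is not the real signature.

```python
import inspect
from typing import Callable, List


def required_params(func: Callable) -> List[str]:
    """List parameters that must be supplied: no default and not *args/**kwargs."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.default is inspect.Parameter.empty and p.kind not in variadic
    ]


# Hypothetical stand-in, NOT the library's function: 'column_infos' comes from
# the error message; the optional threshold parameter is invented for the demo.
def fake_predict_infotypes(column_infos, confidence_level_threshold=0.7):
    return []
```

Here `required_params(fake_predict_infotypes)` returns `["column_infos"]`. Run the same helper on the real `predict_infotypes` you imported to see what your version expects.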
error ValueError: Confidence level threshold must be between 0 and 1
cause An invalid confidence level threshold was provided to the 'predict_infotypes' function.
fix
Ensure the confidence level threshold is a float between 0 and 1.
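Validating the threshold yourself lets the error surface where the value is read (for example, from a config file) rather than deep inside the library. A minimal sketch; `validate_threshold` is a hypothetical helper, and treating 0 and 1 as valid (inclusive bounds) is an assumption, since the error message only says "between 0 and 1".

```python
def validate_threshold(value) -> float:
    """Coerce `value` to float and check it lies in [0, 1] (inclusive bounds assumed).

    float() itself raises ValueError for non-numeric strings such as "high".
    NaN fails the range check because every comparison with NaN is False.
    """
    threshold = float(value)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"confidence threshold must be between 0 and 1, got {value!r}")
    return threshold
```

A typical mistake this catches is passing a percentage such as `70` where a fraction such as `0.7` is expected.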
error AttributeError: module 'acryl_datahub_classify' has no attribute 'predict_infotypes'
cause The 'predict_infotypes' function is not defined in the 'acryl_datahub_classify' module, possibly due to an outdated version.
fix
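Upgrade the package using pip: 'pip install --upgrade acryl-datahub-classify'. Because the project is deprecated, the newest version available is its final release, 0.0.12; if the attribute is still missing there, check the import path instead.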
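To check whether you already have the final release, you can read the installed version through the standard-library `importlib.metadata` (Python 3.8+) and compare it numerically with 0.0.12, the last release named on this page. The helper names are hypothetical. The comparison is done on integer tuples because a plain string comparison would wrongly rank "0.0.9" above "0.0.12".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

LAST_RELEASE = "0.0.12"  # final release named on this page


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version like '0.0.12' into (0, 0, 12).

    Handles purely numeric versions only; pre-release tags would raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))


def installed_version(dist: str = "acryl-datahub-classify") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def is_latest(installed: Optional[str], latest: str = LAST_RELEASE) -> bool:
    """True if `installed` is at least `latest` (numeric comparison)."""
    return installed is not None and parse_version(installed) >= parse_version(latest)
```

`is_latest(installed_version())` returns `False` both when the package is missing and when an older release is installed; `installed_version()` distinguishes the two.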
Warnings
breaking This library (`acryl-datahub-classify`) is explicitly deprecated as of version 0.0.12 and will no longer receive updates or support from Acryl Data. Plan a migration to an alternative data classification solution.
fix Discontinue use of `acryl-datahub-classify`. Explore DataHub's native classification features if they meet your needs, or integrate with other third-party classification tools.
gotcha Because the package is no longer maintained, its pinned or range-based dependencies (e.g., `pandas<2.0.0`, `spacy>=3.0.0`) will not be updated. This can cause dependency conflicts with newer libraries in your environment and compatibility problems with newer Python versions.
fix If continued use is unavoidable, carefully manage its dependencies in an isolated virtual environment to prevent conflicts. Be prepared for potential issues with newer Python interpreters or other libraries.
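One way to follow that advice from Python itself is the standard-library `venv` module: create a dedicated environment and install a pinned version into it only. A minimal sketch; `create_isolated_env` and `pip_install_cmd` are hypothetical helpers, and pinning `==0.0.12` is an assumption based on this page naming it as the final release.

```python
import sys
import venv
from pathlib import Path
from typing import List


def create_isolated_env(env_dir, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment and return its interpreter path.

    clear=True deletes anything already in env_dir, so point it at a dedicated folder.
    """
    env_dir = Path(env_dir)
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_cmd(python: Path,
                    requirement: str = "acryl-datahub-classify==0.0.12") -> List[str]:
    """Build (but do not run) a pip command that installs into that environment only."""
    return [str(python), "-m", "pip", "install", requirement]
```

Usage: `py = create_isolated_env(".venv-classify")`, then `subprocess.run(pip_install_cmd(py), check=True)`. Run classification jobs with that interpreter so the old dependency pins never touch your main environment.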
Imports
- DataHubClassifier
from datahub_classify.classifier.classifier import DataHubClassifier
Quickstart
import pandas as pd
from datahub_classify.classifier.classifier import DataHubClassifier
# Create a sample DataFrame
data = {
'email_address': ['test1@example.com', 'test2@example.com'],
'first_name': ['John', 'Jane'],
'social_security_number': ['XXX-XX-1234', 'XXX-XX-5678'],
'city': ['New York', 'Los Angeles']
}
df = pd.DataFrame(data)
# Initialize the classifier
classifier = DataHubClassifier()
# Classify the DataFrame
classified_df = classifier.classify_dataframe(df)
# Print results
print("Original DataFrame:")
print(df)
print("\nClassified DataFrame with info types:")
print(classified_df)
# Accessing inferred info types for a specific column
# For example, 'email_address'
print("\nInferred info types for 'email_address' column:")
if 'email_address' in classified_df.columns:
    # Indented under the membership check so the lookup only runs when the column exists
    print(classified_df['email_address'].iloc[0].metadata.dataType.type.infoType)