Acryl DataHub Classify

0.0.12 · deprecated · verified Wed Apr 15

This library was designed to predict info types for DataHub metadata. It is deprecated: the final release, v0.0.12, marks it as such and signals the end of active development and support. Releases were infrequent throughout the project's life.

Quickstart

This quickstart initializes DataHubClassifier and applies it to a pandas DataFrame to infer an info type for each column. It prints the original DataFrame and the classified DataFrame, then reads the info type detected for the email_address column.

import pandas as pd
from datahub_classify.classifier.classifier import DataHubClassifier

# Create a sample DataFrame
data = {
    'email_address': ['test1@example.com', 'test2@example.com'],
    'first_name': ['John', 'Jane'],
    'social_security_number': ['XXX-XX-1234', 'XXX-XX-5678'],
    'city': ['New York', 'Los Angeles']
}
df = pd.DataFrame(data)

# Initialize the classifier
classifier = DataHubClassifier()

# Classify the DataFrame
classified_df = classifier.classify_dataframe(df)

# Print results
print("Original DataFrame:")
print(df)
print("\nClassified DataFrame with info types:")
print(classified_df)

# Read the inferred info type for one column, here 'email_address' (first row's entry)
print("\nInferred info types for 'email_address' column:")
if 'email_address' in classified_df.columns:
    print(classified_df['email_address'].iloc[0].metadata.dataType.type.infoType)
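
Because the package is deprecated, it may help to see the general idea of column-level info type prediction without installing it. The following is a minimal standard-library sketch, not the library's method or API: the `PATTERNS` table, the `guess_info_type` helper, the info type labels, and the 0.8 match threshold are all hypothetical names and values chosen for illustration. It reuses the sample columns from the quickstart.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns for illustration only; not taken from datahub-classify.
PATTERNS: Dict[str, re.Pattern] = {
    "Email_Address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    # Accepts masked values such as 'XXX-XX-1234' as well as full digits.
    "US_Social_Security_Number": re.compile(r"^[\dX]{3}-[\dX]{2}-\d{4}$"),
}


def guess_info_type(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the label whose pattern matches the largest share of values.

    Returns None when no pattern reaches the (assumed) threshold.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    best_name: Optional[str] = None
    best_share = 0.0
    for name, pattern in PATTERNS.items():
        share = sum(1 for v in non_empty if pattern.match(v)) / len(non_empty)
        if share > best_share:
            best_name, best_share = name, share
    return best_name if best_share >= threshold else None


# Same sample data as the quickstart, as plain lists keyed by column name.
columns = {
    "email_address": ["test1@example.com", "test2@example.com"],
    "first_name": ["John", "Jane"],
    "social_security_number": ["XXX-XX-1234", "XXX-XX-5678"],
    "city": ["New York", "Los Angeles"],
}

guesses = {name: guess_info_type(vals) for name, vals in columns.items()}
for name, label in guesses.items():
    print(f"{name}: {label}")
```

A value-pattern heuristic like this only covers types with a recognizable shape; columns such as first_name or city return None here, which is why real classifiers typically also weigh signals like the column name.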
