{"id":6492,"library":"acryl-datahub-classify","title":"Acryl DataHub Classify","description":"This library, currently at version 0.0.12, was designed to predict info types for DataHub metadata. It is explicitly deprecated, with its last release (v0.0.12) marking it as such and signaling the end of active development and support. The project has had an infrequent release cadence.","status":"deprecated","version":"0.0.12","language":"en","source_language":"en","source_url":"https://github.com/acryldata/datahub-classify","tags":["datahub","classification","deprecated","data-governance","metadata"],"install":[{"cmd":"pip install acryl-datahub-classify","lang":"bash","label":"Install latest deprecated version"}],"dependencies":[{"reason":"Runtime dependency for data modeling.","package":"pydantic","optional":false},{"reason":"Core dependency for natural language processing and entity recognition.","package":"spacy","optional":false},{"reason":"Used for DataFrame manipulation and data processing.","package":"pandas","optional":false},{"reason":"Used for language detection of text data.","package":"langdetect","optional":false},{"reason":"Machine learning utilities and models.","package":"scikit-learn","optional":false},{"reason":"Scientific computing library, often a dependency of scikit-learn.","package":"scipy","optional":false},{"reason":"Used for parallel processing and caching.","package":"joblib","optional":false},{"reason":"Advanced regular expression operations.","package":"regex","optional":false},{"reason":"Creates data classes from dictionaries.","package":"dacite","optional":false}],"imports":[{"symbol":"DataHubClassifier","correct":"from datahub_classify.classifier.classifier import DataHubClassifier"}],"quickstart":{"code":"import pandas as pd\nfrom datahub_classify.classifier.classifier import DataHubClassifier\n\n# Create a sample DataFrame\ndata = {\n    'email_address': ['test1@example.com', 'test2@example.com'],\n    'first_name': ['John', 'Jane'],\n   
 'social_security_number': ['XXX-XX-1234', 'XXX-XX-5678'],\n    'city': ['New York', 'Los Angeles']\n}\ndf = pd.DataFrame(data)\n\n# Initialize the classifier\nclassifier = DataHubClassifier()\n\n# Classify the DataFrame\nclassified_df = classifier.classify_dataframe(df)\n\n# Print results\nprint(\"Original DataFrame:\")\nprint(df)\nprint(\"\\nClassified DataFrame with info types:\")\nprint(classified_df)\n\n# Accessing inferred info types for a specific column\n# For example, 'email_address'\nprint(\"\\nInferred info types for 'email_address' column:\")\nif 'email_address' in classified_df.columns:\n    print(classified_df['email_address'].iloc[0].metadata.dataType.type.infoType)","lang":"python","description":"This quickstart demonstrates how to initialize the DataHubClassifier and apply it to a pandas DataFrame to infer information types for columns. The output will show the original and classified DataFrames, including the detected info types for each cell."},"warnings":[{"fix":"Discontinue use of `acryl-datahub-classify`. Explore DataHub's native classification features if they meet your needs, or integrate with other third-party classification tools.","message":"This library (`acryl-datahub-classify`) is explicitly deprecated as of version 0.0.12 and will no longer receive updates or support from Acryl Data. It is recommended to migrate to alternative data classification solutions.","severity":"breaking","affected_versions":"0.0.12 and later"},{"fix":"If continued use is unavoidable, carefully manage its dependencies in an isolated virtual environment to prevent conflicts. 
Be prepared for potential issues with newer Python interpreters or other libraries.","message":"Because the library is deprecated, its pinned or range-based dependencies (e.g., `pandas<2.0.0`, `spacy>=3.0.0`) will not be refreshed, which can cause dependency conflicts with newer libraries in your environment or compatibility issues with newer Python versions.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[{"fix":"Install the package with 'pip install acryl-datahub-classify', then import it under its package name 'datahub_classify' rather than 'acryl_datahub_classify'.","cause":"The pip distribution is named 'acryl-datahub-classify', but the importable package is 'datahub_classify'. Importing 'acryl_datahub_classify' fails even when the package is installed.","error":"ModuleNotFoundError: No module named 'acryl_datahub_classify'"},{"fix":"Import the function from its submodule: 'from datahub_classify.infotype_predictor import predict_infotypes'.","cause":"The import uses the wrong module path. 'predict_infotypes' is defined in the 'datahub_classify.infotype_predictor' submodule, not in a top-level 'acryl_datahub_classify' module.","error":"ImportError: cannot import name 'predict_infotypes' from 'acryl_datahub_classify'"},{"fix":"Provide the required 'column_infos' argument when calling 'predict_infotypes'.","cause":"The 'predict_infotypes' function was called without the required 'column_infos' argument.","error":"TypeError: predict_infotypes() missing 1 required positional argument: 'column_infos'"},{"fix":"Pass the confidence level threshold as a float between 0 and 1.","cause":"An invalid confidence level threshold was provided to the 'predict_infotypes' function.","error":"ValueError: Confidence level threshold must be between 0 and 1"},{"fix":"Import the function from its submodule: 'from datahub_classify.infotype_predictor import predict_infotypes'. If it is still missing, upgrade to the final release (0.0.12) with 'pip install --upgrade acryl-datahub-classify'.","cause":"'predict_infotypes' was accessed as an attribute of a top-level module that does not expose it, or the installed version is outdated.","error":"AttributeError: module 'acryl_datahub_classify' has no attribute 'predict_infotypes'"}]}