{"id":6908,"library":"tdda","title":"TDDA: Test-Driven Data Analysis","description":"TDDA (Test-Driven Data Analysis) is a Python library and set of command-line tools designed to improve the correctness and robustness of data analysis. It provides reference testing of data pipelines, automatic discovery and verification of data constraints, anomaly detection, and inference of regular expressions from text data (Rexpy). Since version 2.0, it also includes automatic test generation (Gentest) for command-line programs. It currently supports Python >=3.8 and is actively maintained, with version 2.2.17 being the latest stable release.","status":"active","version":"2.2.17","language":"en","source_language":"en","source_url":"https://github.com/tdda/tdda","tags":["data quality","data validation","test-driven development","data testing","constraints","regular expressions","data analysis","pipeline testing"],"install":[{"cmd":"pip install tdda","lang":"bash","label":"Install stable version"}],"dependencies":[{"cmd":"pip install pandas","reason":"Required for CSV files and Feather files, and for working with DataFrames in constraint discovery/verification.","package":"pandas","optional":true},{"cmd":"pip install feather-format","reason":"Required for reading and writing Feather files.","package":"feather-format","optional":true},{"cmd":"pip install pmmif","reason":"Makes reading and writing Feather files more robust.","package":"pmmif","optional":true},{"cmd":"pip install pygresql","reason":"Required for PostgreSQL database tables.","package":"pygresql","optional":true},{"cmd":"pip install mysqlclient","reason":"One MySQL driver is required for MySQL database tables; `mysqlclient` is shown here, and `MySQL-python` or `mysql-connector-python` can be installed instead.","package":"mysqlclient / MySQL-python / mysql-connector-python","optional":true},{"cmd":"pip install pymongo","reason":"Required for MongoDB document collections.","package":"pymongo","optional":true}],"imports":[{"note":"Used for automatically inferring 
constraints from a Pandas DataFrame.","symbol":"discover_df","correct":"from tdda.constraints import discover_df"},{"note":"Used for verifying a Pandas DataFrame against a set of constraints.","symbol":"verify_df","correct":"from tdda.constraints import verify_df"},{"note":"WritableTestCase was superseded by the ReferenceTest classes; ReferenceTestCase is the unittest-based variant.","wrong":"from tdda.referencetest import WritableTestCase","symbol":"ReferenceTestCase","correct":"from tdda.referencetest import ReferenceTestCase"}],"quickstart":{"code":"import os\n\nimport pandas as pd\nfrom tdda.constraints import discover_df, verify_df\n\n# Create a sample DataFrame\ndata = {\n    'col1': [1, 2, 3, 4, 5, None],\n    'col2': ['A', 'B', 'A', 'C', 'B', 'D'],\n    'col3': [10.1, 11.2, 10.1, 13.4, 15.5, 12.3]\n}\ndf = pd.DataFrame(data)\n\n# 1. Discover constraints from the DataFrame\nconstraints = discover_df(df)\n\n# Serialize the discovered constraints to a .tdda JSON file with to_json()\nconstraints_filename = 'my_dataframe_constraints.tdda'\nwith open(constraints_filename, 'w') as f:\n    f.write(constraints.to_json())\nprint(f\"Constraints discovered and saved to {constraints_filename}\")\n\n# 2. 
Verify a (potentially new or modified) DataFrame against the constraints\n# Let's create a slightly different DataFrame for verification\ndf_to_verify = pd.DataFrame({\n    'col1': [1, 2, 3, 6, 5, 7],\n    'col2': ['A', 'B', 'A', 'C', 'B', 'E'],\n    'col3': [10.1, 11.2, 10.1, 13.0, 15.5, 12.0]\n})\n\nverification_result = verify_df(df_to_verify, constraints_filename)\n\nprint(\"\\nVerification Results:\")\nprint(f\"Passed constraints: {verification_result.passes}\")\nprint(f\"Failed constraints: {verification_result.failures}\")\nif verification_result.failures > 0:\n    print(\"Details of failed constraints:\")\n    print(verification_result.to_frame())\n\n# Clean up the generated constraints file\nos.remove(constraints_filename)","lang":"python","description":"This quickstart demonstrates how to use `tdda.constraints` to automatically discover constraints from a Pandas DataFrame and then verify another DataFrame against these discovered constraints. It highlights the `discover_df` and `verify_df` functions, showing how to save and load constraints from a `.tdda` JSON file."},"warnings":[{"fix":"Upgrade your Python environment to 3.8 or newer before updating tdda.","message":"Python 2.7 support has been dropped. The library previously supported Python 2.7, but current versions (>=2.0) explicitly require Python >=3.8. Older codebases targeting Python 2.7 will break if upgrading `tdda` without migrating their Python environment.","severity":"breaking","affected_versions":"<2.0 (supported Python 2.7), >=2.0 (requires Python >=3.8)"},{"fix":"Migrate your reference tests from `WritableTestCase` to `ReferenceTest`.","message":"The `WritableTestCase` class for reference testing has been superseded by `ReferenceTest`. 
`WritableTestCase` may remain available in some older versions for backward compatibility, but new code should use `ReferenceTest`.","severity":"deprecated","affected_versions":"All versions where `ReferenceTest` is available (from at least 2017-01-26 onwards)."},{"fix":"Install the optional dependencies required by the data sources you intend to use (e.g., `pip install pandas feather-format pygresql`).","message":"Many features, particularly constraint generation and verification against various data sources (databases, Feather files), rely on optional external dependencies (e.g., `pandas`, `feather-format`, database drivers). These packages are not installed by `pip install tdda` and must be installed separately when the corresponding functionality is needed.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install `cython` (`pip install cython`) and the appropriate Microsoft Visual C++ compiler (e.g., through Visual Studio Build Tools) if you plan to use Feather files on Windows.","message":"Installing `feather-format` on Windows may fail unless `cython` and a Microsoft Visual C++ compiler are available. This is a common prerequisite for Python packages with C extensions on Windows.","severity":"gotcha","affected_versions":"All versions (on Windows when using Feather files)"},{"fix":"Use specific search terms (e.g., \"Python tdda data analysis\") and state the context clearly when discussing the library.","message":"The acronym \"TDDA\" is used by several unrelated projects (e.g., Java Thread Dump Analyzer, The Drug Detection Agency, Topological Data Analysis). This can cause confusion when searching for documentation or examples, or when discussing the Python `tdda` library. 
Ensure you are referencing the correct project.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}