Data Contract CLI

0.11.8 · active · verified Sun Apr 12

The datacontract CLI is an open-source command-line tool (current version 0.11.8) for working with Data Contracts. It natively supports the Open Data Contract Standard (ODCS) and can lint data contracts, connect to data sources, execute schema and quality tests, detect breaking changes, and export contracts to other formats. It is written in Python and can be used as a standalone CLI tool, in CI/CD pipelines, or directly as a Python library. The project is actively maintained with frequent releases.
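
As an illustration of the CI/CD use mentioned above, here is a minimal sketch of a pipeline gate that shells out to the CLI from Python. The helper names `build_command` and `run_gate` are hypothetical, and the sketch assumes the `datacontract` executable is on `PATH` and that the `lint` and `test` subcommands signal failure through a non-zero exit code.

```python
import shutil
import subprocess
import sys

# Subcommands this sketch is willing to run, in order.
ALLOWED_ACTIONS = ("lint", "test")


def build_command(action: str, contract_path: str) -> list:
    """Build the argument list for `datacontract <action> <file>` (hypothetical helper)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["datacontract", action, contract_path]


def run_gate(contract_path: str = "datacontract.yaml") -> int:
    """Run lint, then test; return the first non-zero exit code, or 0 if both succeed."""
    if shutil.which("datacontract") is None:
        print("datacontract executable not found on PATH", file=sys.stderr)
        return 127
    for action in ALLOWED_ACTIONS:
        # Assumption: the CLI reports failure via a non-zero exit code.
        completed = subprocess.run(build_command(action, contract_path), check=False)
        if completed.returncode != 0:
            return completed.returncode
    return 0
```

In a pipeline step, calling `sys.exit(run_gate())` would propagate the result to the CI runner.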

Quickstart

This quickstart uses `datacontract-cli` as a Python library (import package `datacontract`) to load a data contract from a YAML file and run its schema and quality tests against a small local CSV file. For real data sources such as S3 or BigQuery, set the relevant credential environment variables first, as shown in the commented-out lines.

import os
from datacontract.data_contract import DataContract

# Simulate a datacontract.yaml file content
datacontract_yaml_content = '''
dataContractSpecification: 1.2.0
id: urn:datacontract:example:test-contract
info:
  title: Example Test Contract
  version: 1.0.0
  owner: Data Team
servers:
  local_file:
    type: local
    path: ./data_{model}.csv
    format: csv
    delimiter: ','
models:
  my_data:
    description: A simple dataset for testing.
    fields:
      id:
        type: string
        primaryKey: true
      name:
        type: string
      value:
        type: integer
    quality:
      - type: sql
        description: 'All values should be positive.'
        query: |
          SELECT COUNT(*)
          FROM my_data
          WHERE value <= 0
        mustBe: 0
'''

# Create a dummy data file for the test
with open('data_my_data.csv', 'w') as f:
    f.write('id,name,value\n')
    f.write('1,Alice,10\n')
    f.write('2,Bob,20\n')

# Write the data contract to a temporary file
with open('datacontract.yaml', 'w') as f:
    f.write(datacontract_yaml_content)

# Environment variables for credentials are often required for real data sources.
# For local testing, they might not be strictly needed depending on the 'server' configuration.
# os.environ['DATACONTRACT_S3_ACCESS_KEY_ID'] = os.environ.get('DATACONTRACT_S3_ACCESS_KEY_ID', '')
# os.environ['DATACONTRACT_S3_SECRET_ACCESS_KEY'] = os.environ.get('DATACONTRACT_S3_SECRET_ACCESS_KEY', '')

try:
    data_contract = DataContract(data_contract_file="datacontract.yaml")
    run_results = data_contract.test()

    if run_results.has_passed():
        print("Data contract tests passed successfully.")
    else:
        print("Data contract tests failed.")
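    # Assumption: the returned run object is a Pydantic model, so Pydantic's serializer is used.
    print(run_results.model_dump_json(indent=2))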

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up dummy files
    os.remove('datacontract.yaml')
    os.remove('data_my_data.csv')
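
To make the contract's SQL quality rule concrete, the sketch below evaluates the same condition (`value <= 0`) by hand against the two sample rows. It uses the standard-library `sqlite3` module purely for illustration; it is not a statement about which engine the CLI uses internally, and `count_non_positive` is a hypothetical helper.

```python
import sqlite3

# The same rows the quickstart writes to the CSV file.
ROWS = [("1", "Alice", 10), ("2", "Bob", 20)]


def count_non_positive(rows):
    """Count rows whose `value` is zero or negative, i.e. rows violating the rule."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE my_data (id TEXT PRIMARY KEY, name TEXT, value INTEGER)"
        )
        conn.executemany("INSERT INTO my_data VALUES (?, ?, ?)", rows)
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM my_data WHERE value <= 0"
        ).fetchone()
        return count
    finally:
        conn.close()


violations = count_non_positive(ROWS)
print(f"rows violating the rule: {violations}")  # 0 for the sample data
```

Appending a row such as `("3", "Carol", -5)` would raise the count to 1, which is the situation the quality check is meant to flag.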
