Frictionless Data
Frictionless is a data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data (DEVT Framework). It supports many data sources and formats and provides integrations with popular platforms. The framework is powered by the lightweight yet comprehensive Frictionless Standards. The current version is 5.18.1. The project is under active development with a regular release cadence: version 2.0 of the underlying Data Package standard was released in June 2024, alongside significant updates to the Python library.
Warnings
- breaking Frictionless Framework v5, released in December 2022, introduced several low-level breaking changes compared to v4. Users migrating from v4 or earlier should consult the official v5 announcement and migration guide for a smooth transition.
- gotcha Support for certain data formats or schemes (e.g., SQL databases, Pandas DataFrames, HTML, Parquet) requires installing additional plugins (e.g., `pip install frictionless[sql]`). Attempting to use these features without the corresponding plugin will result in an error message with installation instructions.
- gotcha Argument naming conventions differ across Frictionless interfaces: `snake_case` for Python arguments, `camelCase` for dictionary/JSON objects, and `dashes-case` for command-line interface arguments. Be mindful of these differences when moving between interfaces.
- gotcha The underlying Frictionless Data Package standard was updated to version 2.0 in June 2024. While `frictionless-py` aims for backward compatibility, new features or stricter adherence to the v2 spec might subtly change how data packages are processed or validated compared to older versions.
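To make the naming gotcha concrete, the sketch below converts a Python-style argument name into its camelCase dictionary key and its dashes-case CLI flag. The helpers `to_camel_case` and `to_cli_flag` are hypothetical and are not part of Frictionless. The example name `limit_errors` is used only for illustration, so check the CLI's `--help` output for the real flag names.

```python
def to_camel_case(name: str) -> str:
    """Hypothetical helper: snake_case Python argument -> camelCase dict/JSON key."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_cli_flag(name: str) -> str:
    """Hypothetical helper: snake_case Python argument -> dashes-case CLI flag."""
    return "--" + name.replace("_", "-")


for arg in ["limit_errors", "field_number"]:
    print(f"{arg:<14} {to_camel_case(arg):<14} {to_cli_flag(arg)}")
```

Both helpers only split on underscores. They do not handle acronyms or names that are already mixed case.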
Install
- pip install frictionless
- pip install frictionless[sql]
- pip install frictionless[pandas]
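Because the optional plugins only fail when they are first used, it can help to check ahead of time whether an extra's dependencies are importable. The sketch below relies only on the standard library. The `EXTRA_MODULES` mapping is an assumption (one representative top-level module per extra), and `missing_extras` is a hypothetical helper. Frictionless's own package metadata is the authoritative list of what each extra installs.

```python
import importlib.util

# Assumed mapping: pip extra -> one top-level module that the extra pulls in.
EXTRA_MODULES = {
    "sql": "sqlalchemy",
    "pandas": "pandas",
}


def missing_extras(mapping: dict[str, str]) -> list[str]:
    """Return the extras whose representative module cannot be found."""
    # find_spec returns None for an uninstalled top-level module; it does not import it
    return [
        extra
        for extra, module in mapping.items()
        if importlib.util.find_spec(module) is None
    ]


for extra in missing_extras(EXTRA_MODULES):
    # Quoting keeps shells such as zsh from interpreting the square brackets
    print(f'missing extra, try: pip install "frictionless[{extra}]"')
```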
Imports
- describe
from frictionless import describe
- extract
from frictionless import extract
- validate
from frictionless import validate
- Package
from frictionless import Package
Quickstart
import os
from frictionless import describe, extract
# Create a small CSV file for demonstration
csv_content = """id,name,value
1,apple,100
2,banana,200
3,orange,150
"""
file_path = "data.csv"
with open(file_path, "w", newline="") as f:
    f.write(csv_content)
# Describe the file: infers a Resource descriptor, including its Table Schema
print("--- Inferred Metadata ---")
resource = describe(file_path)
print(resource.to_json())  # to_json() has no indent argument; it already pretty-prints
# Extract the rows: extract() returns a dict mapping resource names to row lists
print("\n--- Extracted Data ---")
data = extract(file_path)
for name, rows in data.items():
    for row in rows:
        print(row)
# Clean up the demonstration file
os.remove(file_path)