Tentaclio
Tentaclio is a Python library designed to unify data connectors for distributed data tasks, offering a consistent API for interacting with various data sources like local files, FTP, SFTP, S3, GCS, and databases. It provides a simple URL-based interface for streams and database connections. The library is actively maintained with frequent minor releases to address security updates and improve functionality.
Warnings
- gotcha Many common protocols (e.g., S3, Google Storage, Databricks, various databases) are not included in the base installation. Attempting to use URLs for these schemes without installing the corresponding optional dependencies (e.g., `pip install tentaclio[s3]`) will result in runtime errors.
- gotcha When using environment variables (prefixed with `TENTACLIO__CONN__`) or a secrets file for authentication, Tentaclio treats the hostname in the URL provided to `tentaclio.open` or `tentaclio.db` as a wildcard. Credentials from the environment variable with a matching scheme and path will be injected. This means `sftp://sftp.example.com/file.txt` will use credentials defined for `sftp://user:pass@sftp.example.com/`, even if the provided URL doesn't contain user/pass.
- deprecated Versions prior to 1.3.2 do not support specifying the SFTP private key path and password via query parameters in the URL. Support for this was added in version 1.3.2.
- gotcha The `tentaclio.db` function for database connections handles some URL formats specially. For instance, for PostgreSQL, a URL like `postgresql://host/database::table` denotes writing CSV-formatted data into a database table, with the table name given after `::`.
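The credential injection described in the warnings above can be modelled with the standard library alone. The sketch below is not Tentaclio's implementation: the `inject_credentials` helper is hypothetical, and it assumes the simplest reading of the rule, namely that a stored connection applies when its scheme and hostname equal those of the requested URL. It only illustrates why a URL without user or password can end up authenticated.

```python
from urllib.parse import urlsplit, urlunsplit

ENV_PREFIX = "TENTACLIO__CONN__"


def inject_credentials(url, environ):
    """Hypothetical helper: copy user/password from a matching stored URL."""
    target = urlsplit(url)
    if target.username:
        # The URL already carries credentials, so leave it untouched.
        return url
    for name, stored_url in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue
        stored = urlsplit(stored_url)
        # Assumption: a match means same scheme and same hostname.
        if stored.scheme == target.scheme and stored.hostname == target.hostname:
            userinfo = stored.username or ""
            if stored.password:
                userinfo += ":" + stored.password
            netloc = f"{userinfo}@{target.netloc}" if userinfo else target.netloc
            return urlunsplit(target._replace(netloc=netloc))
    return url


env = {"TENTACLIO__CONN__MY_SFTP": "sftp://user:pass@sftp.example.com/"}
print(inject_credentials("sftp://sftp.example.com/file.txt", env))
# sftp://user:pass@sftp.example.com/file.txt
```

A URL whose scheme or hostname matches no stored connection is returned unchanged in this model.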
Install
- pip install tentaclio
- pip install tentaclio[s3,gs,postgres]
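Because a missing optional dependency only surfaces as a runtime error, it can help to check up front which modules are importable. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module name `tentaclio_s3` is an assumption about what the `s3` extra installs, so adjust it to whatever your environment actually provides.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# "tentaclio_s3" is an assumed module name for the s3 extra.
missing = missing_modules(["tentaclio", "tentaclio_s3"])
if missing:
    print("Not installed:", ", ".join(missing))
```

Running a check like this at startup turns a late failure on the first `s3://` URL into an early, readable message.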
Imports
- open
import tentaclio
with tentaclio.open('file:///path/to/file.txt') as reader:
    contents = reader.read()
- db
import tentaclio
with tentaclio.db('postgresql://hostname/example') as client:
    ...  # use the client here
- copy
import tentaclio
tentaclio.copy('file:///local/source.txt', 's3://my-bucket/remote/dest.txt')
Quickstart
import tentaclio
import os
# Example: Write and then read a local file
local_file_url = 'file:///tmp/my_data.txt'
contents_to_write = 'Hello, Tentaclio!'
# Write to a local file
with tentaclio.open(local_file_url, mode='w') as writer:
    writer.write(contents_to_write)
print(f"Written to {local_file_url}")
# Read from the local file
with tentaclio.open(local_file_url, mode='r') as reader:
    read_contents = reader.read()
print(f"Read from {local_file_url}: {read_contents}")
# Clean up the test file
os.remove('/tmp/my_data.txt')
print("Cleaned up /tmp/my_data.txt")
# Example: Reading from an S3 bucket (requires tentaclio[s3] and AWS credentials)
# Set environment variables like:
# os.environ['TENTACLIO__CONN__MY_S3_BUCKET'] = 's3://access_key:secret_key@s3.region.amazonaws.com/my-bucket/'
# s3_url = 's3://my-bucket/hello.txt'
# if os.environ.get('TENTACLIO__CONN__MY_S3_BUCKET'):
#     try:
#         with tentaclio.open(s3_url) as reader:
#             s3_contents = reader.read()
#         print(f"Read from S3: {s3_contents}")
#     except Exception as e:
#         print(f"Could not read from S3: {e}")
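The local-file example above hardcodes `/tmp/my_data.txt`, which only exists on POSIX systems. A small sketch of deriving the `file://` URL from the platform's temp directory with the standard library follows; it only builds the URL string, and whether Tentaclio accepts the resulting URL on every platform is an assumption that has not been verified here.

```python
import tempfile
from pathlib import Path

# Build the path inside the platform's temp directory instead of hardcoding /tmp.
local_path = Path(tempfile.gettempdir()).resolve() / "my_data.txt"

# as_uri() requires an absolute path, which resolve() guarantees.
local_file_url = local_path.as_uri()
print(local_file_url)
```

Keeping `local_path` around also lets the cleanup step call `local_path.unlink()` rather than repeating the path as a string.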