Smart-open

7.5.1 · active · verified Sat Mar 28

Smart-open (`smart_open`) is a Python 3 library (current version 7.5.1) for efficiently streaming very large files to and from a range of storage systems: S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, and the local filesystem. It compresses and decompresses on the fly for formats such as gzip, bz2, and zstd (`.zst`), and is designed as a drop-in replacement for Python's built-in `open()` function. The library is actively maintained with frequent releases and offers a single Pythonic API across all of these backends, which simplifies working with remote files and cloud storage services.
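The extension-based compression described above can be tried locally without any cloud credentials. This is a minimal sketch, assuming `smart_open` is installed; the file name `notes.txt.gz` and the temporary directory are illustrative choices, not part of the library. It writes text through `smart_open.open`, reads it back the same way, and then cross-checks with the standard `gzip` module that the bytes on disk really are gzip-compressed.

```python
import gzip
import os
import tempfile

# Shadowing the built-in open() is the library's documented drop-in idiom.
from smart_open import open

# Illustrative local path; the ".gz" suffix is what triggers compression.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "notes.txt.gz")

# Writing: smart_open infers gzip from the extension and compresses on the fly.
with open(path, "w", encoding="utf-8") as fout:
    fout.write("first line\n")
    fout.write("second line\n")

# Reading: the same call decompresses transparently.
with open(path, "r", encoding="utf-8") as fin:
    lines = [line.rstrip("\n") for line in fin]

# Independent check with the standard library that the file on disk is gzip.
with gzip.open(path, "rt", encoding="utf-8") as raw:
    stdlib_text = raw.read()

print(lines)
print(stdlib_text == "first line\nsecond line\n")
```

The same code path applies to remote URLs such as `s3://...`; only the transport underneath changes.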

Quickstart

This quickstart shows how to use `smart_open.open` to write to and read from an S3 bucket. Compression and decompression are handled transparently based on the file extension, and S3 access goes through the underlying boto3 SDK (installed with the `smart_open[s3]` extra). Make sure your environment has appropriate AWS credentials configured before running it.

import os
from smart_open import open

# Example for S3; similar patterns apply to GCS, Azure, etc.
# Ensure AWS credentials are configured (e.g., via environment variables, AWS CLI config, or IAM role).
# For production, consider explicit credential management via transport_params.
S3_BUCKET_NAME = os.environ.get('SMART_OPEN_S3_BUCKET', 'my-smart-open-test-bucket')
S3_KEY = 'example.txt'
S3_URL = f"s3://{S3_BUCKET_NAME}/{S3_KEY}"

# Write to S3
print(f"Writing to {S3_URL}...")
with open(S3_URL, 'w') as fout:
    fout.write('Hello, smart-open from S3!\n')
    fout.write('This is a second line.\n')
print("Write complete.")

# Read from S3
print(f"Reading from {S3_URL}...")
with open(S3_URL, 'r') as fin:
    for line in fin:
        print(f"Read line: {line.strip()}")
print("Read complete.")
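When a file's name does not reveal its compression, the inference shown above can be overridden. This is a minimal sketch, assuming the `compression` keyword of `smart_open.open` available in recent releases, where `".gz"` names a codec explicitly and `"disable"` turns decompression off; the file name `payload.bin` is hypothetical. It runs against a local temporary file, so no credentials are needed.

```python
import gzip
import os
import tempfile

from smart_open import open

# Hypothetical gzip file whose name gives no hint of compression.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "payload.bin")
with gzip.open(path, "wt", encoding="utf-8") as g:
    g.write("hello\n")

# ".bin" is not a recognized extension, so name the codec explicitly.
with open(path, "r", encoding="utf-8", compression=".gz") as fin:
    text = fin.read()

# compression="disable" skips decompression and returns the raw gzip bytes.
with open(path, "rb", compression="disable") as fin:
    raw = fin.read()

print(repr(text))
print(raw[:2] == b"\x1f\x8b")  # gzip magic number
```

The same keyword works for S3 URLs, for example when objects are stored gzip-compressed under keys without a `.gz` suffix.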
