S3Fs
S3Fs is a Pythonic filesystem interface to Amazon S3, built on top of aiobotocore and fsspec. The top-level class S3FileSystem exposes familiar file-system operations (ls, cp, mv, du, glob, put, get) and an open() method that returns file-like objects emulating Python's standard file protocol. Those objects can be passed directly to libraries that accept file-like objects, such as pandas, dask and gzip. It also supports S3-compatible stores (MinIO, Ceph, R2) via the endpoint_url parameter. Versions follow calendar versioning (YYYY.MM.PATCH). The current release is 2026.2.0, released February 2026, and new releases arrive roughly monthly.
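The helper below is a minimal sketch of how these operations combine. It assumes fs is an s3fs.S3FileSystem, although any fsspec filesystem exposes the same glob(), get() and du() calls. The helper name fetch_matching, the prefix and the pattern are hypothetical.

```python
import os
import posixpath


def fetch_matching(fs, prefix: str, pattern: str, local_dir: str):
    """Download keys under `prefix` that match `pattern` into `local_dir`.

    Returns (matched keys, total bytes stored under `prefix`).
    """
    # glob() expands wildcards against S3 listings; results are
    # "bucket/key" paths without the s3:// protocol prefix
    keys = sorted(fs.glob(posixpath.join(prefix, pattern)))
    os.makedirs(local_dir, exist_ok=True)
    for key in keys:
        # get(remote_path, local_path) copies one object to local disk
        fs.get(key, os.path.join(local_dir, posixpath.basename(key)))
    # du() sums object sizes under the whole prefix, not only the matches
    return keys, fs.du(prefix)
```

For example, `keys, nbytes = fetch_matching(s3fs.S3FileSystem(), 'my-bucket/data', '*.csv', '/tmp/data')` downloads every CSV directly under that prefix.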
Warnings
- breaking aiobotocore pins an extremely narrow botocore version range (often a single patch). Installing s3fs alongside boto3 or awscli frequently produces irresolvable dependency conflicts because boto3 requires a different botocore range.
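- breaking Using multiprocessing with the 'fork' start method can cause deadlocks and hard-to-reproduce bugs. 'fork' is the default on Linux before Python 3.14. A forked child inherits s3fs's open async sockets and event-loop state, but not the background thread that drives them. Use the 'spawn' or 'forkserver' start method instead.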
- breaking s3fs version 2023.12.0 was yanked from PyPI due to an authentication regression. pip skips yanked releases unless a requirement pins that exact version (e.g. s3fs==2023.12.0 in a lock file). In that case it still installs the release, with a warning.
- gotcha The directory listing cache (dircache) is not refreshed when objects change outside this S3FileSystem instance. Suppose an object is written or resized externally (e.g. by boto3 or another process) after fs.ls() or fs.info() has cached its metadata. Subsequent reads through the same instance then use the stale size and may return truncated or corrupted data.
- gotcha The underlying S3File object is binary-only. Opening with a text mode ('r', 'w', 'rt') makes fs.open() wrap it in io.TextIOWrapper, so reads return str rather than bytes. Pass encoding= explicitly instead of relying on the platform's default encoding. readline() and line iteration work in both binary and text modes.
- gotcha S3FileSystem instances are cached by default (skip_instance_cache=False). Constructing it twice with identical arguments in the same process and thread returns the same object. Code that expects independent instances therefore shares state, such as the listings cache and any attributes modified at runtime. Pass skip_instance_cache=True to force a new instance.
- gotcha Writes are buffered locally. Once the buffer exceeds the file's block_size, s3fs starts sending it as parts of a multipart upload. The object only becomes visible in S3 when the file is closed, so a file opened for writing that is never closed commits nothing. Write inside a with block or call close() explicitly.
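The multiprocessing warning above suggests a spawn-based pool. Below is a minimal sketch under that assumption: each worker builds its own S3FileSystem, and the pool comes from a 'spawn' context rather than the platform default. The function names are hypothetical. 'forkserver' would also avoid inheriting the parent's connections on POSIX systems.

```python
import multiprocessing as mp

# 'spawn' starts each worker as a fresh interpreter, so no worker inherits the
# parent's s3fs event-loop state or open connections
SPAWN_CTX = mp.get_context("spawn")


def count_objects(prefix: str) -> int:
    """Worker: create an S3FileSystem inside the child process and list a prefix."""
    import s3fs  # only the worker processes need s3fs

    fs = s3fs.S3FileSystem()
    return len(fs.ls(prefix))


def count_objects_in_parallel(prefixes: list[str], processes: int = 4) -> list[int]:
    """Fan `prefixes` out over a spawn-based process pool."""
    with SPAWN_CTX.Pool(processes=processes) as pool:
        return pool.map(count_objects, prefixes)
```

Call count_objects_in_parallel(['my-bucket/a', 'my-bucket/b']) from inside an `if __name__ == '__main__':` block. Spawn re-imports the main module in every child, so unguarded top-level code would run again there.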
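For the stale-dircache gotcha, one option is to drop the cached metadata for a path right before reading it. invalidate_cache(path) is part of the fsspec filesystem API that S3FileSystem implements. Alternatively, constructing the filesystem with use_listings_cache=False disables the listings cache entirely, at the cost of more metadata requests. The helper name read_fresh is hypothetical.

```python
def read_fresh(fs, path: str) -> bytes:
    """Read `path` after discarding any cached listing/size metadata for it.

    `fs` is assumed to be an s3fs.S3FileSystem shared across the application.
    """
    # Forget what this instance previously cached about `path`, so open()
    # looks up the object's current size instead of reusing a stale one
    fs.invalidate_cache(path)
    with fs.open(path, "rb") as f:
        return f.read()
```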
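To read text with an explicit encoding, a binary handle can be wrapped in io.TextIOWrapper. fsspec applies a similar wrapper when open() is called in a text mode. The sketch assumes fs is an s3fs.S3FileSystem; read_text_lines is a hypothetical helper.

```python
import io


def read_text_lines(fs, path: str, encoding: str = "utf-8") -> list[str]:
    """Return the lines of an S3 object decoded with `encoding`, without newlines."""
    # The S3 handle yields bytes; TextIOWrapper decodes them and applies
    # universal-newline translation ("\r\n" -> "\n")
    with fs.open(path, "rb") as raw, io.TextIOWrapper(raw, encoding=encoding) as text:
        return [line.rstrip("\n") for line in text]
```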
Install
- pip install s3fs
- pip install 's3fs[boto3]' (also installs a boto3 release compatible with aiobotocore's botocore pin)
- conda install -c conda-forge s3fs
Imports
- S3FileSystem
import s3fs
s3 = s3fs.S3FileSystem()
- S3File
with s3fs.S3FileSystem().open('bucket/key', 'rb') as f: ...
- open via fsspec URL
import fsspec
with fsspec.open('s3://bucket/key', 'rb') as f: ...
Quickstart
import os
import s3fs
# Credentials: pass them explicitly, or leave them as None so botocore's default
# chain (env vars, ~/.aws/credentials, IAM roles, etc.) is used
fs = s3fs.S3FileSystem(
    key=os.environ.get('AWS_ACCESS_KEY_ID'),
    secret=os.environ.get('AWS_SECRET_ACCESS_KEY'),
    # token=os.environ.get('AWS_SESSION_TOKEN'),  # uncomment for STS/assumed-role
    # endpoint_url='https://s3.example.com',  # uncomment for MinIO / S3-compatible
)
# List bucket contents
bucket = os.environ.get('S3_BUCKET', 'my-bucket')
print(fs.ls(bucket))
# Read a file
with fs.open(f'{bucket}/hello.txt', 'rb') as f:
    print(f.read())
# Write a file; the object is committed to S3 when the with block closes it
with fs.open(f'{bucket}/output.txt', 'wb') as f:
    f.write(b'hello s3fs')
# Works transparently with pandas via storage_options
import pandas as pd
df = pd.read_csv(
    f's3://{bucket}/data.csv',
    storage_options={
        'key': os.environ.get('AWS_ACCESS_KEY_ID'),
        'secret': os.environ.get('AWS_SECRET_ACCESS_KEY'),
    },
)
print(df.head())
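The credential lookup in the Quickstart is repeated for s3fs and for pandas. A shared helper could build the options once, as in the minimal sketch below. It assumes the standard AWS variable names. S3_ENDPOINT_URL is a hypothetical variable name for S3-compatible stores, and s3_storage_options is a hypothetical helper. Empty or missing values are left out, so botocore's default credential chain still applies.

```python
import os
from typing import Mapping, Optional


def s3_storage_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build S3FileSystem / pandas storage_options from environment variables.

    Only non-empty values are included; anything missing falls back to
    botocore's default credential chain. S3_ENDPOINT_URL is hypothetical.
    """
    env = os.environ if env is None else env
    option_to_var = {
        "key": "AWS_ACCESS_KEY_ID",
        "secret": "AWS_SECRET_ACCESS_KEY",
        "token": "AWS_SESSION_TOKEN",
        "endpoint_url": "S3_ENDPOINT_URL",  # hypothetical variable name
    }
    return {opt: env[var] for opt, var in option_to_var.items() if env.get(var)}
```

The same dict can then feed both entry points: `s3fs.S3FileSystem(**s3_storage_options())` and `pd.read_csv(url, storage_options=s3_storage_options())`.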