LlamaIndex S3 Reader

0.6.1 verified Fri May 01 auth: no python

A reader for loading documents from AWS S3 into LlamaIndex. Supports both individual files and entire buckets with glob/filter patterns. Current version: 0.6.1, requires Python >=3.10, <4.0. Release cadence is irregular; part of LlamaIndex's modular reader ecosystem.

pip install llama-index-readers-s3
error ModuleNotFoundError: No module named 'llama_index.readers.s3'
cause The package `llama-index-readers-s3` is not installed.
fix
Run `pip install llama-index-readers-s3`.
error botocore.exceptions.NoCredentialsError: Unable to locate credentials
cause AWS credentials not provided. S3Reader requires valid AWS credentials.
fix
Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, attach an IAM role, or add a profile to `~/.aws/credentials`.
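Before instantiating the reader, it can help to see which of the credential sources named above are visible to the process. The following is a minimal sketch using only the standard library; `find_credential_sources` is a hypothetical helper, not part of LlamaIndex or boto3, and it cannot detect IAM roles, which are resolved at request time.

```python
import os
from pathlib import Path


def find_credential_sources(env=None, home=None):
    """Return a list of AWS credential sources visible to this process.

    Hypothetical helper for illustration: it only inspects environment
    variables and the shared credentials file, not IAM roles.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    sources = []
    # Both variables must be set for the key pair to be usable.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")

    # AWS_SHARED_CREDENTIALS_FILE overrides the default ~/.aws/credentials path.
    default_file = home / ".aws" / "credentials"
    creds_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file)).expanduser()
    if creds_file.is_file():
        sources.append(f"shared credentials file ({creds_file})")

    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print(found if found else "No static credentials found; an IAM role may still apply.")
```

An empty result does not prove the reader will fail, since a role attached to the host can still supply credentials.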
gotcha To load multiple files (e.g., all PDFs under a prefix), omit `key` and narrow the selection with the reader's other options, such as `file_extractor` or an extension filter. As of 0.6.x, passing a glob pattern or directory prefix as `key` may not work, because `key` is meant to name a single object. Whether an `S3FilesReader` class exists in `llama_index.readers.s3` is unconfirmed; check the docs for your installed version before relying on it.
fix Omit `key`, pass the directory via `prefix` (e.g., 'data/'), and set `recursive=True`. If your installed version lacks these parameters, list the objects yourself and load them one `key` at a time.
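A multi-file load can be sketched as follows. This assumes the installed `S3Reader` accepts `prefix`, `recursive`, and `required_exts` keyword arguments, as recent releases appear to; confirm against the signature in your version. `build_s3_reader_kwargs` and `load_prefix` are hypothetical helper names, and the bucket and prefix values are placeholders.

```python
def build_s3_reader_kwargs(bucket, prefix="", extensions=None, recursive=True):
    """Assemble keyword arguments for a prefix-based S3Reader (hypothetical helper).

    `key` is deliberately omitted so the reader targets many objects
    instead of a single file.
    """
    kwargs = {"bucket": bucket, "prefix": prefix, "recursive": recursive}
    if extensions:
        # Assumption: `required_exts` filters by file extension, e.g. [".pdf"].
        kwargs["required_exts"] = list(extensions)
    return kwargs


def load_prefix(bucket, prefix="", extensions=None):
    """Load every matching object under `prefix` (needs network access and credentials)."""
    # Imported lazily so the helper above works without the package installed.
    from llama_index.readers.s3 import S3Reader

    reader = S3Reader(**build_s3_reader_kwargs(bucket, prefix, extensions))
    return reader.load_data()


if __name__ == "__main__":
    print(build_s3_reader_kwargs("my-bucket", "data/", [".pdf"]))
```

Keeping the argument assembly separate from the network call makes it easy to inspect what will be sent to the reader before any request is made.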
gotcha Public buckets can still raise a credentials error. `S3Reader` does not automatically assume anonymous access, and its constructor has no `anon` parameter.
fix Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables even for public buckets (dummy values may be enough), or download the objects directly with a boto3 client configured with `Config(signature_version=UNSIGNED)`.
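The unsigned-request workaround can be sketched with boto3 directly, bypassing `S3Reader`. This is an illustration under assumptions: `boto3` must be installed, the bucket must allow anonymous reads, and `make_unsigned_s3_client` and `read_public_object` are hypothetical helper names.

```python
def make_unsigned_s3_client(region_name=None):
    """Create an S3 client that sends unsigned (anonymous) requests."""
    # Imported lazily so defining the helpers does not require boto3.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client(
        "s3",
        region_name=region_name,
        config=Config(signature_version=UNSIGNED),
    )


def read_public_object(bucket, key, region_name=None):
    """Return the raw bytes of one object from a publicly readable bucket."""
    client = make_unsigned_s3_client(region_name)
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

The returned bytes can be written to a temporary file and handed to any local file reader.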
deprecated The `llama-index-readers-s3` package may be merged into `llama-index` core or replaced by a new reader interface. Watch for deprecation warnings in logs.
fix Import with `from llama_index.readers.s3 import S3Reader`; if the package is removed, use `SimpleDirectoryReader` with an fsspec S3 filesystem (for example `s3fs.S3FileSystem`) passed through its `fs` argument.
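A fallback that does not depend on `S3Reader` can be sketched with `SimpleDirectoryReader` and an fsspec filesystem. This assumes `llama-index-core` and `s3fs` are installed and that credentials are resolved by s3fs's default lookup; `s3_input_dir` and `load_with_directory_reader` are hypothetical helper names.

```python
def s3_input_dir(bucket, prefix=""):
    """Join bucket and prefix into the 'bucket/prefix' path form used by s3fs."""
    parts = (part.strip("/") for part in (bucket, prefix))
    return "/".join(part for part in parts if part)


def load_with_directory_reader(bucket, prefix=""):
    """Load documents under bucket/prefix via SimpleDirectoryReader (needs network access)."""
    # Imported lazily so the path helper works without these packages.
    import s3fs
    from llama_index.core import SimpleDirectoryReader

    fs = s3fs.S3FileSystem()  # no explicit keys: relies on default credential lookup
    reader = SimpleDirectoryReader(
        input_dir=s3_input_dir(bucket, prefix),
        fs=fs,
        recursive=True,
    )
    return reader.load_data()
```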

Minimal example: load a single file from S3, passing AWS credentials explicitly when they are set in the environment and otherwise relying on an IAM role or another default credential source.

import os
from llama_index.readers.s3 import S3Reader

# Configure AWS credentials via env vars or IAM role
# os.environ['AWS_ACCESS_KEY_ID'] = 'your-key'
# os.environ['AWS_SECRET_ACCESS_KEY'] = 'your-secret'

reader = S3Reader(
    bucket="my-bucket",
    key="path/to/file.pdf",
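    # Unset variables yield None rather than '', so no empty key pair is passed
    # and the underlying client can fall back to its default credential lookup.
    aws_access_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_access_secret=os.environ.get('AWS_SECRET_ACCESS_KEY'),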
)
documents = reader.load_data()
print(f"Loaded {len(documents)} document(s)")