S3TorchConnectorClient
The `s3torchconnectorclient` library is the internal S3 client that underpins the `s3torchconnector` library. It provides high-throughput data access and checkpointing for PyTorch training jobs that read from and write to Amazon S3. It is currently at version 1.5.0 and is released regularly, usually in step with the broader `s3torchconnector` project.
Warnings
- breaking In version 1.5.0, the internal S3Client now returns `HeadObjectResult` instead of `ObjectInfo`. `HeadObjectResult` does not include the `key` field, which might break custom reader implementations that directly relied on the `key` field from `ObjectInfo`.
- breaking Starting with version 1.5.0, `DCPOptimizedS3Reader` became the new default reader for `S3StorageReader` in `s3torchconnector`. While this offers performance improvements, it might lead to behavioral changes, especially with specific access patterns or error handling.
- deprecated Python 3.8 support is being deprecated and will be removed in a future release. PyTorch itself dropped Python 3.8 support after v2.4.1.
- deprecated macOS x86_64 wheel support will be deprecated in a future release.
- gotcha Beginning with `boto3` v1.36.0, the AWS SDK for Python enables new default integrity protections for S3 clients: it calculates checksums on Put operations and validates them on Get operations. These defaults can cause errors with third-party S3-compatible services that do not fully support them.
- gotcha `S3Reader` instances, which form part of the read path in `s3torchconnector` built on `s3torchconnectorclient`, are not thread-safe; do not share a single instance across threads.
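Because `HeadObjectResult` no longer carries `key`, a custom reader that needs the key can keep the key it requested alongside the returned metadata instead of reading it back. The sketch below shows that pattern without touching the real client: `FakeHeadObjectResult`, `KeyedHead`, `head_with_key`, and `fake_head_object` are hypothetical names, and the stub's `size`/`etag` fields merely stand in for whatever metadata the real result exposes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FakeHeadObjectResult:
    """Hypothetical stand-in for the client's HeadObjectResult (no `key` field)."""
    size: int
    etag: str


@dataclass(frozen=True)
class KeyedHead:
    """Pairs the key the reader asked for with the metadata returned for it."""
    key: str
    head: FakeHeadObjectResult


def head_with_key(
    head_object: Callable[[str, str], FakeHeadObjectResult], bucket: str, key: str
) -> KeyedHead:
    # The caller already knows `key`, so record it here rather than expecting
    # the result object to provide it.
    return KeyedHead(key=key, head=head_object(bucket, key))


# Hypothetical head function used for illustration; it makes no network calls.
def fake_head_object(bucket: str, key: str) -> FakeHeadObjectResult:
    return FakeHeadObjectResult(size=1024, etag='"abc123"')


info = head_with_key(fake_head_object, "my-bucket", "data/shard-000.tar")
print(info.key, info.head.size)
```

Keeping the key in a wrapper means the reader no longer depends on which metadata type the client returns.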
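For boto3 clients that talk to a third-party S3-compatible service, one way to sidestep the checksum defaults is to opt back into computing and validating checksums only when an operation requires them. This sketch assumes your pipeline uses boto3 alongside the connector; the `AWS_REQUEST_CHECKSUM_CALCULATION` and `AWS_RESPONSE_CHECKSUM_VALIDATION` settings are boto3 configuration, and the sketch makes no claim about how `s3torchconnectorclient`'s own client behaves. The endpoint URL in the commented alternative is hypothetical.

```python
import os

# Compute request checksums and validate response checksums only when the S3
# operation requires it (the pre-1.36 behavior). These must be set before any
# boto3 session or client is created; setdefault keeps values already exported.
os.environ.setdefault("AWS_REQUEST_CHECKSUM_CALCULATION", "when_required")
os.environ.setdefault("AWS_RESPONSE_CHECKSUM_VALIDATION", "when_required")

# Per-client alternative (requires boto3/botocore >= 1.36):
# import boto3
# from botocore.config import Config
# s3 = boto3.client(
#     "s3",
#     endpoint_url="https://s3-compatible.example.com",  # hypothetical endpoint
#     config=Config(
#         request_checksum_calculation="when_required",
#         response_checksum_validation="when_required",
#     ),
# )

print(os.environ["AWS_REQUEST_CHECKSUM_CALCULATION"])
```

The environment-variable route applies process-wide, while the `Config` route scopes the change to a single client.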
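Since a reader must not be shared across threads, one common pattern is to give each thread its own instance. The sketch below does this with `threading.local`; `FakeReader` is a hypothetical placeholder rather than the library's `S3Reader`, and `open_reader` is a hypothetical factory standing in for however your code constructs a reader.

```python
import threading


class FakeReader:
    """Hypothetical placeholder for a non-thread-safe reader such as S3Reader."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.owner = threading.get_ident()  # thread that created this reader


def open_reader(key: str) -> FakeReader:
    # Hypothetical factory: replace with your real reader construction.
    return FakeReader(key)


_local = threading.local()


def get_reader(key: str) -> FakeReader:
    """Return a reader owned by the calling thread, creating it on first use."""
    readers = getattr(_local, "readers", None)
    if readers is None:
        readers = _local.readers = {}
    if key not in readers:
        readers[key] = open_reader(key)
    return readers[key]


results = {}


def worker(name: str) -> None:
    first = get_reader("data/shard-000.tar")
    second = get_reader("data/shard-000.tar")
    # Keep the reader referenced so it outlives the thread for inspection.
    results[name] = (first, first is second, first.owner == threading.get_ident())


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))
```

Within a thread the same reader is reused, and no two threads ever touch the same instance; a lock around a shared reader would also work but serializes all reads.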
Install
pip install s3torchconnectorclient
Imports
- S3ClientConfig
from s3torchconnectorclient import S3ClientConfig
Quickstart
import os
from s3torchconnectorclient import S3ClientConfig
# Configure the S3 client directly. This configuration is typically
# passed to higher-level constructors within the `s3torchconnector` library.
config = S3ClientConfig(
    part_size=10 * 1024 * 1024,  # Example: 10 MiB part size for transfers
    throughput_target_gbps=5.0,  # Example: target 5 Gbps throughput
    profile=os.environ.get("AWS_PROFILE"),  # Use an AWS profile if one is set
)
print(f"S3ClientConfig created with part size: {config.part_size / (1024*1024):.1f} MiB")
print(f"S3ClientConfig created with throughput target: {config.throughput_target_gbps} Gbps")
print(f"S3ClientConfig using AWS profile: {config.profile}")
# In a real application, 'config' is typically passed to a dataset constructor
# (requires 's3torchconnector'; DATASET_URI and REGION are placeholders):
# from s3torchconnector import S3MapDataset, S3ReaderConstructor
# reader_constructor = S3ReaderConstructor.sequential()
# dataset = S3MapDataset.from_prefix(
#     DATASET_URI, region=REGION, s3client_config=config,
#     reader_constructor=reader_constructor,
# )
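To get a feel for the two numbers in the configuration above, the arithmetic below converts the throughput target to bytes per second and estimates how many part-sized transfers per second that corresponds to. It assumes `throughput_target_gbps` is measured in decimal gigabits per second and that the target is met exactly; it is a back-of-the-envelope estimate, not a description of how the client schedules requests.

```python
# Values mirror the S3ClientConfig example above.
part_size = 10 * 1024 * 1024      # bytes per part (10 MiB)
throughput_target_gbps = 5.0      # assumed decimal gigabits per second

bytes_per_second = throughput_target_gbps * 1e9 / 8   # bits -> bytes
mib_per_second = bytes_per_second / (1024 * 1024)
parts_per_second = bytes_per_second / part_size

print(f"{bytes_per_second:,.0f} B/s = {mib_per_second:.1f} MiB/s")
print(f"~{parts_per_second:.1f} parts of {part_size // (1024 * 1024)} MiB per second")
# prints:
# 625,000,000 B/s = 596.0 MiB/s
# ~59.6 parts of 10 MiB per second
```

Under these assumptions, a 5 Gbps target with 10 MiB parts works out to roughly 60 parts per second.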