{"id":6448,"library":"s3torchconnectorclient","title":"S3TorchConnectorClient","description":"The `s3torchconnectorclient` library is an internal S3 client implementation that underpins the `s3torchconnector` library. It provides high-throughput data access and checkpointing capabilities for PyTorch training jobs interacting with Amazon S3. It is currently at version 1.5.0 and is actively developed with regular releases, often in sync with the broader `s3torchconnector` project.","status":"active","version":"1.5.0","language":"en","source_language":"en","source_url":"https://github.com/awslabs/s3-connector-for-pytorch","tags":["aws","s3","pytorch","machine-learning","storage","data-loading","checkpointing"],"install":[{"cmd":"pip install s3torchconnectorclient","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"This client is an internal component of `s3torchconnector` and is primarily used through its higher-level APIs.","package":"s3torchconnector","optional":false},{"reason":"Required for AWS S3 interactions; specific versions may introduce behavioral changes (see warnings).","package":"boto3","optional":false},{"reason":"Underlying AWS SDK for Python, often installed with boto3.","package":"botocore","optional":false}],"imports":[{"note":"Used for fine-grained configuration of the underlying S3 client behavior. Typically passed to constructors in the `s3torchconnector` library.","symbol":"S3ClientConfig","correct":"from s3torchconnectorclient import S3ClientConfig"}],"quickstart":{"code":"import os\nfrom s3torchconnectorclient import S3ClientConfig\n\n# Configure the S3 client directly. 
This configuration is typically\n# passed to higher-level constructors within the `s3torchconnector` library.\nconfig = S3ClientConfig(\n    part_size=10 * 1024 * 1024,  # Example: 10 MiB part size for transfers\n    throughput_target_gbps=5.0,  # Example: target 5 Gbps throughput\n    profile=os.environ.get('AWS_PROFILE'),  # Use an AWS profile if one is set; otherwise None\n)\n\nprint(f\"S3ClientConfig created with part size: {config.part_size / (1024*1024):.1f} MiB\")\nprint(f\"S3ClientConfig created with throughput target: {config.throughput_target_gbps} Gbps\")\nprint(f\"S3ClientConfig using AWS profile: {config.profile}\")\n\n# In a real application, 'config' would typically be used like this (requires 's3torchconnector';\n# DATASET_URI and REGION are placeholders):\n# from s3torchconnector import S3MapDataset, S3ReaderConstructor\n# reader_constructor = S3ReaderConstructor.sequential()\n# dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=config, reader_constructor=reader_constructor)","lang":"python","description":"This quickstart shows how to configure `S3ClientConfig`, which is part of `s3torchconnectorclient`, directly. While direct interaction with this low-level client is possible, `S3ClientConfig` instances are more commonly passed to the higher-level dataset and reader APIs provided by the `s3torchconnector` library. This example sets a custom part size and throughput target. Ensure AWS credentials are configured (e.g., via environment variables or the AWS CLI) before performing actual S3 operations."},"warnings":[{"fix":"Update custom S3 client logic to use `HeadObjectResult`, and adapt any code that previously accessed the `key` field. For most users of the high-level `s3torchconnector` APIs, this change is internal and should not require direct modification.","message":"In version 1.5.0, the internal S3Client now returns `HeadObjectResult` instead of `ObjectInfo`. 
`HeadObjectResult` does not include the `key` field, which might break custom reader implementations that directly relied on the `key` field from `ObjectInfo`.","severity":"breaking","affected_versions":">=1.5.0"},{"fix":"Review the documentation for `DCPOptimizedS3Reader Errors` and test existing workloads thoroughly. If previous behavior is required, explicitly configure `S3StorageReader` with `S3ReaderConstructor.sequential()` or `S3ReaderConstructor.range_based()`.","message":"Starting with version 1.5.0, `DCPOptimizedS3Reader` became the new default reader for `S3StorageReader` in `s3torchconnector`. While this offers performance improvements, it might lead to behavioral changes, especially with specific access patterns or error handling.","severity":"breaking","affected_versions":">=1.5.0"},{"fix":"Upgrade to Python 3.9 or newer. The library currently supports Python 3.8-3.14.","message":"Python 3.8 support is being deprecated and will be removed in a future release. PyTorch itself has stopped supporting Python 3.8 after v2.4.1.","severity":"deprecated","affected_versions":"future release"},{"fix":"Users on macOS x86_64 should plan to migrate to an ARM-based Mac or use a Linux environment.","message":"macOS x86_64 wheel support will be deprecated in a future release.","severity":"deprecated","affected_versions":"future release"},{"fix":"If experiencing issues with third-party S3 services, you can disable the new integrity protections by setting `request_checksum_calculation='when_required'` and `response_checksum_validation='when_required'` in your AWS configuration (e.g., via `~/.aws/config` or environment variables). It is generally not recommended to disable these for Amazon S3 itself.","message":"Beginning with `boto3` v1.36.0, AWS SDK for Python introduced new default integrity protections for S3 clients (checksums on Put and validation on Get). 
This can cause issues when interacting with third-party S3-compatible services that may not fully support these new defaults.","severity":"gotcha","affected_versions":"boto3 >=1.36.0"},{"fix":"Do not share `S3Reader` instances across multiple threads. For multiprocessing with `DataLoader`, each worker process automatically creates its own `S3Reader` instance, which is the recommended pattern.","message":"`S3Reader` instances (which are part of the `s3torchconnector`'s underlying read mechanism, utilizing `s3torchconnectorclient`) are not thread-safe.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}