Run:ai Model Streamer S3
The `runai-model-streamer-s3` library acts as a backend for the `runai-model-streamer`, enabling high-performance streaming of AI model weights (specifically Safetensors format) directly from S3-compatible object storage to GPU memory. It significantly reduces model loading times, addressing 'cold start' issues for large language models in inference scenarios. The current version is 0.15.8, with releases often aligned with the main `runai-model-streamer` project.
Warnings
- gotcha When streaming from S3-compatible storage like Google Cloud Storage (GCS) using S3 HMAC authentication, specific environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINT_URL`, `AWS_EC2_METADATA_DISABLED=true`) must be correctly set. Incorrect configuration will lead to authentication or file access errors.
- breaking The `runai-model-streamer`'s C++ backend, which `runai-model-streamer-s3` depends on, requires the `libcurl4` and `libssl1.1` system libraries to be installed. If either is missing, the streamer will fail to load.
- gotcha The SDK's S3 credential resolution mechanism might differ from the AWS CLI. Issues can arise if environment variables (e.g., `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) are not correctly propagated or if credential files are not found in expected locations, especially within containerized environments.
- gotcha Mixing S3 paths and local file system paths within a single `streamer.stream_files()` call is not supported and will result in errors.
- gotcha When deploying many servers concurrently that stream models from S3, high concurrent demand on S3 throughput can lead to streaming errors and processes hanging, particularly when S3 throughput naturally decreases per replica.
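Because S3 and local paths cannot be mixed in one call, a defensive pattern is to partition paths by scheme first and pass each group to the streamer separately. The `partition_paths` helper below is a hypothetical sketch, not part of the library's API:

```python
def partition_paths(paths):
    """Split paths into S3 URIs and local filesystem paths so each
    group can be passed to the streamer in a separate call."""
    s3_paths, local_paths = [], []
    for p in paths:
        (s3_paths if p.startswith("s3://") else local_paths).append(p)
    return s3_paths, local_paths

s3_paths, local_paths = partition_paths([
    "s3://bucket/model-00001.safetensors",
    "/models/model-00002.safetensors",
])
# Pass s3_paths and local_paths to separate streamer.stream_files() calls
# rather than mixing them in one list.
```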
Install
- pip install runai-model-streamer-s3
- pip install vllm[runai]
Imports
- SafetensorsStreamer
from runai_model_streamer import SafetensorsStreamer
Quickstart
import os
from runai_model_streamer import SafetensorsStreamer

# NOTE: Replace with your actual S3-compatible storage details.
# For Google Cloud Storage (GCS) with S3 HMAC authentication:
os.environ['AWS_ACCESS_KEY_ID'] = os.environ.get('RUNAI_S3_ACCESS_KEY_ID', 'YOUR_S3_ACCESS_KEY_ID')
os.environ['AWS_SECRET_ACCESS_KEY'] = os.environ.get('RUNAI_S3_SECRET_ACCESS_KEY', 'YOUR_S3_SECRET_ACCESS_KEY')
os.environ['AWS_ENDPOINT_URL'] = os.environ.get('RUNAI_S3_ENDPOINT_URL', 'https://storage.googleapis.com')  # GCS endpoint
os.environ['AWS_EC2_METADATA_DISABLED'] = 'true'

s3_model_path = "s3://your-bucket/path/to/model.safetensors"

try:
    with SafetensorsStreamer() as streamer:
        print(f"Streaming from: {s3_model_path}")
        streamer.stream_file(s3_model_path)
        # Tensors are yielded as they finish streaming; move each to the GPU.
        for name, tensor in streamer.get_tensors():
            print(f"Streamed tensor: {name}")
            # tensor.to('cuda:0')
except Exception as e:
    print(f"Streaming failed (expected if the S3 path is a placeholder or credentials are unset): {e}")
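When many replicas stream from S3 at once (the throughput gotcha noted under Warnings), one mitigation is to cap the streamer's per-replica request concurrency. The sketch below assumes the `RUNAI_STREAMER_CONCURRENCY` environment variable, which is documented for the main `runai-model-streamer` project; verify it against your installed version before relying on it, and the value `8` is only an illustrative choice:

```python
import os

# Cap streamer concurrency per replica to reduce aggregate S3 request
# pressure. setdefault respects a value already set by the operator.
# (RUNAI_STREAMER_CONCURRENCY is read by the underlying runai-model-streamer;
# "8" here is an illustrative value, not a recommendation.)
os.environ.setdefault("RUNAI_STREAMER_CONCURRENCY", "8")
print(f"Streamer concurrency: {os.environ['RUNAI_STREAMER_CONCURRENCY']}")
```

Set this before the streamer is constructed so the backend picks it up.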