Run:ai Model Streamer S3

0.15.8 · active · verified Tue Apr 14

The `runai-model-streamer-s3` library acts as a backend for the `runai-model-streamer`, enabling high-performance streaming of AI model weights (specifically Safetensors format) directly from S3-compatible object storage to GPU memory. It significantly reduces model loading times, addressing 'cold start' issues for large language models in inference scenarios. The current version is 0.15.8, with releases often aligned with the main `runai-model-streamer` project.

Warnings

Install

Install from PyPI (typically alongside the core `runai-model-streamer` package):

pip install runai-model-streamer-s3

Imports

from runai_model_streamer import SafetensorsStreamer

Quickstart

This quickstart demonstrates how to use `SafetensorsStreamer` from the `runai-model-streamer` library to stream a Safetensors model directly from an S3-compatible object store; `runai-model-streamer-s3` provides the S3 backend under the hood. It also shows the environment variables required for S3-compatible authentication (e.g., GCS HMAC credentials).

import os
from runai_model_streamer import SafetensorsStreamer

# NOTE: Replace with your actual S3-compatible storage details.
# For Google Cloud Storage (GCS) with S3-compatible HMAC authentication:
os.environ['AWS_ACCESS_KEY_ID'] = os.environ.get('RUNAI_S3_ACCESS_KEY_ID', 'YOUR_S3_ACCESS_KEY_ID')
os.environ['AWS_SECRET_ACCESS_KEY'] = os.environ.get('RUNAI_S3_SECRET_ACCESS_KEY', 'YOUR_S3_SECRET_ACCESS_KEY')
os.environ['AWS_ENDPOINT_URL'] = os.environ.get('RUNAI_S3_ENDPOINT_URL', 'https://storage.googleapis.com') # For GCS
os.environ['AWS_EC2_METADATA_DISABLED'] = 'true'

s3_model_path = "s3://your-bucket/path/to/model.safetensors"

try:
    with SafetensorsStreamer() as streamer:
        print(f"Streaming from: {s3_model_path}")
        # Request the file; tensors become available as chunks arrive from S3.
        streamer.stream_file(s3_model_path)
        for name, tensor in streamer.get_tensors():
            # Move each tensor to GPU memory as it is streamed in.
            tensor.to('cuda:0')
            print(f"Streamed tensor: {name}")

except Exception as e:
    print(f"Streaming failed: {e}")
    print("This is expected if the S3 path is a placeholder or credentials are not set.")
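As a library-independent aside (the helper below is illustrative and is not part of the `runai-model-streamer` API), an `s3://` path like the one above splits into a bucket name and an object key, which is what any S3-compatible backend ultimately requests from the endpoint:

```python
from urllib.parse import urlparse

def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri}")
    # netloc is the bucket; path (minus the leading slash) is the object key.
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_uri("s3://your-bucket/path/to/model.safetensors")
print(bucket, key)  # your-bucket path/to/model.safetensors
```

With a GCS endpoint such as `https://storage.googleapis.com`, the same bucket/key pair is addressed through the S3-compatible XML API, which is why only the endpoint URL and HMAC credentials need to change.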
