Run:ai Model Streamer
The Run:ai Model Streamer is an open-source Python SDK, backed by a C++ engine, that accelerates loading large AI models onto accelerators such as GPUs. It streams tensors concurrently from local or object storage (S3, GCS, Azure Blob Storage) into accelerator memory, bypassing intermediate disk buffering, and is optimized for the SafeTensors file format. The current version is 0.15.8, and the project is under active development.
Warnings
- breaking The C++ backend of the streamer requires specific system libraries: `libcurl4` and `libssl1.1`. Without them, the Python SDK fails at runtime during model streaming. This is a common installation footgun, especially in minimal container images.
- gotcha When streaming from cloud object storage (S3, GCS, Azure Blob Storage), specific `runai-model-streamer-*` backend packages (e.g., `runai-model-streamer-gcs`) must be installed in addition to the core `runai-model-streamer` package. Furthermore, proper authentication credentials must be configured via environment variables or service account files for the SDK to access the storage buckets.
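Credential wiring for the cloud backends typically goes through each provider's standard environment variables. A minimal sketch, with placeholder values only (use a real secrets manager in practice; the variable names are the ones each cloud SDK conventionally reads, not streamer-specific settings):

```python
import os

# Placeholder credentials -- illustrative only, never hard-code real secrets.
os.environ["AWS_ACCESS_KEY_ID"] = "EXAMPLE_KEY_ID"          # S3 backend
os.environ["AWS_SECRET_ACCESS_KEY"] = "EXAMPLE_SECRET"      # S3 backend
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"  # GCS backend
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "example-connection-string"     # Azure Blob backend

# Set these before constructing the streamer so its backend picks them up.
print(os.environ["AWS_ACCESS_KEY_ID"])
```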
- deprecated Older versions of `runai-model-streamer` used with vLLM at `tensor-parallel-size > 1` exhibited pickling errors and other issues with distributed streaming across multiple GPUs; single-GPU loading was unaffected.
- gotcha The `Run:ai Model Streamer` is primarily optimized for the `SafeTensors` file format, which enables efficient zero-copy loading directly from storage. While it may handle other formats, performance benefits are most pronounced with `SafeTensors`.
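The SafeTensors layout is what makes this kind of streaming cheap: a fixed-size length prefix, a JSON header mapping tensor names to byte offsets, then raw tensor bytes that can be fetched with ranged reads. A stdlib-only sketch that hand-builds a minimal (simplified) SafeTensors file and reads one tensor back by offset, to illustrate the layout rather than replace the `safetensors` library:

```python
import json
import struct

# Header maps each tensor name to dtype, shape, and byte offsets into the
# data section. Offsets are what allow per-tensor ranged reads from storage.
header = {"tensor_key": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
header_bytes = json.dumps(header).encode("utf-8")
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # four float32 values = 16 bytes

with open("tiny.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
    f.write(header_bytes)
    f.write(data)

# A reader fetches the header alone, then issues a ranged read per tensor --
# the access pattern a streamer exploits to pull tensors concurrently.
with open("tiny.safetensors", "rb") as f:
    (n,) = struct.unpack("<Q", f.read(8))
    meta = json.loads(f.read(n))
    start, end = meta["tensor_key"]["data_offsets"]
    f.seek(8 + n + start)
    tensor_bytes = f.read(end - start)

print(meta["tensor_key"]["shape"], len(tensor_bytes))  # [2, 2] 16
```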
- gotcha Setting environment variables like `RUNAI_STREAMER_CONCURRENCY` and `RUNAI_STREAMER_MEMORY_LIMIT` can significantly impact performance and resource consumption. Incorrect tuning can lead to suboptimal loading times or out-of-memory issues.
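These knobs are plain environment variables, so tuning is a matter of exporting them before the streamer is constructed. A sketch with illustrative values (the right numbers depend on your storage bandwidth and available host RAM):

```python
import os

# Illustrative tuning values -- adjust for your storage and host memory.
os.environ["RUNAI_STREAMER_CONCURRENCY"] = "16"  # number of parallel read workers
os.environ["RUNAI_STREAMER_MEMORY_LIMIT"] = str(4 * 1024**3)  # 4 GiB staging budget

# Must be set before the streamer's backend initializes, i.e. before
# constructing SafetensorsStreamer in the same process.
print(os.environ["RUNAI_STREAMER_CONCURRENCY"],
      os.environ["RUNAI_STREAMER_MEMORY_LIMIT"])
```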
Install
- pip install runai-model-streamer
- pip install runai-model-streamer-gcs
- pip install runai-model-streamer-s3
- pip install runai-model-streamer-azure
- pip install vllm[runai]
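With `vllm[runai]` installed, vLLM can load weights through the streamer via its `--load-format` flag. A CLI sketch; the model name and the extra-config values are illustrative, not required settings:

```shell
# Serve a model with weights loaded by the Run:ai streamer.
# --model-loader-extra-config passes backend tuning as JSON.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --load-format runai_streamer \
  --model-loader-extra-config '{"concurrency": 16}'
```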
Imports
- SafetensorsStreamer
from runai_model_streamer import SafetensorsStreamer
Quickstart
from runai_model_streamer import SafetensorsStreamer

# For a runnable local demo, a dummy safetensors file is created below.
# Replace 'model.safetensors' with a real model path, or with a cloud URI
# once the matching backend package and credentials are configured.
try:
    import torch
    from safetensors.torch import save_file

    # Create a small dummy model file so the quickstart runs end to end.
    save_file({"tensor_key": torch.randn(10, 10)}, "model.safetensors")
    file_path = "model.safetensors"
    print(f"Streaming from: {file_path}")

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    with SafetensorsStreamer() as streamer:
        streamer.stream_file(file_path)
        for name, tensor in streamer.get_tensors():
            accel_tensor = tensor.to(device)  # move to GPU when available
            print(f"Streamed tensor: {name}, shape: {tuple(accel_tensor.shape)}")
    print("Streamer context closed.")
except ImportError:
    print("To run this quickstart with a dummy file, install 'safetensors' and 'torch':")
    print("  pip install safetensors torch")
    print("Alternatively, replace 'model.safetensors' with an actual model file path.")
except Exception as e:
    print(f"An error occurred during quickstart: {e}")
    print("Check that file_path is correct and that libcurl4 and libssl1.1 are installed.")
    print("If streaming from cloud storage, verify authentication environment variables are set.")