{"id":5465,"library":"runai-model-streamer","title":"Run:ai Model Streamer","description":"The Run:ai Model Streamer is an open-source Python SDK designed to accelerate the loading of large AI models onto accelerators, such as GPUs or TPUs. It achieves this by streaming tensors directly from various storage locations (local, S3, GCS, Azure Blob Storage) to GPU memory, bypassing local disk buffering, and optimizing for the SafeTensors file format. The current version is 0.15.8, and regular releases indicate active development.","status":"active","version":"0.15.8","language":"en","source_language":"en","source_url":"https://github.com/run-ai/runai-model-streamer","tags":["AI/ML","model serving","GPU","streaming","safetensors","vLLM","object storage"],"install":[{"cmd":"pip install runai-model-streamer","lang":"bash","label":"Core library"},{"cmd":"pip install runai-model-streamer-gcs","lang":"bash","label":"Google Cloud Storage backend"},{"cmd":"pip install runai-model-streamer-s3","lang":"bash","label":"AWS S3 compatible backend"},{"cmd":"pip install runai-model-streamer-azure","lang":"bash","label":"Azure Blob Storage backend"},{"cmd":"pip install \"vllm[runai]\"","lang":"bash","label":"For vLLM integration (quoted so the brackets are not glob-expanded by the shell)"}],"dependencies":[{"reason":"Required system library for C++ backend functionality.","package":"libcurl4","optional":false},{"reason":"Required system library for C++ backend functionality.","package":"libssl1.1","optional":false},{"reason":"Required for streaming models from Google Cloud Storage.","package":"runai-model-streamer-gcs","optional":true},{"reason":"Required for streaming models from AWS S3 or S3-compatible object stores.","package":"runai-model-streamer-s3","optional":true},{"reason":"Required for streaming models from Azure Blob Storage.","package":"runai-model-streamer-azure","optional":true},{"reason":"Used for accelerated LLM inference; `runai-model-streamer` integrates with 
vLLM.","package":"vllm","optional":true}],"imports":[{"symbol":"SafetensorsStreamer","correct":"from runai_model_streamer import SafetensorsStreamer"}],"quickstart":{"code":"from runai_model_streamer import SafetensorsStreamer\n\n# 'model.safetensors' is a placeholder; in real use, point file_path at your model file.\n# If streaming from cloud storage, ensure the appropriate backend package is installed\n# and environment variables for authentication are set (e.g., GOOGLE_APPLICATION_CREDENTIALS for GCS).\n\n# Create a dummy safetensors file so this quickstart is runnable locally.\ntry:\n    from safetensors.torch import save_file\n    import torch\n    dummy_tensor = {'tensor_key': torch.randn(10, 10)}\n    save_file(dummy_tensor, 'model.safetensors')\n    file_path = \"model.safetensors\"\n\n    print(f\"Attempting to stream from: {file_path}\")\n\n    with SafetensorsStreamer() as streamer:\n        streamer.stream_file(file_path)\n        for name, tensor in streamer.get_tensors():\n            # Move to an accelerator as needed, e.g. tensor.to('cuda:0')\n            print(f\"Streamed tensor: {name}, shape: {tensor.shape}\")\n\n    print(\"Streamer context closed.\")\nexcept ImportError:\n    print(\"To run this quickstart with a dummy file, install 'safetensors' and 'torch':\")\n    print(\"pip install safetensors torch\")\n    print(\"Alternatively, replace 'model.safetensors' with an actual path to your model file.\")\nexcept Exception as e:\n    print(f\"An error occurred during quickstart: {e}\")\n    print(\"Ensure the file_path is correct and necessary system libraries (libcurl4, libssl1.1) are installed.\")\n    
print(\"If streaming from cloud storage, verify environment variables for authentication are set.\")\n","lang":"python","description":"This quickstart demonstrates how to use `SafetensorsStreamer` to stream a model. It creates a dummy `safetensors` file so the example is runnable; for actual use, `file_path` should point to your model. When working with cloud storage (S3, GCS, Azure), ensure the respective `runai-model-streamer-*` package is installed and authentication environment variables (e.g., `GOOGLE_APPLICATION_CREDENTIALS` for GCS; `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` for S3; `AZURE_CLIENT_ID` for Azure) are correctly configured. The `stream_file` method starts the streaming process, and iterating `get_tensors()` retrieves the tensors as they arrive."},"warnings":[{"fix":"Ensure `libcurl4` and `libssl1.1` are installed on your system or within your container image. For Debian-based systems, this typically involves `sudo apt-get install libcurl4-openssl-dev libssl-dev` (or similar packages matching the required versions).","message":"The C++ backend of the streamer requires specific system libraries: `libcurl4` and `libssl1.1`. Without these, the Python SDK will not function correctly, leading to runtime errors during model streaming. This is a common installation footgun, especially in minimal container environments.","severity":"breaking","affected_versions":"All versions"},{"fix":"Install the relevant backend package (`pip install runai-model-streamer-[gcs|s3|azure]`). Configure environment variables such as `GOOGLE_APPLICATION_CREDENTIALS` (for GCS), `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` (for S3), or `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET` (for Azure). 
Refer to the documentation for specific authentication methods.","message":"When streaming from cloud object storage (S3, GCS, Azure Blob Storage), specific `runai-model-streamer-*` backend packages (e.g., `runai-model-streamer-gcs`) must be installed in addition to the core `runai-model-streamer` package. Furthermore, proper authentication credentials must be configured via environment variables or service account files for the SDK to access the storage buckets.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to `runai-model-streamer` version 0.15.x or later. Ensure `vllm[runai]` is updated. When using `vllm serve`, include `--model-loader-extra-config '{\"distributed\":true}'` for optimal distributed loading from object storage.","message":"Older versions of `runai-model-streamer` used with vLLM, particularly with `tensor-parallel-size > 1`, exhibited pickling errors or issues with distributed streaming across multiple GPUs. This primarily affected distributed loading.","severity":"deprecated","affected_versions":"< 0.15.x (specifically before fixes for #11819 and #130 on GitHub)"},{"fix":"Store your AI model weights in the `SafeTensors` format for maximum performance. Tools and libraries like Hugging Face's `safetensors` facilitate saving models in this format.","message":"The `Run:ai Model Streamer` is primarily optimized for the `SafeTensors` file format, which enables efficient zero-copy loading directly from storage. While it may handle other formats, performance benefits are most pronounced with `SafeTensors`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Carefully tune `RUNAI_STREAMER_CONCURRENCY` (number of OS threads for reading) and `RUNAI_STREAMER_MEMORY_LIMIT` (CPU buffer size) based on your specific model size, available CPU memory, and network bandwidth to object storage. 
Refer to the official documentation for guidance on these tunable parameters and their impact on performance.","message":"Setting environment variables like `RUNAI_STREAMER_CONCURRENCY` and `RUNAI_STREAMER_MEMORY_LIMIT` can significantly impact performance and resource consumption. Incorrect tuning can lead to suboptimal loading times or out-of-memory issues.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}