{"id":14900,"library":"runai-model-streamer-s3","title":"Run:ai Model Streamer S3","description":"The `runai-model-streamer-s3` library acts as the S3 backend for `runai-model-streamer`, enabling high-performance streaming of AI model weights (in the Safetensors format) directly from S3-compatible object storage to GPU memory. It significantly reduces model loading times, addressing 'cold start' issues for large language models in inference scenarios. The current version is 0.15.8, with releases typically aligned with the main `runai-model-streamer` project.","status":"active","version":"0.15.8","language":"en","source_language":"en","source_url":"https://github.com/run-ai/runai-model-streamer","tags":["AI/ML","model serving","S3","streaming","GPU","vLLM","LLM"],"install":[{"cmd":"pip install runai-model-streamer-s3","lang":"bash","label":"Direct installation"},{"cmd":"pip install vllm[runai]","lang":"bash","label":"As a vLLM dependency (includes runai-model-streamer-s3)"}],"dependencies":[{"reason":"Provides the primary Python API (`SafetensorsStreamer`) that uses this S3 backend.","package":"runai-model-streamer","optional":false},{"reason":"Required system library for the underlying C++ backend.","package":"libcurl4","optional":false},{"reason":"Required system library for the underlying C++ backend.","package":"libssl1.1","optional":false}],"imports":[{"note":"While `runai-model-streamer-s3` provides the S3 backend, the primary user-facing class for streaming models is `SafetensorsStreamer` from the `runai-model-streamer` library. `runai-model-streamer-s3` is used implicitly when an S3 path is provided and the relevant environment variables are set.","symbol":"SafetensorsStreamer","correct":"from runai_model_streamer import SafetensorsStreamer"}],"quickstart":{"code":"import os\nfrom runai_model_streamer import SafetensorsStreamer\n\n# NOTE: Replace with your actual S3-compatible storage details.\n# For Google Cloud Storage (GCS) with S3-compatible HMAC authentication:\nos.environ['AWS_ACCESS_KEY_ID'] = os.environ.get('RUNAI_S3_ACCESS_KEY_ID', 'YOUR_S3_ACCESS_KEY_ID')\nos.environ['AWS_SECRET_ACCESS_KEY'] = os.environ.get('RUNAI_S3_SECRET_ACCESS_KEY', 'YOUR_S3_SECRET_ACCESS_KEY')\nos.environ['AWS_ENDPOINT_URL'] = os.environ.get('RUNAI_S3_ENDPOINT_URL', 'https://storage.googleapis.com')  # GCS endpoint\nos.environ['AWS_EC2_METADATA_DISABLED'] = 'true'\n\ns3_model_path = \"s3://your-bucket/path/to/model.safetensors\"\n\ntry:\n    with SafetensorsStreamer() as streamer:\n        # The S3 backend is selected automatically for s3:// paths.\n        streamer.stream_file(s3_model_path)\n        # Tensors are yielded as they arrive and can be moved to the GPU immediately.\n        for name, tensor in streamer.get_tensors():\n            print(f\"Streamed tensor: {name}, shape: {tuple(tensor.shape)}\")\n            # tensor = tensor.to('cuda:0')\nexcept Exception as e:\n    print(f\"Streaming failed (check the S3 path, credentials, and endpoint): {e}\")\n","lang":"python","description":"This quickstart demonstrates how to use `SafetensorsStreamer` from the `runai-model-streamer` library to stream a Safetensors model directly from an S3-compatible object store, relying on `runai-model-streamer-s3` under the hood. It highlights the environment variables required for S3-compatible (e.g., GCS HMAC) authentication."},"warnings":[{"fix":"Set the required AWS-compatible environment variables corresponding to your S3-compatible storage credentials and endpoint URL before initializing the streamer. For GCS, set `AWS_ENDPOINT_URL` to `https://storage.googleapis.com`.","message":"When streaming from S3-compatible storage such as Google Cloud Storage (GCS) using S3 HMAC authentication, the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINT_URL`, and `AWS_EC2_METADATA_DISABLED=true` must be set correctly. Incorrect configuration will lead to authentication or file access errors.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure `libcurl4` and `libssl1.1` are installed on your system (e.g., `sudo apt-get install libcurl4-openssl-dev libssl-dev` on Debian/Ubuntu-based systems, or the equivalent for other distributions).","message":"The `runai-model-streamer` C++ backend, which `runai-model-streamer-s3` depends on, requires the `libcurl4` and `libssl1.1` system libraries. Missing libraries will prevent the streamer from functioning.","severity":"breaking","affected_versions":"All"},{"fix":"Explicitly set S3 authentication environment variables in your runtime environment (e.g., Dockerfile, Kubernetes deployment) or verify that credential files are correctly mounted and accessible to the application.","message":"The SDK's S3 credential resolution may differ from the AWS CLI's. Issues can arise if environment variables (e.g., `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) are not correctly propagated, or if credential files are not found in the expected locations, especially in containerized environments.","severity":"gotcha","affected_versions":"All"},{"fix":"Process S3 paths and local file system paths in separate streaming operations if both are required.","message":"Mixing S3 paths and local file system paths within a single `streamer.stream_files()` call is not supported and will result in errors.","severity":"gotcha","affected_versions":"All"},{"fix":"Monitor S3 throughput and adjust the `concurrency` and `memory_limit` parameters via `--model-loader-extra-config` in vLLM, or the `RUNAI_STREAMER_CONCURRENCY` and `RUNAI_STREAMER_MEMORY_LIMIT` environment variables for `runai-model-streamer`, to optimize performance and prevent resource exhaustion. Consider local caching strategies for frequently accessed models where applicable.","message":"When many servers stream models from S3 concurrently, the high aggregate demand on S3 throughput can cause streaming errors and hung processes, particularly as per-replica S3 throughput decreases.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[],"ecosystem":"pypi"}