{"id":2590,"library":"model-hosting-container-standards","title":"Model Hosting Container Standards","description":"The `model-hosting-container-standards` is a Python toolkit designed to facilitate standardized model hosting container implementations, specifically with robust Amazon SageMaker integration. It provides utilities to enable efficient deployment and inference for models, including support for advanced frameworks like TensorRT-LLM and vLLM. Currently at version 0.1.14, the library is actively developed with frequent patch releases, indicating ongoing enhancements and maintenance.","status":"active","version":"0.1.14","language":"en","source_language":"en","source_url":"https://github.com/aws/model-hosting-container-standards","tags":["AWS","SageMaker","MLOps","Container","vLLM","TensorRT-LLM","Model Hosting","FastAPI","Inference"],"install":[{"cmd":"pip install model-hosting-container-standards","lang":"bash","label":"PyPI"}],"dependencies":[{"reason":"Web framework for container APIs","package":"starlette"},{"reason":"JSON query language for Python","package":"jmespath"},{"reason":"Modern, fast (high-performance) web framework, for building APIs","package":"fastapi"},{"reason":"Standard Python packaging tools","package":"setuptools"},{"reason":"Process control system for containers (e.g., managing inference servers)","package":"supervisor"},{"reason":"Asynchronous HTTP client","package":"httpx"},{"reason":"Data validation and settings management using Python type hints","package":"pydantic"}],"imports":[{"note":"A common component for defining custom model logic within a standardized container.","symbol":"ModelHandler","correct":"from model_hosting_container_standards.common.handler.model_handler import ModelHandler"},{"note":"While FastAPI is a dependency, the toolkit integrates with it for container APIs. 
Users might import it directly when building custom services using the toolkit's patterns.","symbol":"FastAPI","correct":"from fastapi import FastAPI"}],"quickstart":{"code":"import boto3\nimport os\n\nsagemaker_client = boto3.client('sagemaker')\n\n# Replace with your AWS account ID and region\naccount_id = os.environ.get('AWS_ACCOUNT_ID', '123456789012')\nregion = os.environ.get('AWS_REGION', 'us-east-1')\n\nmodel_name = 'my-vllm-standard-model'\nexecution_role_arn = os.environ.get('SAGEMAKER_EXECUTION_ROLE_ARN', 'arn:aws:iam::123456789012:role/SageMakerExecutionRole')\n\n# Example of using a vLLM container image that adheres to the standards\n# This image would typically be found in Amazon ECR Public Gallery or a private ECR repo\n# Note: This is an example, use an actual vLLM image URL from AWS ECR Public Gallery.\nvllm_image = f\"{account_id}.dkr.ecr.{region}.amazonaws.com/vllm:0.11.2-sagemaker-v1.2\"\n\nresponse = sagemaker_client.create_model(\n    ModelName=model_name,\n    ExecutionRoleArn=execution_role_arn,\n    PrimaryContainer={\n        'Image': vllm_image,\n        'Environment': {\n            'SM_VLLM_MODEL': 'meta-llama/Meta-Llama-3-8B-Instruct', # Hugging Face Model ID or S3 path\n            'HUGGING_FACE_HUB_TOKEN': os.environ.get('HUGGING_FACE_HUB_TOKEN', ''), # Securely provide token\n            'SM_VLLM_MAX_MODEL_LEN': '2048',\n            'SM_VLLM_GPU_MEMORY_UTILIZATION': '0.9',\n            'SM_VLLM_DTYPE': 'auto',\n            'SM_VLLM_TENSOR_PARALLEL_SIZE': '1'\n        }\n    }\n)\n\nprint(f\"Model creation initiated: {response['ModelArn']}\")\n# Further steps would involve creating an Endpoint Configuration and an Endpoint","lang":"python","description":"This quickstart demonstrates how to deploy a model using Amazon SageMaker, leveraging a container that adheres to the `model-hosting-container-standards`. 
It configures a SageMaker model with a vLLM-powered container image, setting environment variables for the model ID, resource allocation, and an optional authentication token. This example assumes that AWS credentials and a SageMaker execution role are configured in your environment. Note that the toolkit itself is for *building* such containers, and this quickstart shows how to *consume* them on SageMaker."},"warnings":[{"fix":"Use environment variables (e.g., `os.environ.get('HUGGING_FACE_HUB_TOKEN', '')`) or integrate with cloud-native secret management services like AWS Secrets Manager.","message":"Avoid hardcoding sensitive information like API keys or Hugging Face tokens directly into container images or deployment scripts. Always use environment variables, AWS Secrets Manager, or other secure credential management systems for runtime injection.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Configure appropriate CPU and memory limits for your SageMaker endpoints or Docker containers (e.g., using `ComputeResourceRequirements` on SageMaker inference components or `--cpus`, `--memory` in Docker).","message":"Failing to set resource limits (CPU/Memory) for containers can lead to resource starvation, instability, and poor performance, especially in multi-container environments on a single instance.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always pin to specific, immutable image tags (e.g., `vllm:0.11.2-sagemaker-v1.2`) for production deployments to ensure consistent behavior and enable rollbacks.","message":"Relying on generic or 'latest' tags for container images in production deployments can introduce instability and make reproducibility difficult due to unexpected upstream changes.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Optimize Dockerfiles by using minimal base images (e.g., `slim` variants), multi-stage builds, and `.dockerignore` files to exclude unnecessary build artifacts and development 
dependencies.","message":"Building overly large container images increases deployment times, storage costs, and the attack surface. This is a common issue with custom ML containers.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your development and deployment environments use Python 3.10 or a newer compatible version.","message":"The library explicitly requires Python 3.10 or newer. Deploying or developing with older Python versions (e.g., 3.9 or earlier) will lead to compatibility issues and failures.","severity":"breaking","affected_versions":"All versions >=0.1.0"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}