{"id":5259,
"library":"inference-cli",
"title":"Roboflow Inference CLI",
"description":"Roboflow Inference CLI is a command-line interface for deploying computer vision models to a variety of devices and environments, requiring minimal machine learning or deployment expertise. It provides tools to run and manage a local inference server, process data with workflows, benchmark performance, make predictions, and deploy to the cloud. The library is currently at version 1.2.2 and is under active development with frequent releases.",
"status":"active",
"version":"1.2.2",
"language":"en",
"source_language":"en",
"source_url":"https://github.com/roboflow/inference",
"tags":["computer vision","machine learning","cli","inference","roboflow","object detection","segmentation","classification","deep learning"],
"install":[{"cmd":"pip install inference-cli","lang":"bash","label":"Basic Installation"},{"cmd":"pip install inference-cli\n# For GPU (CUDA 12.1 example; adjust --index-url for your CUDA version)\npip install torch torchvision --index-url https://download.pytorch.org/whl/cu121\npip install inference-gpu","lang":"bash","label":"GPU Installation (with PyTorch and CUDA)"}],
"dependencies":[{"reason":"Required to run the local inference server (`inference server start`), which pulls and manages Docker images.","package":"Docker","optional":false},{"reason":"Required for GPU inference with the `inference-models` backend. Specific CUDA-compatible versions are necessary.","package":"torch","optional":true},{"reason":"Often installed alongside `torch` for computer vision tasks, particularly with GPU inference.","package":"torchvision","optional":true},{"reason":"A dependency of the `inference-models` backend when leveraging NVIDIA GPUs.","package":"pycuda","optional":true}],
"imports":[{"note":"Used for programmatic interaction with an Inference Server (local or hosted) over HTTP.","symbol":"InferenceHTTPClient","correct":"from inference_sdk import InferenceHTTPClient"},{"note":"Used for Python-native, direct inference without Docker, especially for video streams.","symbol":"InferencePipeline","correct":"from inference import InferencePipeline"}],
"quickstart":{"code":"import os\nfrom inference_sdk import InferenceHTTPClient\n\n# Set your Roboflow API key as an environment variable, or replace os.environ.get with your key.\n# You can find your API key on the Roboflow dashboard.\nROBOFLOW_API_KEY = os.environ.get('ROBOFLOW_API_KEY', '')\n\nif not ROBOFLOW_API_KEY:\n    # A real key is required for Roboflow hosted services; a purely local server\n    # without Roboflow API interaction may not need one.\n    print(\"Warning: ROBOFLOW_API_KEY environment variable not set. Inference may fail.\")\n\n# 1. Start a local inference server (requires Docker to be running):\n#    Run in your terminal: inference server start\n#    The server typically listens on http://localhost:9001\n\n# 2. Initialize the InferenceHTTPClient\nclient = InferenceHTTPClient(\n    api_url=\"http://localhost:9001\",  # Or \"https://serverless.roboflow.com\" for the hosted API\n    api_key=ROBOFLOW_API_KEY,\n)\n\n# Example image URL for inference\nimage_url = \"https://media.roboflow.com/inference/soccer.jpg\"\n\n# Replace with your actual model_id (e.g., 'your-project-name/your-model-version').\n# You can find this on your Roboflow model's Deploy tab.\nmodel_id = \"soccer-players-5fuqs/1\"\n\n# 3. Perform inference\ntry:\n    print(f\"Running inference on {image_url} with model {model_id}...\")\n    results = client.infer(image_url, model_id=model_id)\n    print(\"Inference successful!\")\n    # Print only the first few predictions for brevity\n    predictions = results.get('predictions', []) if isinstance(results, dict) else []\n    if predictions:\n        print(\"First 3 predictions:\")\n        for pred in predictions[:3]:\n            print(f\"  - Class: {pred.get('class')}, Confidence: {pred.get('confidence'):.2f}\")\n    else:\n        print(\"No predictions found or unexpected result format.\")\nexcept Exception as e:\n    print(f\"An error occurred during inference: {e}\")\n    print(\"Ensure the local inference server is running ('inference server start') and that the model ID and API key are correct.\")","lang":"python","description":"This quickstart demonstrates programmatic inference with `inference_sdk.InferenceHTTPClient`. It assumes a local inference server is running (started via `inference server start` in the terminal, which requires Docker) or uses the Roboflow hosted API. It takes an image URL and a model ID, then prints the inference results. Ensure your `ROBOFLOW_API_KEY` is set as an environment variable."},
"warnings":[{"fix":"To continue using the old inference backend, set the environment variable `USE_INFERENCE_MODELS=False`. For GPU users, ensure `torch` and `torchvision` are installed *before* `inference-gpu`, in versions compatible with your CUDA toolkit.","message":"Starting with v1.2.0, `inference-models` is the default inference engine. This change affects performance and resource usage and may require adjustments for GPU users. The old backend remains available on an opt-out basis.","severity":"breaking","affected_versions":">=1.2.0"},
{"fix":"Upgrade your Python environment to Python 3.10 or newer (but below 3.13), as specified by the package's `requires_python` metadata.","message":"Python 3.9 support is deprecated, and Python 3.9 itself is now effectively end-of-life. Building projects with Python 3.9 and `inference-cli` may lead to build failures or unpatched security vulnerabilities.","severity":"deprecated","affected_versions":">=1.1.0"},
{"fix":"Install Docker Desktop (or an equivalent Docker engine) for your operating system and ensure it is running before executing `inference server start`.","message":"Running the local inference server via `inference server start` requires Docker to be installed and running on your system. Without Docker, the server cannot be launched.","severity":"gotcha","affected_versions":"All versions"},
{"fix":"Refer to the Roboflow documentation or the PyTorch installation guide to install a compatible CUDA Toolkit and cuDNN, then select `torch`, `torchvision`, and `inference-gpu` versions that match each other and your hardware.","message":"Proper GPU setup for `inference-gpu` is complex: it requires specific NVIDIA CUDA Toolkit and cuDNN installations, plus `torch` and `torchvision` versions that match your CUDA installation. Incorrect versions can lead to runtime errors or CPU-only inference.","severity":"gotcha","affected_versions":"All versions with GPU usage"},
{"fix":"Ensure your `ROBOFLOW_API_KEY` is set as an environment variable or passed directly to the client constructor. Obtain your API key from the Roboflow dashboard.","message":"When performing programmatic inference with `inference_sdk.InferenceHTTPClient` or other SDK components, a `ROBOFLOW_API_KEY` (or `API_KEY`) is typically required for authentication, especially when interacting with Roboflow's hosted services.","severity":"gotcha","affected_versions":"All versions with programmatic API usage"}],
"env_vars":null,
"last_verified":"2026-04-13T00:00:00.000Z",
"next_check":"2026-07-12T00:00:00.000Z"}