Roboflow Inference CLI
Roboflow Inference CLI is a command-line interface designed for deploying computer vision models to various devices and environments with minimal machine learning or deployment knowledge. It provides tools to run and manage a local inference server, process data with workflows, benchmark performance, make predictions, and deploy to the cloud. The library is currently at version 1.2.2 and sees active development with frequent releases.
Warnings
- breaking Starting with v1.2.0, `inference-models` became the default inference engine. This change affects performance and resource usage and may require adjustments for GPU users. The old backend remains available on an opt-out basis.
- deprecated Python 3.9 support is deprecated, and Python 3.9 itself has reached End-of-Life. Building projects with Python 3.9 and `inference-cli` may lead to build failures or unpatched security vulnerabilities.
- gotcha Running the local inference server using `inference server start` requires Docker to be installed and running on your system. Without Docker, the server cannot be launched.
- gotcha Proper GPU setup for `inference-gpu` is complex, requiring specific NVIDIA CUDA Toolkit and cuDNN installations, and careful selection of `torch` and `torchvision` versions that match your CUDA installation. Incorrect versions can lead to runtime errors or CPU-only inference.
- gotcha When performing programmatic inference with `inference_sdk.InferenceHTTPClient` or other SDK components, an `ROBOFLOW_API_KEY` (or `API_KEY`) is typically required for authentication, especially when interacting with Roboflow's hosted services.
Install
- CPU:
pip install inference-cli
- GPU (CUDA 12.1 example; adjust the --index-url for your CUDA version):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install inference-gpu
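The `cu121` suffix in the index URL encodes the CUDA version (12.1). A quick sketch of that mapping; `cuda_wheel_tag` is an illustrative helper, not a real tool:

```python
def cuda_wheel_tag(cuda_version: str) -> str:
    """Map a CUDA version such as '12.1' to the PyTorch wheel index tag 'cu121'."""
    major, minor, *_ = cuda_version.split(".")
    return f"cu{major}{minor}"

print(cuda_wheel_tag("12.1"))  # → cu121
print(cuda_wheel_tag("11.8"))  # → cu118
```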
Imports
- InferenceHTTPClient
from inference_sdk import InferenceHTTPClient
- InferencePipeline
from inference import InferencePipeline
Quickstart
import os
from inference_sdk import InferenceHTTPClient

# Ensure your Roboflow API key is set as an environment variable, or replace
# os.environ.get with your key. You can find your API key on the Roboflow dashboard.
ROBOFLOW_API_KEY = os.environ.get("ROBOFLOW_API_KEY", "")
if not ROBOFLOW_API_KEY:
    print("Warning: ROBOFLOW_API_KEY environment variable not set. Inference may fail.")
    # For a quick demo without a real key, you might use a dummy value, or skip
    # this part if you are only running a local server without Roboflow API
    # interaction. For proper usage, always use a real key.

# 1. Start a local inference server (requires Docker to be running):
#    Run in your terminal: inference server start
#    This will typically start on http://localhost:9001

# 2. Initialize the InferenceHTTPClient
client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # Or "https://serverless.roboflow.com" for the hosted API
    api_key=ROBOFLOW_API_KEY,
)

# Example image URL for inference
image_url = "https://media.roboflow.com/inference/soccer.jpg"

# Replace with your actual model_id (e.g., 'your-project-name/your-model-version').
# You can find this on your Roboflow model's Deploy tab.
model_id = "soccer-players-5fuqs/1"

# 3. Perform inference
try:
    print(f"Running inference on {image_url} with model {model_id}...")
    results = client.infer(image_url, model_id=model_id)
    print("Inference successful!")
    # Print the first few predictions for brevity
    predictions = results.get("predictions", []) if isinstance(results, dict) else []
    if predictions:
        print("First 3 predictions:")
        for pred in predictions[:3]:
            print(f"  - Class: {pred.get('class')}, Confidence: {pred.get('confidence'):.2f}")
    else:
        print("No predictions found or unexpected result format.")
except Exception as e:
    print(f"An error occurred during inference: {e}")
    print("Ensure the local inference server is running ('inference server start') and the model ID/API key are correct.")
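The `predictions` list in the response can be post-processed with plain Python. A small sketch that tallies detections per class; the sample payload below is illustrative, mirroring the `class`/`confidence` keys used in the quickstart:

```python
from collections import Counter

def class_counts(results: dict) -> Counter:
    """Tally detections per class from an inference response (illustrative helper)."""
    return Counter(p.get("class") for p in results.get("predictions", []))

sample = {"predictions": [
    {"class": "player", "confidence": 0.91},
    {"class": "player", "confidence": 0.88},
    {"class": "ball", "confidence": 0.72},
]}
print(class_counts(sample))  # → Counter({'player': 2, 'ball': 1})
```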