AIPerf

0.7.0 · active · verified Thu Apr 16

AIPerf is a benchmarking tool for measuring the performance of generative AI models served by various inference solutions. It reports detailed metrics and benchmark summaries through a command-line interface. The library is actively maintained, with regular releases (4-12 per year); the current version is 0.7.0.

Quickstart

This quickstart demonstrates running a basic performance benchmark against a locally running Ollama server. It profiles the specified model against a chat endpoint with streaming enabled, using a specific tokenizer and a fixed concurrency and request count.

python3 -m venv venv
source venv/bin/activate
pip install aiperf

# Assuming an Ollama server is running locally with a model like 'granite4:350m'
# (e.g., via docker run -d --name ollama -p 11434:11434 -v ollama-data:/root/.ollama ollama/ollama:latest && docker exec -it ollama ollama pull granite4:350m)

aiperf profile \
  --model "granite4:350m" \
  --streaming \
  --endpoint-type chat \
  --tokenizer ibm-granite/granite-4.0-micro \
  --url http://localhost:11434 \
  --concurrency 5 \
  --request-count 10
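
With --concurrency 5 and --request-count 10, the run keeps five requests in flight until ten have completed, and --streaming means per-token timings can be collected. As a back-of-the-envelope sketch of the kind of streaming metrics such a run produces, the snippet below computes time to first token (TTFT) and inter-token latency (ITL) from made-up per-request timings; the numbers and field names are illustrative only, not AIPerf's actual output schema:

```python
# Hypothetical per-request timings (seconds) from a streaming chat benchmark.
# These values are invented for illustration; they are NOT AIPerf output.
records = [
    {"ttft": 0.15, "total": 1.35, "output_tokens": 41},
    {"ttft": 0.12, "total": 1.10, "output_tokens": 33},
    {"ttft": 0.18, "total": 1.60, "output_tokens": 48},
]

# Time to first token (TTFT): delay before the first streamed token arrives.
avg_ttft = sum(r["ttft"] for r in records) / len(records)

# Inter-token latency (ITL): average gap between streamed tokens after the first.
itls = [(r["total"] - r["ttft"]) / (r["output_tokens"] - 1) for r in records]
avg_itl = sum(itls) / len(itls)

print(f"avg TTFT: {avg_ttft:.3f}s, avg ITL: {avg_itl * 1000:.1f}ms")
```

The real benchmark report aggregates statistics like these across all requests; consult the tool's output for the exact metric names it uses.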
