Mistral Common

1.11.0 · active · verified Thu Apr 09

Mistral-common is a Python library providing essential utilities for working with Mistral AI models. It encompasses tools for tokenization of text, images, and tool calls, as well as validation and normalization of requests, messages, tool calls, and responses. Built upon Pydantic, it ensures robust data handling for AI interactions. The library is actively maintained, currently at version 1.11.0, and receives regular updates to support new model features and improvements.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to tokenize a chat completion request with `mistral-common`: define the user messages, wrap them in a `ChatCompletionRequest`, and encode the request with a `MistralTokenizer`. The `encode_chat_completion` method returns a tokenized object whose `tokens` attribute holds the token IDs. In a real scenario, load the tokenizer matching your model by name or from a local file; the example below falls back to a dummy tokenizer for demonstration if a live one cannot be loaded.

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load a tokenizer matching your model. For example:
#   tokenizer = MistralTokenizer.from_model("open-mixtral-8x22b")
# or, for a local tokenizer file:
#   tokenizer = MistralTokenizer.from_file("path/to/tokenizer.model")
try:
    tokenizer = MistralTokenizer.from_model("open-mixtral-8x22b")
except Exception as e:
    print(f"Could not load tokenizer (model not found or network issue): {e}")
    print("Falling back to a dummy tokenizer for demonstration purposes only.")

    class DummyTokenizer:
        def encode_chat_completion(self, request):
            # No real tokenization; returns an object with fixed placeholder IDs
            # mimicking the .tokens attribute of a real tokenized result.
            class _Tokenized:
                tokens = [1, 2, 3, 4, 5]
            return _Tokenized()

    tokenizer = DummyTokenizer()

messages = [
    UserMessage(content="What is the capital of France?")
]

chat_completion_request = ChatCompletionRequest(messages=messages)

# Tokenize the chat completion request. encode_chat_completion returns a
# tokenized object; the token IDs are exposed via its .tokens attribute.
tokenized = tokenizer.encode_chat_completion(chat_completion_request)

print(f"Original messages: {messages}")
print(f"Token IDs: {tokenized.tokens}")
