{"id":1844,"library":"mistral-common","title":"Mistral Common","description":"mistral-common is a Python library of essential utilities for working with Mistral AI models. It provides tokenizers for text, images, and tool calls, together with validation and normalization of requests, messages, tool calls, and responses. Built on Pydantic, it ensures robust data handling for AI interactions. The library is actively maintained, currently at version 1.11.0, and receives regular updates to support new model features and improvements.","status":"active","version":"1.11.0","language":"en","source_language":"en","source_url":"https://github.com/mistralai/mistral-common","tags":["AI","LLM","Mistral AI","tokenization","validation","pydantic","natural language processing"],
"install":[{"cmd":"pip install mistral-common","lang":"bash","label":"Basic Installation"},{"cmd":"pip install \"mistral-common[image,audio,hf-hub,sentencepiece,server]\"","lang":"bash","label":"Installation with all optional dependencies"}],
"dependencies":[{"reason":"Core library for data validation and settings management, foundational to mistral-common's data structures.","package":"pydantic"},{"reason":"Required for JSON schema validation.","package":"jsonschema"},{"reason":"Numerical array operations, e.g. for image and audio data.","package":"numpy"},{"reason":"Required for image processing utilities.","package":"pillow"},{"reason":"Provides additional Pydantic types.","package":"pydantic-extra-types"},{"reason":"Used for HTTP requests, e.g. downloading tokenizers.","package":"requests"},{"reason":"Fast BPE tokenization backend, used by Tekken tokenizers.","package":"tiktoken"},{"reason":"Backports of newer typing features to older Python versions.","package":"typing-extensions"},{"reason":"A fast, drop-in replacement for the default asyncio event loop.","package":"uvloop"},{"reason":"Optional: to download tokenizers from the Hugging Face Hub.","package":"huggingface-hub"},{"reason":"Optional: to allow the use of SentencePiece tokenizers (now less common for new models).","package":"sentencepiece"},{"reason":"Optional: for the experimental REST API server.","package":"fastapi"}],
"imports":[{"symbol":"MistralTokenizer","correct":"from mistral_common.tokens.tokenizers.mistral import MistralTokenizer"},{"symbol":"ChatCompletionRequest","correct":"from mistral_common.protocol.instruct.request import ChatCompletionRequest"},{"symbol":"UserMessage","correct":"from mistral_common.protocol.instruct.messages import UserMessage"},{"symbol":"Tool","correct":"from mistral_common.protocol.instruct.tool_calls import Tool"},{"symbol":"Function","correct":"from mistral_common.protocol.instruct.tool_calls import Function"},{"note":"Avoid deep imports of internal modules such as `MistralCommonBackend` from `transformers` or other libraries: internal paths are unstable and can break between releases. Prefer `AutoTokenizer` or the official `mistral-common` import paths for stability.","wrong":"from transformers.models.mistral.tokenization_mistral_common import MistralCommonBackend","symbol":"MistralCommonBackend","correct":"from mistral_common.tokens.tokenizers.mistral import MistralCommonBackend"}],
"quickstart":{"code":"from mistral_common.protocol.instruct.messages import UserMessage\nfrom mistral_common.protocol.instruct.request import ChatCompletionRequest\nfrom mistral_common.tokens.tokenizers.mistral import MistralTokenizer\n\n# Load the tokenizer that matches your target model, e.g. 'open-mixtral-8x22b'.\n# This requires the corresponding tokenizer files and may download them on first use.\n# For a local tokenizer file, use MistralTokenizer.from_file(\"path/to/tokenizer.model\") instead.\ntokenizer = MistralTokenizer.from_model(\"open-mixtral-8x22b\")\n\nmessages = [\n    UserMessage(content=\"What is the capital of France?\")\n]\n\nrequest = ChatCompletionRequest(messages=messages)\n\n# encode_chat_completion returns a Tokenized object; the token IDs are in .tokens\ntokenized = tokenizer.encode_chat_completion(request)\n\nprint(f\"Original messages: {messages}\")\nprint(f\"Token IDs: {tokenized.tokens}\")","lang":"python","description":"This quickstart tokenizes a simple chat completion request with `mistral-common`: define user messages, wrap them in a `ChatCompletionRequest`, and encode the request with a `MistralTokenizer`. Note that `encode_chat_completion` returns a `Tokenized` object; the token IDs are available via its `tokens` attribute. Load the tokenizer that matches your model, either by model name with `MistralTokenizer.from_model` or from a local file with `MistralTokenizer.from_file`."},
"warnings":[{"fix":"Upgrade to mistral-common >= 1.8.4.","message":"A security-related change added a new 'p' parameter to streamed chunks; applications that rely on strict parsing of streamed chunks may break. Update `mistral-common` to version 1.8.4 or higher to mitigate this.","severity":"breaking","affected_versions":"<1.8.4"},{"fix":"Use public API endpoints and classes as documented. Avoid deep imports from internal submodules unless explicitly instructed by official documentation.","message":"Directly importing internal modules (e.g., `MistralCommonBackend`) from `mistral-common` or related libraries (such as `transformers`) is discouraged. Internal file structures in fast-moving open-source libraries change frequently, making your codebase brittle to minor updates. It is safer to rely on public APIs such as `AutoTokenizer` when integrating with Hugging Face Transformers.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Check the official Mistral AI documentation or the `mistral-common` GitHub repository for the recommended `mistral-common` version for a given model, and upgrade `mistral-common` when using new model releases.","message":"Tokenizer versions are closely tied to Mistral model versions. Using an older `mistral-common` with newer models, or vice versa, can produce incorrect tokenization or unexpected behavior. Verify compatibility between your `mistral-common` version and the Mistral model you intend to use.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For new models, prefer `Tekken` tokenizers. If you require `sentencepiece`, install `mistral-common[sentencepiece]`.","message":"The `sentencepiece` dependency is now optional, as Mistral AI primarily releases `Tekken` tokenizers for recent models. SentencePiece support remains available via the optional extra, but new projects working with the latest models should focus on `Tekken`-based tokenization.","severity":"deprecated","affected_versions":">=1.3.1 (when Tekkenizer support was added)"},{"fix":"Refer to the `mistral-common` documentation for the tokenizer's behavior, especially concerning special tokens and sequence handling, when used with Hugging Face interfaces.","message":"When using `mistral-common`'s tokenizer through a Hugging Face `PreTrainedTokenizerBase`-compatible interface, be aware of key behavioral differences: special tokens are not encoded directly, and pairs of sequences are not supported. Relying on standard Hugging Face `PreTrainedTokenizer` behavior can therefore produce unexpected results.","severity":"gotcha","affected_versions":"All versions"}],
"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}