{"id":1435,"library":"ctranslate2","title":"CTranslate2","description":"CTranslate2 is a C++ and Python library for efficient inference with Transformer models. It implements a custom runtime with performance optimizations such as weight quantization, layer fusion, and batch reordering to accelerate inference and reduce the memory usage of Transformer models on CPUs and GPUs. It currently supports a wide range of encoder-decoder, decoder-only, and encoder-only models from frameworks like OpenNMT, Fairseq, and Hugging Face Transformers. The library is actively maintained with frequent releases; the current version is 4.7.1.","status":"active","version":"4.7.1","language":"en","source_language":"en","source_url":"https://github.com/OpenNMT/CTranslate2","tags":["NLP","inference","Transformer","machine translation","LLM","quantization","speech recognition","GPU","CPU"],"install":[{"cmd":"pip install ctranslate2","lang":"bash","label":"Basic Installation (CPU)"},{"cmd":"pip install ctranslate2 # Ensure CUDA 12.x and cuDNN 9 are installed separately for NVIDIA GPUs (cuDNN 8 for CTranslate2 versions before 4.5.0).","lang":"bash","label":"GPU Installation (NVIDIA CUDA)"},{"cmd":"pip install ctranslate2 --extra-index-url https://download.pytorch.org/whl/rocm6.0 # For AMD GPUs with ROCm 6.0+","lang":"bash","label":"GPU Installation (AMD ROCm)"}],"dependencies":[{"reason":"Requires Python 3.9 or higher.","package":"python","optional":false},{"reason":"Commonly used for tokenization with CTranslate2 models (e.g., OpenNMT, OPUS-MT).","package":"sentencepiece","optional":true},{"reason":"Required for converting models from the Hugging Face Transformers library to CTranslate2 format.","package":"transformers","optional":true},{"reason":"Required for converting models trained with OpenNMT-py to CTranslate2 format.","package":"OpenNMT-py","optional":true},{"reason":"Needed for AMD GPU support with ROCm (PyTorch 2.1+ required). 
Also often used for model conversion workflows.","package":"torch","optional":true},{"reason":"NVIDIA CUDA Toolkit (12.x recommended) is required for NVIDIA GPU acceleration.","package":"cuda","optional":true},{"reason":"NVIDIA cuDNN (8 or 9, depending on CTranslate2 version) is recommended for optimal performance with convolutional layers on NVIDIA GPUs.","package":"cudnn","optional":true},{"reason":"AMD ROCm (6.0+) is required for AMD GPU acceleration.","package":"rocm","optional":true}],"imports":[{"note":"Main class for performing machine translation inference.","symbol":"Translator","correct":"import ctranslate2\ntranslator = ctranslate2.Translator(model_path)"},{"note":"Main class for performing text generation inference (e.g., with LLMs).","symbol":"Generator","correct":"import ctranslate2\ngenerator = ctranslate2.Generator(model_path)"},{"note":"Command-line tool for converting Hugging Face Transformers models. Run in shell, not Python.","symbol":"ct2-transformers-converter","correct":"ct2-transformers-converter --model facebook/m2m100_418M --output_dir ct2_model"}],"quickstart":{"code":"# First, convert a model. 
This example uses a Hugging Face model.\n# You would run this command in your terminal once:\n# pip install transformers[torch]\n# ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de\n\nimport ctranslate2\nimport transformers\n\n# Path to your converted CTranslate2 model directory\nmodel_path = \"opus-mt-en-de\"\n\ntry:\n    # Initialize the CTranslate2 Translator\n    translator = ctranslate2.Translator(model_path, device=\"cpu\") # Use device=\"cuda\" for GPU\n\n    # Initialize the original tokenizer (e.g., from Hugging Face for tokenization)\n    tokenizer = transformers.AutoTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-en-de\")\n\n    text_to_translate = \"Hello world!\"\n\n    # Encode the input text to tokens\n    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text_to_translate))\n    # CTranslate2 expects a batch of inputs, so wrap in a list\n    batch_inputs = [input_tokens]\n\n    # Perform translation\n    results = translator.translate_batch(batch_inputs)\n\n    # Decode the output tokens\n    output_tokens = results[0].hypotheses[0]\n    translated_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))\n\n    print(f\"Original: {text_to_translate}\")\n    print(f\"Translated: {translated_text}\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure you have a model converted and located at 'opus-mt-en-de' \")\n    print(\"and that 'transformers' library is installed.\")\n    print(\"For example, you can run: `pip install transformers[torch]` and then \")\n    print(\"`ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de`\")","lang":"python","description":"This quickstart demonstrates how to load a pre-converted model using `ctranslate2.Translator` and perform a basic text translation. It assumes a model has already been converted (e.g., from Hugging Face Transformers) and a tokenizer is available. 
For generation tasks, use `ctranslate2.Generator` instead."},"warnings":[{"fix":"Upgrade Python to version 3.9 or higher.","message":"Python 3.8 support was dropped in CTranslate2 v4.6.0. Users on Python 3.8 or older must upgrade their Python environment to use v4.6.0 or newer.","severity":"breaking","affected_versions":">=4.6.0"},{"fix":"Upgrade cuDNN to version 9.x if using NVIDIA GPUs, or downgrade CTranslate2 to a version prior to 4.5.0 if cuDNN 8 is strictly required.","message":"CTranslate2 v4.5.0 and later require cuDNN 9 and are no longer compatible with cuDNN 8 for NVIDIA GPU acceleration. A mismatch between the installed cuDNN major version and the one CTranslate2 was built against surfaces as a library loading error at runtime; for example, 'Could not load library libcudnn_ops_infer.so.8' indicates that a cuDNN 8 library was requested but not found.","severity":"breaking","affected_versions":">=4.5.0"},{"fix":"If Flash Attention is critical, consider using the C++ library with the `WITH_FLASH_ATTN` build option or explore alternative solutions.","message":"Flash Attention support was removed from the Python package in CTranslate2 v4.4.0 because it significantly increased the package size for minimal performance gain. It remains supported in the C++ package with a specific build option.","severity":"breaking","affected_versions":">=4.4.0"},{"fix":"Upgrade CTranslate2 to version 4.7.0 or higher for full compatibility with Transformers v5.","message":"CTranslate2 v4.7.0 introduced compatibility with Transformers v5. Older versions of CTranslate2 may run into problems when converting or running models with `transformers` 5.x.","severity":"gotcha","affected_versions":"<4.7.0"},{"fix":"Avoid CTranslate2 version 4.3.0. Use 4.3.1 or a later version instead.","message":"During the release of v4.3.0, the PyPI package size exceeded the limit (20GB), leading to incomplete releases for Python 3.8 and 3.9. This was addressed in v4.3.1 and later versions.","severity":"gotcha","affected_versions":"4.3.0"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}