CTranslate2
CTranslate2 is a C++ and Python library for efficient inference with Transformer models. It implements a custom runtime with performance optimizations such as weight quantization, layer fusion, and batch reordering to accelerate execution and reduce the memory usage of Transformer models on CPU and GPU. It supports a wide range of encoder-decoder, decoder-only, and encoder-only models from frameworks such as OpenNMT, Fairseq, and Hugging Face Transformers. The library is actively maintained with frequent releases, currently at version 4.7.1.
Warnings
- breaking Python 3.8 support was dropped in CTranslate2 v4.6.0. Users on Python 3.8 or older must upgrade their Python environment to use v4.6.0 or newer.
- breaking CTranslate2 v4.5.0 and later require cuDNN 9 and are no longer compatible with cuDNN 8 for NVIDIA GPU acceleration. Users may encounter 'Could not load library libcudnn_ops_infer.so.8' errors.
- breaking Flash Attention support was removed from the Python package in CTranslate2 v4.4.0 due to significant package size increase with minimal performance gain. It remains supported in the C++ package with a specific build option.
- gotcha CTranslate2 v4.7.0 introduced compatibility with Transformers v5. Older versions of CTranslate2 may fail when converting models with, or running inference against, `transformers` library versions 5.x.
- gotcha During the v4.3.0 release, the total PyPI project size exceeded the 20 GB limit, leading to incomplete releases for Python 3.8 and 3.9. This was addressed in v4.3.1 and later versions.
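The version thresholds in the warnings above can be encoded as a small compatibility check. A minimal sketch (the helper name is illustrative; the thresholds come from the breaking-change notes above):

```python
def required_cudnn_major(ct2_version: str) -> int:
    # CTranslate2 >= 4.5.0 requires cuDNN 9; earlier releases used cuDNN 8
    # (per the breaking-change note above).
    major, minor, *_ = (int(p) for p in ct2_version.split("."))
    return 9 if (major, minor) >= (4, 5) else 8

print(required_cudnn_major("4.7.1"))  # → 9
print(required_cudnn_major("4.4.0"))  # → 8
```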
Install
- pip install ctranslate2
- pip install ctranslate2  # Ensure CUDA 12.x and cuDNN 8/9 are installed separately for NVIDIA GPUs.
- pip install ctranslate2 --extra-index-url https://download.pytorch.org/whl/rocm6.0  # For AMD GPUs with ROCm 6.0+
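After installing, you can confirm whether the build actually sees a GPU with `ctranslate2.get_cuda_device_count()`. A hedged sketch that falls back to CPU when the library or a GPU is unavailable:

```python
def detect_device() -> str:
    # Prefer "cuda" when ctranslate2 is installed and reports at least one
    # visible CUDA device; otherwise fall back to "cpu".
    try:
        import ctranslate2
        if ctranslate2.get_cuda_device_count() > 0:
            return "cuda"
    except (ImportError, RuntimeError):
        pass
    return "cpu"

print(detect_device())
```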
Imports
- Translator
import ctranslate2
translator = ctranslate2.Translator(model_path)
- Generator
import ctranslate2
generator = ctranslate2.Generator(model_path)
- ct2-transformers-converter
ct2-transformers-converter --model facebook/m2m100_418M --output_dir ct2_model
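The same conversion can be driven from Python via `ctranslate2.converters.TransformersConverter`. A sketch (model name and output directory mirror the CLI example above; `quantization="int8"` is an optional extra shown for illustration):

```python
def convert_to_ct2(model_name: str = "facebook/m2m100_418M",
                   output_dir: str = "ct2_model",
                   quantization: str = "int8") -> str:
    # Programmatic equivalent of ct2-transformers-converter. Downloads the
    # model from the Hugging Face Hub, so it needs network access and the
    # transformers package installed; import is deferred for that reason.
    from ctranslate2.converters import TransformersConverter
    converter = TransformersConverter(model_name)
    return converter.convert(output_dir, quantization=quantization, force=True)
```

The function is deliberately not called at import time, since conversion downloads model weights.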
Quickstart
# First, convert a model. This example uses a Hugging Face model.
# You would run this command in your terminal once:
# pip install transformers[torch]
# ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de
import ctranslate2
import transformers
# Path to your converted CTranslate2 model directory
model_path = "opus-mt-en-de"
try:
    # Initialize the CTranslate2 Translator
    translator = ctranslate2.Translator(model_path, device="cpu")  # Use device="cuda" for GPU

    # Initialize the original tokenizer (e.g., from Hugging Face) for tokenization
    tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

    text_to_translate = "Hello world!"

    # Encode the input text to tokens
    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text_to_translate))

    # CTranslate2 expects a batch of inputs, so wrap in a list
    batch_inputs = [input_tokens]

    # Perform translation
    results = translator.translate_batch(batch_inputs)

    # Decode the output tokens
    output_tokens = results[0].hypotheses[0]
    translated_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

    print(f"Original: {text_to_translate}")
    print(f"Translated: {translated_text}")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure a converted model exists at 'opus-mt-en-de'")
    print("and that the 'transformers' library is installed.")
    print("For example, run: pip install transformers[torch]")
    print("then: ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de")
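For decoder-only models, `Generator.generate_batch` is the counterpart to `translate_batch`. A hedged sketch (the model directory is a placeholder for a converted decoder-only model, and the start token depends on that model's tokenizer):

```python
def generate_text(model_dir: str, start_tokens: list, max_length: int = 32) -> list:
    # Load a converted decoder-only model (e.g. a converted GPT-2) and
    # sample a continuation; import is deferred so the sketch is importable
    # even where ctranslate2 is not installed.
    import ctranslate2
    generator = ctranslate2.Generator(model_dir, device="cpu")
    results = generator.generate_batch(
        [start_tokens],        # generate_batch expects a batch of token lists
        max_length=max_length,
        sampling_topk=10,      # top-k sampling instead of greedy decoding
    )
    return results[0].sequences[0]

# Example usage (requires a converted model on disk):
# tokens = generate_text("gpt2_ct2", ["<|endoftext|>"])
```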