CTranslate2

4.7.1 · active · verified Thu Apr 09

CTranslate2 is a C++ and Python library for efficient inference with Transformer models. It implements a custom runtime with performance optimizations such as weight quantization, layer fusion, and batch reordering to accelerate inference and reduce the memory usage of Transformer models on CPU and GPU. It supports a wide range of encoder-decoder, decoder-only, and encoder-only models from frameworks such as OpenNMT, Fairseq, and Hugging Face Transformers. The library is actively maintained with frequent releases, currently at version 4.7.1.
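As a sketch of the quantization support mentioned above, the Hugging Face converter bundled with CTranslate2 accepts a `--quantization` flag at conversion time (the model name and output directory here are illustrative):

```shell
# Convert a Hugging Face model with int8 weight quantization,
# which shrinks the model on disk and speeds up CPU inference.
ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de \
    --output_dir opus-mt-en-de-int8 --quantization int8
```

Other accepted values include `int8_float16` and `float16` for GPU targets; the quantization type can also be overridden at load time via the `compute_type` argument of `ctranslate2.Translator`.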

Warnings

Install
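The core package is published on PyPI as `ctranslate2`; `transformers` is only needed for model conversion and tokenization:

```shell
pip install ctranslate2
# Optional, for converting and tokenizing Hugging Face models:
pip install transformers[torch]
```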

Imports

Quickstart

This quickstart demonstrates how to load a pre-converted model using `ctranslate2.Translator` and perform a basic text translation. It assumes a model has already been converted (e.g., from Hugging Face Transformers) and a tokenizer is available. For generation tasks, use `ctranslate2.Generator` instead.

# First, convert a model. This example uses a Hugging Face model.
# You would run this command in your terminal once:
# pip install ctranslate2 transformers[torch]
# ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de

import ctranslate2
import transformers

# Path to your converted CTranslate2 model directory
model_path = "opus-mt-en-de"

try:
    # Initialize the CTranslate2 Translator
    translator = ctranslate2.Translator(model_path, device="cpu") # Use device="cuda" for GPU

    # Load the matching Hugging Face tokenizer for encoding and decoding
    tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

    text_to_translate = "Hello world!"

    # Encode the input text to tokens
    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text_to_translate))
    # CTranslate2 expects a batch of inputs, so wrap in a list
    batch_inputs = [input_tokens]

    # Perform translation
    results = translator.translate_batch(batch_inputs)

    # Decode the output tokens
    output_tokens = results[0].hypotheses[0]
    translated_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

    print(f"Original: {text_to_translate}")
    print(f"Translated: {translated_text}")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure a converted model exists at 'opus-mt-en-de' and that the")
    print("'transformers' library is installed. To set up, run:")
    print("  pip install ctranslate2 transformers[torch]")
    print("  ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de")
