fastText Predict
fasttext-predict is a Python package that provides a lightweight, standalone implementation of fastText's prediction functionality. It includes only the `predict` method, keeping the package compact (<1MB) and free of external dependencies, including NumPy. Pre-built wheels are provided for a range of architectures, making installation straightforward in deployment scenarios where a full fastText installation is not desired. The package is actively maintained with frequent minor updates, offering a stable solution for inference.
Warnings
- gotcha This library provides *only* the prediction functionality (`predict` method) of fastText. Training, model quantization, word/sentence vector generation, or other utilities available in the full `fasttext` library are explicitly *not* included. Attempting to call non-prediction methods will result in an `AttributeError`.
- gotcha The package name for installation is `fasttext-predict`, but the import statement is `import fasttext`. Users accustomed to the usual `pip install X` -> `import X` pattern should note this difference to avoid a `ModuleNotFoundError`.
- breaking Models used with `fasttext-predict` are typically trained with the original `fastText` library (facebookresearch/fastText), which was archived and made read-only on March 19, 2024. While `fasttext-predict` remains actively maintained for prediction, users should be aware of the upstream project's archived status and its long-term implications for model format compatibility; training new models may require an actively developed fork of `fastText`.
- gotcha The `predict` method expects clean, single-line text input. Newline characters (`\n`) or other special formatting within input strings can lead to incorrect predictions or errors, especially when processing text from sources like dataframes. Each prediction input should ideally be a single, pre-processed string.
- gotcha By default, `model.predict()` returns only the single top predicted label and its corresponding probability. To retrieve multiple labels (e.g., the top-k most likely labels) and their probabilities for a given text, the `k` parameter must be explicitly passed to the `predict` method.
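The newline caveat above can be handled with a small pre-processing step before calling `predict`. The `sanitize` helper below is a hypothetical name, a minimal sketch rather than part of the library's API:

```python
def sanitize(text: str) -> str:
    # Collapse newlines, tabs, and runs of whitespace into single spaces.
    # fastText's predict expects a single-line string, so embedded newline
    # characters must be removed before prediction.
    return " ".join(text.split())

# Example: a cell pulled from a dataframe may contain embedded newlines
raw = "Fondant au chocolat\net tarte aux myrtilles\n"
clean = sanitize(raw)  # "Fondant au chocolat et tarte aux myrtilles"
# model.predict(clean, k=2)  # now safe to pass to a loaded model
```

Applying such a helper column-wise (e.g. via `Series.map`) keeps dataframe-sourced text safe for prediction.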
Install
-
pip install fasttext-predict
Imports
- fasttext
import fasttext
Quickstart
# A fastText model file (e.g., for language identification) needs to be downloaded separately.
# Example model: lid.176.ftz from https://fasttext.cc/docs/en/language-identification.html
import fasttext
import os

# Ensure the model file is accessible, e.g., placed in the current directory
model_path = os.environ.get('FASTTEXT_MODEL_PATH', 'lid.176.ftz')
try:
    model = fasttext.load_model(model_path)
    text_to_predict = 'Fondant au chocolat et tarte aux myrtilles'
    predictions = model.predict(text_to_predict)
    print(f"Text: '{text_to_predict}'")
    print(f"Predicted label(s): {predictions[0]}")
    print(f"Probabilities: {predictions[1]}")

    # To get top-k predictions with probabilities
    top_k_predictions = model.predict(text_to_predict, k=2)
    print(f"\nTop 2 predicted label(s): {top_k_predictions[0]}")
    print(f"Top 2 probabilities: {top_k_predictions[1]}")
except FileNotFoundError:
    print(f"Error: Model file not found at '{model_path}'.")
    print("Please download 'lid.176.ftz' (or your target model) and place it correctly, or set the FASTTEXT_MODEL_PATH environment variable.")
except ValueError as e:
    print(f"Error loading model or making prediction: {e}")
    print("Please ensure the model file is downloaded and the path is correct.")