fastText Python Bindings
fastText is a library for efficient learning of word representations and sentence classification, developed by Facebook AI Research. It is well suited to large-scale text processing tasks. The current version is 0.9.3; releases follow no fixed cadence and focus on new features, performance, and API stability.
Warnings
- breaking: Version 0.9.1 merged the previously separate official 'fastText' module (from GitHub) and the unofficial 'fasttext' module (from PyPI). The merge brought significant API changes, especially for users who had been installing directly from GitHub.
- gotcha: fastText is a C++ library with Python bindings, so installation can fail on systems that lack a C++ build toolchain (e.g., Windows, or macOS without a compiler toolchain installed).
- gotcha: The `fasttext.load_model()` function expects a `.bin` model file, which holds the complete trained model. It cannot directly load standalone `.vec` files, which contain only word vectors in text form.
- gotcha: Training data for supervised classification (e.g., `train_supervised`) must follow a specific format: each line contains the label, prefixed with `__label__`, followed by the text. For example: `__label__positive This is a great product.`
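The `__label__` format described in the last warning can be produced from Python data with a small helper. This is a minimal sketch, not part of fastText: the function name `to_fasttext_line` is hypothetical, and collapsing whitespace so that each example stays on one line is an assumption about how the text should be cleaned.

```python
def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line.

    Hypothetical helper, not part of the fasttext package.
    """
    # One example per line: collapse newlines and repeated whitespace
    # inside the text into single spaces.
    clean_text = " ".join(text.split())
    # The label must be a single whitespace-free token, so join any
    # inner whitespace with hyphens (an assumed convention).
    clean_label = "-".join(str(label).split())
    return f"__label__{clean_label} {clean_text}"


examples = [
    ("positive", "This is a great product."),
    ("negative", "Arrived broken,\nwould not buy again."),
]
lines = [to_fasttext_line(label, text) for label, text in examples]
print("\n".join(lines))
```

Writing these lines to a file, one per line, should give input in the shape that `train_supervised` expects.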
Install
- pip install fasttext
Imports
- fasttext
import fasttext
Quickstart
import fasttext
import os
# Create a dummy training file for demonstration
training_data_path = 'train.txt'
with open(training_data_path, 'w') as f:
    f.write('__label__positive This is a good movie.\n')
    f.write('__label__negative This movie was terrible.\n')
    f.write('__label__positive I love this film.\n')
# Train a supervised model
model = fasttext.train_supervised(input=training_data_path)
# Predict a label
text_to_predict = 'This is an excellent film.'
predictions = model.predict(text_to_predict)
print(f"Text: '{text_to_predict}'")
print(f"Prediction: {predictions[0][0]}, Probability: {predictions[1][0]:.4f}")
# Clean up dummy file
os.remove(training_data_path)
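As the quickstart indexing shows, `model.predict()` returns a pair: a sequence of labels and a matching sequence of probabilities. The sketch below pairs them up and strips the `__label__` prefix. The helper name `pair_predictions` is hypothetical, and the sample tuple is made-up data shaped like a prediction result, so the snippet runs without a trained model.

```python
LABEL_PREFIX = "__label__"


def pair_predictions(predictions, prefix=LABEL_PREFIX):
    """Turn a (labels, probabilities) pair into [(label, probability), ...].

    Hypothetical helper, not part of the fasttext package.
    """
    labels, probabilities = predictions
    paired = []
    for label, probability in zip(labels, probabilities):
        # Drop the training-format prefix if it is present.
        if label.startswith(prefix):
            label = label[len(prefix):]
        paired.append((label, float(probability)))
    return paired


# Made-up values shaped like the quickstart's `predictions` variable.
sample = (("__label__positive", "__label__negative"), [0.75, 0.25])
result = pair_predictions(sample)
print(result)
```

With a trained model, the same helper could be applied to the result of `model.predict(text_to_predict, k=2)` to list the top two labels.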