FastText Python Bindings
FastText is an open-source, lightweight library developed by Facebook AI Research for efficient learning of word embeddings and text classification. The `fasttext-wheel` package provides pre-compiled Python bindings for the core FastText C++ library, which usually avoids compiling the C++ sources at install time. The current version is 0.9.2. Releases are infrequent and focus on core improvements and broader access.
Warnings
- breaking The v0.9.1 release consolidated the official GitHub `fastText` module and the unofficial PyPI `fasttext` module. Users migrating from the *old official GitHub module* (which might have used `import fastText`) must now use `import fasttext`.
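- gotcha Supervised learning (`train_supervised`) expects every line of the input file to start with one or more labels carrying the `__label__` prefix (the default value of the `label` parameter), followed by the text: `__label__<label_name> <text_content>`.
- gotcha The `model.predict()` method returns a tuple `(labels, probabilities)`, not a single string or float. For a single input string, `labels` is a tuple of label strings (prefix included) and `probabilities` is a NumPy array, for example `(('__label__positive',), array([0.98]))` (the probability shown is illustrative). For a list of input strings, it returns one list of labels and one array of probabilities per input.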
- gotcha FastText models, especially with large vocabularies or many n-grams, can consume significant amounts of RAM during training and when loaded, potentially leading to out-of-memory errors on systems with limited resources.
- deprecated The v0.2.0 release introduced a 'beta C++ API', deprecating some methods and moving functionality. While primarily a C++ change, it signaled potential future changes in Python binding behavior or available methods.
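The `__label__` input format described in the warnings above can be produced from (label, text) pairs with a small helper. The sketch below is pure Python and does not call fastText. The names `to_fasttext_line` and `write_training_file` are hypothetical, and replacing whitespace inside labels with underscores is an assumption made here so that each label stays a single token.

```python
import re

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    # Assumption: a label must be one token, so inner whitespace becomes '_'.
    clean_label = re.sub(r"\s+", "_", label.strip())
    # Collapse newlines and repeated whitespace: one example per line.
    clean_text = " ".join(text.split())
    return f"{LABEL_PREFIX}{clean_label} {clean_text}"


def write_training_file(path, examples):
    """Write (label, text) pairs to `path`, one formatted example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in examples:
            f.write(to_fasttext_line(label, text) + "\n")
```

A file written this way can be passed as the `input` argument of `train_supervised`, as in the quickstart.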
Install
pip install fasttext-wheel
Imports
- fasttext
import fasttext
Quickstart
import fasttext
import os
# Create a dummy training data file (replace with your actual data)
# Format: __label__label1 text1
# __label__label2 text2
train_file = "train.txt"
with open(train_file, "w") as f:
    f.write("__label__positive this movie is great\n")
    f.write("__label__negative this movie is terrible\n")
    f.write("__label__positive i love this film\n")
    f.write("__label__negative what a waste of time\n")
# Train a supervised text classification model
# Adjust parameters like epoch, lr, wordNgrams for your specific task
model = fasttext.train_supervised(input=train_file, epoch=25, lr=1.0, wordNgrams=2)
# Predict labels for new text
print("Prediction for 'this movie is wonderful':", model.predict("this movie is wonderful"))
print("Prediction for 'worst movie ever':", model.predict("worst movie ever"))
# Optionally save and load the model
model_path = "model.bin"
model.save_model(model_path)
loaded_model = fasttext.load_model(model_path)
print("Loaded model prediction:", loaded_model.predict("this film is amazing"))
# Clean up dummy file
os.remove(train_file)
os.remove(model_path)
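The quickstart prints the raw return value of `predict`, which for a single string is a pair of labels and probabilities rather than a plain label. A minimal sketch of unpacking it follows. `top_prediction` is a hypothetical helper, not part of the fastText API, and the sample tuple is hand-written to mimic the result shape, so its probability is illustrative rather than real model output.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def top_prediction(result):
    """Return (label_without_prefix, probability) from a single-text predict result."""
    labels, probs = result
    if len(labels) == 0:
        return None  # no label was returned (e.g. none passed the threshold)
    label = labels[0]
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    # float() turns a NumPy scalar into a plain Python float.
    return label, float(probs[0])


# Hand-written stand-in for the value of model.predict("some text").
sample = (("__label__positive",), [0.75])
print(top_prediction(sample))
```

With a trained model, the same helper would be called as `top_prediction(model.predict("this movie is wonderful"))`.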