fasttext-numpy2
fasttext-numpy2 is a Python library that provides bindings for Facebook AI Research's fastText, focusing on compatibility with NumPy 2.x. The original fastText library is designed for efficient learning of word representations and sentence classification. This `fasttext-numpy2` fork specifically addresses a critical breaking change introduced by NumPy 2.0, allowing users to continue using fastText with newer NumPy versions. The current version is 0.10.4, and its release cadence is primarily driven by maintaining compatibility with its dependencies, especially NumPy.
Warnings
- breaking The original `fasttext` Python bindings are incompatible with NumPy 2.0 and newer. NumPy 2.0 changed `copy=False` in `np.array` to mean "never copy", so code that relied on the old "copy only if needed" behavior now raises `ValueError: Unable to avoid copy while creating an array as requested`, typically when calling `model.predict`. The `fasttext-numpy2` library provides the patches needed to resolve this issue.
- deprecated The original Facebook Research `fastText` GitHub repository (github.com/facebookresearch/fastText) was set to a read-only archive on March 19, 2024, indicating that it is no longer actively maintained by Meta. While the core C++ library remains functional, new features or official patches are unlikely from the original source.
- gotcha FastText relies heavily on correctly preprocessed and encoded text. It assumes UTF-8 encoding, and inconsistent tokenization or encoding conventions between training and inference can significantly degrade model performance or lead to errors. Ensure all text data is consistently encoded and prepared.
- gotcha Model binary files (`.bin`) are highly sensitive to the specific library version and compilation settings used to train them. While `fasttext-numpy2` aims for drop-in compatibility, loading a model trained with a significantly different version of fastText (e.g., the original `fastText` vs. `fasttext-numpy2`, or different underlying C++ compiler versions) can lead to unexpected behavior or errors.
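One way to keep preprocessing consistent, as the encoding and tokenization warning above recommends, is to route both training lines and inference inputs through a single helper. The sketch below is only an illustration: the `normalize` and `to_training_line` names and the specific rules (NFC normalization, lowercasing, spacing out punctuation) are assumptions, not part of fastText or `fasttext-numpy2`. Collapsing whitespace also removes newlines, which matters because `predict` handles one line at a time.

```python
import re
import unicodedata

# Hypothetical helpers: these names and rules are illustrative assumptions,
# not part of the fastText API.
_PUNCT = re.compile(r"([.,!?;:()])")


def normalize(text: str) -> str:
    """Apply the same normalization at training time and inference time."""
    text = unicodedata.normalize("NFC", text)  # one canonical Unicode form
    text = text.lower()
    text = _PUNCT.sub(r" \1 ", text)           # make punctuation separate tokens
    return " ".join(text.split())              # collapse whitespace and newlines


def to_training_line(label: str, text: str) -> str:
    """Build one line in the `__label__<name> <text>` format used for training."""
    return f"__label__{label} {normalize(text)}"


print(to_training_line("sports", "I love Basketball."))
print(normalize("Hello,   World!"))
```

Writing the training file with `encoding='utf-8'` and calling `normalize` on every string passed to `model.predict` keeps the two code paths aligned.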
Install
pip install fasttext-numpy2
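Because both the original package and this fork are imported as `fasttext`, it can help to confirm which distribution and which NumPy major version are actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `major` are illustrative assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major(version_string):
    """Extract the leading major version number, e.g. '2.1.3' -> 2."""
    return int(version_string.split(".")[0])


# Report what is present in the current environment.
for dist in ("fasttext-numpy2", "fasttext", "numpy"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")

numpy_version = installed_version("numpy")
if numpy_version is not None and major(numpy_version) >= 2:
    if installed_version("fasttext-numpy2") is None:
        print("NumPy 2.x found without fasttext-numpy2; predict() may raise ValueError.")
```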
Imports
- fasttext
import fasttext
Quickstart
import fasttext
import os
# Create a dummy training data file
with open('data.txt', 'w', encoding='utf-8') as f:
    f.write('__label__sports This is a great game.\n')
    f.write('__label__politics The election results are in.\n')
    f.write('__label__sports I love playing basketball.\n')
    f.write('__label__politics Debates are important for democracy.\n')
# Train a supervised model
model = fasttext.train_supervised('data.txt', epoch=5, lr=0.1, dim=100)
# Predict a label for a new text
text_to_predict = 'I watched a thrilling football match.'
labels, probabilities = model.predict(text_to_predict)
print(f"Text: '{text_to_predict}'")
print(f"Predicted label: {labels[0]}")
print(f"Probability: {probabilities[0]:.4f}")
# Clean up the dummy file
os.remove('data.txt')
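For a single input string, `model.predict` returns a tuple of label strings and an array of probabilities, and `model.predict(text, k=2)` returns the top two. The sketch below shows one possible way to turn that pair into readable `(label, probability)` tuples. The `format_predictions` helper is a hypothetical name, and the sample values stand in for real model output so the snippet runs without a trained model.

```python
def format_predictions(labels, probabilities, prefix="__label__"):
    """Pair each label with its probability and strip the label prefix.

    Hypothetical helper, not part of the fastText API.
    """
    results = []
    for label, prob in zip(labels, probabilities):
        name = label[len(prefix):] if label.startswith(prefix) else label
        results.append((name, float(prob)))
    return results


# With a trained model this would be:
#   labels, probabilities = model.predict(text_to_predict, k=2)
# The values below are made-up placeholders, not real model output.
labels = ("__label__sports", "__label__politics")
probabilities = [0.75, 0.25]

for name, prob in format_predictions(labels, probabilities):
    print(f"{name}: {prob:.4f}")
```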