Floret Python Bindings

0.10.5 · active · verified Thu Apr 16

Floret is an actively maintained Python library from Explosion (the makers of spaCy) that extends fastText with Bloom embeddings to produce compact, full-coverage word vectors. It aims to shrink vector tables substantially while maintaining performance, which is particularly useful for morphologically rich languages and for out-of-vocabulary words. The current version is 0.10.5; releases are driven mainly by support for new Python versions and by new training features.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart trains an unsupervised floret model, retrieves a word vector, and saves the result. Setting `mode="floret"` is what enables floret's Bloom embeddings; without it, training falls back to standard fastText vectors. The example saves both the full model and the compact floret vector table.

import floret
import os

# Create a dummy data file for training
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("This is a sample sentence for floret training.\n")
    f.write("Floret is great for compact word vectors.\n")
    f.write("More sentences for training the model.\n")

# Train an unsupervised floret model
# IMPORTANT: Use mode="floret" to enable floret's Bloom embeddings.
# The default mode="fasttext" trains original fastText vectors.
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,        # Recommended for floret mode
    bucket=50000,       # Reduced size hash table
    minn=3,
    maxn=6,
    dim=100,
    epoch=10,
    minCount=1          # Default is 5, which would leave this tiny corpus with an empty vocabulary
)

# Get a word vector
vector = model.get_word_vector("floret")
print(f"Vector for 'floret': {vector[:5]}...") # Print first 5 elements

# Save the full model (creates a .bin file)
model.save_model("vectors.bin")
print("Model saved to vectors.bin")

# Export the floret-specific vector table (creates a .floret file)
model.save_floret_vectors("vectors.floret")
print("Floret vectors saved to vectors.floret")

# Clean up dummy files
os.remove("data.txt")
os.remove("vectors.bin")
os.remove("vectors.floret")
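
The `hashCount` and `bucket` settings in the quickstart are easier to reason about with a toy model of Bloom embeddings. The sketch below is a conceptual illustration only, not floret's implementation: the helper names (`char_ngrams`, `bloom_rows`, `word_vector`, `make_table`), the use of `hashlib.blake2b` as the hash function, and the averaging step are all assumptions made for the example, and the real library's hashing and normalization may differ. It shows the general idea that a word and its character n-grams are each hashed to several rows of one small shared table, so any string, including an unseen word, maps to a vector.

```python
import hashlib
import random


def char_ngrams(word, minn=3, maxn=6):
    """Return the boundary-marked word plus its character n-grams (toy version)."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(minn, maxn + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def bloom_rows(ngram, bucket, hash_count):
    """Map one n-gram to `hash_count` row indices in a table of `bucket` rows.

    Assumption: blake2b with a per-seed salt stands in for the library's hash.
    """
    rows = []
    for seed in range(hash_count):
        digest = hashlib.blake2b(
            ngram.encode("utf-8"), digest_size=8, salt=bytes([seed])
        ).digest()
        rows.append(int.from_bytes(digest, "little") % bucket)
    return rows


def word_vector(word, table, hash_count=2, minn=3, maxn=6):
    """Average the table rows selected by every n-gram of `word`."""
    bucket, dim = len(table), len(table[0])
    total = [0.0] * dim
    count = 0
    for gram in char_ngrams(word, minn, maxn):
        for row in bloom_rows(gram, bucket, hash_count):
            for d in range(dim):
                total[d] += table[row][d]
            count += 1
    return [x / count for x in total]


def make_table(bucket, dim, seed=0):
    """Build a random stand-in for a trained table (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(bucket)]


if __name__ == "__main__":
    table = make_table(bucket=50, dim=4)
    # A word never "seen" by the table still gets a vector from its n-grams.
    print(word_vector("florets", table))
```

Because every lookup goes through the hashed rows, the table size is fixed by `bucket` rather than by vocabulary size, and using more than one hash per n-gram reduces the chance that two n-grams collide on all of their rows.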
