{"id":8175,"library":"floret","title":"Floret Python Bindings","description":"Floret is an actively maintained Python library from Explosion (the makers of spaCy) that extends fastText with Bloom embeddings to provide compact, full-coverage word vectors. It aims to reduce the size of vector tables significantly while maintaining performance, especially for morphologically rich languages and for out-of-vocabulary words. The current version is 0.10.5; releases are driven mainly by support for new Python versions and additions to the training features.","status":"active","version":"0.10.5","language":"en","source_language":"en","source_url":"https://github.com/explosion/floret","tags":["NLP","embeddings","fastText","spaCy","machine learning","vectors","subword","bloom filters"],"install":[{"cmd":"pip install floret","lang":"bash","label":"Install from PyPI"}],"dependencies":[{"reason":"Required Python version.","package":"python","version":">=3.6","optional":false},{"reason":"Used for Python C++ bindings.","package":"pybind11","optional":false},{"reason":"Numerical operations.","package":"numpy","optional":false},{"reason":"Scientific computing.","package":"scipy","optional":false}],"imports":[{"note":"The primary way to import the library.","symbol":"floret","correct":"import floret"}],"quickstart":{"code":"import floret\nimport os\n\n# Create a dummy data file for training\nwith open(\"data.txt\", \"w\", encoding=\"utf-8\") as f:\n    f.write(\"This is a sample sentence for floret training.\\n\")\n    f.write(\"Floret is great for compact word vectors.\\n\")\n    f.write(\"More sentences for training the model.\\n\")\n\n# Train an unsupervised floret model\n# IMPORTANT: Use mode=\"floret\" to enable floret's Bloom embeddings.\n# The default mode=\"fasttext\" trains original fastText vectors.\nmodel = floret.train_unsupervised(\n    \"data.txt\",\n    model=\"cbow\",\n    mode=\"floret\",\n    hashCount=2,        # Recommended for floret 
mode\n    bucket=50000,       # Reduced size hash table\n    minn=3,\n    maxn=6,\n    dim=100,\n    epoch=10,\n    minCount=1          # Keep every word: the default minCount (5) would leave this tiny corpus with an empty vocabulary\n)\n\n# Get a word vector\nvector = model.get_word_vector(\"floret\")\nprint(f\"Vector for 'floret': {vector[:5]}...\") # Print first 5 elements\n\n# Save the full model (creates a .bin file)\nmodel.save_model(\"vectors.bin\")\nprint(\"Model saved to vectors.bin\")\n\n# Export the floret-specific vector table (creates a .floret file)\nmodel.save_floret_vectors(\"vectors.floret\")\nprint(\"Floret vectors saved to vectors.floret\")\n\n# Clean up dummy files\nos.remove(\"data.txt\")\nos.remove(\"vectors.bin\")\nos.remove(\"vectors.floret\")\n","lang":"python","description":"This quickstart trains an unsupervised floret model on a tiny sample corpus, retrieves a word vector, and saves the trained model. It sets `mode=\"floret\"` to enable floret's Bloom embeddings and shows how to save both the full model and the compact floret vector table."},"warnings":[{"fix":"Always load `.bin` files with the same program (floret or fastText) that was used to train and save them. For floret-specific compact vectors, use `model.save_floret_vectors()` and load the resulting file with spaCy's `spacy init vectors` command, passing `--mode floret`.","message":"The binary formats (`.bin` files) saved by `floret` are not compatible with binary models saved by original `fastText`, and vice versa.","severity":"breaking","affected_versions":"All versions"},{"fix":"When calling `floret.train_unsupervised()` or `floret.train_supervised()`, include the argument `mode='floret'`.","message":"By default, `floret.train_unsupervised()` and `floret.train_supervised()` use `mode='fasttext'`, which trains and saves original fastText vectors. To use floret's Bloom embeddings for compact vectors, you must explicitly set `mode='floret'` during training.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure all training data is available for a single training run. 
There is no documented API for incremental training or loading existing embeddings as a starting point.","message":"It is not currently possible to train floret models iteratively or from pre-trained embeddings directly through the Python API.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Do not rely on `token.is_oov` to detect unknown words. For nearest-neighbor lookups, query the `floret` model directly or implement the search yourself; spaCy's built-in similarity, which usually relies on `Token.vector`, remains available.","message":"When floret vectors are integrated into spaCy, certain `Token` attributes and `Vocab` methods behave differently because every token receives a vector built from its subwords. Specifically, `token.is_oov` is always `False`, and `nlp.vocab.vectors.most_similar` is not supported for floret vectors and raises an error.","severity":"gotcha","affected_versions":"spaCy v3.2+"},{"fix":"If your goal is to obtain the compact Bloom-embedding vectors, use `model.save_floret_vectors(\"your_vectors.floret\")`. This generates a `.floret` file that is significantly smaller than the `.bin` model file.","message":"The method `model.save_model(\"file.bin\")` saves the full floret model, which can be large. To export the compact floret vector table for use in applications like spaCy, call the separate method `model.save_floret_vectors(\"file.floret\")`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure you have activated the correct virtual environment if using one, then run `pip install floret`.","cause":"The 'floret' library is not installed in the current Python environment or the environment is not activated.","error":"ModuleNotFoundError: No module named 'floret'"},{"fix":"Consult the `floret` documentation or its GitHub repository for the correct API. 
Common methods include `get_word_vector`, `get_word_id`, `save_model`, `save_vectors`, and `save_floret_vectors`.","cause":"You are attempting to call a method that does not exist on the `floret` model object, or you have a typo. This might also occur if you are expecting a fastText-specific method that `floret` does not expose or re-implement.","error":"AttributeError: 'Model' object has no attribute 'some_method'"},{"fix":"Upgrade your `floret` installation to the latest version: `pip install --upgrade floret`.","cause":"The `hashCount` (and `mode`) arguments for `floret.train_supervised` were added in version 0.10.4. This error indicates you are using an older version of the `floret` library.","error":"TypeError: train_supervised() got an unexpected keyword argument 'hashCount'"},{"fix":"Write your training data to a text file (e.g., `data.txt`) and pass the path to this file to the training function, for example: `floret.train_unsupervised(\"data.txt\", ...)`.","cause":"Training functions like `train_unsupervised` and `train_supervised` expect a string path to a file containing training data, not raw text or a file-like object.","error":"ValueError: Must pass a file path for training data"}]}