{"id":6631,"library":"flagembedding","title":"FlagEmbedding","description":"FlagEmbedding is an actively developed Python library providing a powerful toolkit for text embedding and reranking, primarily featuring BGE (BAAI General Embedding) models. It specializes in multi-linguality, multi-granularities (up to 8192 tokens), and multi-functionality (dense, lexical, multi-vec retrieval). It achieves state-of-the-art performance in various benchmarks and is designed for applications like retrieval, classification, clustering, semantic search, and enhancing vector databases for Large Language Models (LLMs). The library is currently at version 1.3.5 and sees regular updates.","status":"active","version":"1.3.5","language":"en","source_language":"en","source_url":"https://github.com/FlagOpen/FlagEmbedding","tags":["embedding","reranking","NLP","LLM","semantic search","information retrieval","transformers","machine learning"],"install":[{"cmd":"pip install -U FlagEmbedding","lang":"bash","label":"Base Install"},{"cmd":"pip install -U FlagEmbedding[finetune]","lang":"bash","label":"With Finetuning Dependencies"}],"dependencies":[{"reason":"Core deep learning framework","package":"torch","optional":false},{"reason":"Handles model loading and tokenization","package":"transformers","optional":false},{"reason":"Used in data handling, especially for fine-tuning","package":"datasets","optional":false},{"reason":"Distributed training and inference utility","package":"accelerate","optional":false},{"reason":"Integration for Sentence-BERT models","package":"sentence_transformers","optional":false},{"reason":"Parameter-Efficient Fine-Tuning","package":"peft","optional":false},{"reason":"Information retrieval datasets","package":"ir-datasets","optional":false},{"reason":"Tokenizer dependency","package":"sentencepiece","optional":false},{"reason":"Data serialization format","package":"protobuf","optional":false},{"reason":"Benchmarking 
tools","package":"air-benchmark","optional":false},{"reason":"Used for efficient distributed fine-tuning","package":"deepspeed","optional":true},{"reason":"Optimized attention mechanism for fine-tuning","package":"flash-attn","optional":true}],"imports":[{"note":"Recommended high-level class for unified embedding model inference.","symbol":"FlagAutoModel","correct":"from FlagEmbedding import FlagAutoModel"},{"note":"For loading and using reranker models.","symbol":"FlagReranker","correct":"from FlagEmbedding import FlagReranker"},{"note":"While functional, `FlagAutoModel` is often preferred for its unified interface. Older examples or specific use cases might directly use `FlagModel`.","wrong":"from FlagEmbedding.FlagModel import FlagModel","symbol":"FlagModel","correct":"from FlagEmbedding import FlagModel"}],"quickstart":{"code":"import os\nfrom FlagEmbedding import FlagAutoModel\n\n# You can replace 'BAAI/bge-base-en-v1.5' with other BGE models like 'BAAI/bge-m3'\n# Consider setting query_instruction_for_retrieval for optimal performance in retrieval tasks.\n# Use use_fp16=True for faster inference on compatible hardware.\n\n# Example for embedding queries and passages\nmodel = FlagAutoModel.from_pretrained(\n    'BAAI/bge-base-en-v1.5',\n    query_instruction_for_retrieval=\"Represent this sentence for searching relevant passages:\",\n    use_fp16=True\n)\n\nqueries = [\"What is FlagEmbedding?\", \"How to use embedding models?\"]\npassages = [\n    \"FlagEmbedding maps text to low-dimensional dense vectors for tasks like retrieval.\",\n    \"Embedding models can be used to generate vector representations of text.\",\n    \"The BGE models are state-of-the-art embedding models.\"\n]\n\n# Encode queries and passages\nquery_embeddings = model.encode_queries(queries)\npassage_embeddings = model.encode_corpus(passages)\n\nprint(f\"Query embeddings shape: {query_embeddings.shape}\")\nprint(f\"Passage embeddings shape: {passage_embeddings.shape}\")\n\n# Compute 
similarity scores (e.g., dot product)\nscores = query_embeddings @ passage_embeddings.T\nprint(\"Similarity scores:\")\nprint(scores)\n\n# Example for reranking using FlagReranker\nfrom FlagEmbedding import FlagReranker\n\n# Replace with a reranker model like 'BAAI/bge-reranker-base'\nreranker = FlagReranker('BAAI/bge-reranker-base', use_fp16=True)\n\nquery_passage_pairs = [\n    ['What is AI?', 'Artificial intelligence (AI) is intelligence demonstrated by machines.'],\n    ['What is AI?', 'The quick brown fox jumps over the lazy dog.']\n]\n\nranks = reranker.compute_score(query_passage_pairs)\nprint(\"Reranker scores:\")\nprint(ranks)\n","lang":"python","description":"This quickstart demonstrates how to load an embedding model using `FlagAutoModel.from_pretrained()` and generate embeddings for queries and passages. It then shows how to compute similarity scores. Additionally, it provides an example of using `FlagReranker` to compute scores for query-passage pairs. Remember to replace placeholder model names with actual Hugging Face model IDs."},"warnings":[{"fix":"Experiment with `query_instruction_for_retrieval` parameter. For retrieval tasks with short queries, pass a descriptive instruction like `Represent this sentence for searching relevant passages:` to `from_pretrained` or `__init__`. For other tasks, omit or use an empty string.","message":"For BGE v1.5 models, using `query_instruction_for_retrieval` is generally recommended for short queries in retrieval tasks for optimal performance. For other tasks (e.g., semantic similarity of short texts), instructions might not be needed or could even degrade performance. While v1.5 models are improved to work without instructions with only slight degradation, explicit instruction is often best practice for retrieval.","severity":"gotcha","affected_versions":">=1.1.0 (specifically BGE v1.5 models)"},{"fix":"Focus on the ranking of similarity scores rather than absolute values. 
If filtering is required, empirically determine a suitable threshold for your dataset.","message":"Similarity scores from BGE models, especially those prior to v1.5, are often concentrated in a narrow range (e.g., [0.6, 1]). An absolute score greater than 0.5 does not necessarily indicate strong similarity. For downstream tasks like retrieval, the *relative order* of scores usually matters more than their absolute values. If filtering by threshold, choose one (e.g., 0.8-0.9) based on the similarity distribution of your own data.","severity":"gotcha","affected_versions":"BGE models before v1.5 (alleviated in v1.5 models)"},{"fix":"Check the project's GitHub issues for workarounds or official fixes. Consider pinning to a 1.2.x version if inference speed is critical and you encounter this issue. Reports point to `self.model.to(device)` and `self.model.eval()` being invoked repeatedly in `encode_single_device`.","message":"Users upgrading from FlagEmbedding v1.2.x to v1.3.x have reported significant inference slowdowns (up to 2x) for BGE-M3 models and FlagReranker. The degradation shows up on repeated calls to `model.encode` or `compute_score`.","severity":"breaking","affected_versions":"1.3.x"},{"fix":"Use a dedicated virtual environment. If conflicts arise, explicitly install a version of `transformers` or `accelerate` that satisfies all of your dependencies, or install `FlagEmbedding` in isolation.","message":"Dependency conflicts, particularly with `transformers` and `accelerate`, have been reported. For instance, `transformers==4.44.2` conflicts when another package requires a newer version (e.g., `transformers<5.0.0,>=4.45.2`). 
This can lead to installation failures or runtime issues.","severity":"gotcha","affected_versions":"All versions, depending on other installed packages"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}