txtai
raw JSON → 9.7.0 verified Tue May 12 auth: no python install: stale quickstart: stale
All-in-one AI framework: embeddings database, semantic search, LLM orchestration, RAG, pipelines and agents. Current version: 9.7.0 (Mar 2026). TWO packages on PyPI: 'txtai' (full local library) and 'txtai.py' (thin API client for remote txtai server). Most tutorials use the full 'txtai' package. Core API: Embeddings class. index() rebuilds entire index. upsert() adds/updates without full rebuild. Content storage must be enabled for SQL queries and content retrieval.
pip install txtai

Common errors
error ModuleNotFoundError: No module named 'txtai' ↓
cause The `txtai` library has not been installed in the current Python environment or the environment is not active.
fix
pip install txtai
error ValueError: content must be enabled to save content ↓
cause The `Embeddings` index was initialized without enabling content storage, preventing content retrieval or SQL queries.
fix
Initialize Embeddings with content=True, e.g., Embeddings(config={'content': True}).
error OSError: Can't load tokenizer for 'sentence-transformers/all-MiniLM-L6-v2'. If you were trying to load it from 'https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2', make sure you don't have a local directory with the same name. ↓
cause The specified `sentence-transformers` model cannot be loaded, possibly due to a network issue, a typo in the model name, insufficient disk space, or a corrupted local cache.
fix
Verify the model name and internet connectivity, ensure sufficient disk space, or clear the Hugging Face cache (usually ~/.cache/huggingface/hub) if a corrupted download is suspected.
error AttributeError: 'txtai.embeddings.Embeddings' object has no attribute 'add' ↓
cause The `Embeddings` object in `txtai` does not have an `add` method; data is added using `index` or `upsert`.
fix
Use embeddings.index(data) to rebuild the index or embeddings.upsert(data) to add/update existing data.
error TypeError: 'str' object is not iterable ↓
cause The `embeddings.index()` or `embeddings.upsert()` method expects the `data` argument to be a list of items, but a single string (or other non-iterable object) was provided.
fix
Wrap the input data in a list, even if it's a single item, e.g., embeddings.index(["text_item"]) or embeddings.upsert([("id1", "text_item", None)]).

Warnings
breaking Two packages on PyPI: 'txtai' (full library) and 'txtai.py' (thin API client). They have different APIs. 'pip install txtai.py' installs a client that connects to a remote txtai server — not the local library. ↓
fix For local use: pip install txtai. For connecting to a remote txtai API server: pip install txtai.py
breaking index() replaces the entire index. Calling it twice means the first index is gone. LLMs commonly generate code that calls index() multiple times to 'add' documents. ↓
fix Use upsert() to add/update documents without full rebuild. Use index() only for initial load or full re-index.
breaking search() returns (id, score) tuples by default — not document text. Accessing result['text'] raises KeyError without content=True enabled. ↓
fix Enable Embeddings(content=True) to store and retrieve text from search results.
gotcha SQL queries and content retrieval require content=True at index creation time. Cannot be enabled after index is built without re-indexing. ↓
fix Always set content=True if you need SQL queries, text retrieval, or metadata filtering.
gotcha Base 'pip install txtai' has minimal deps. Most useful features (pipelines, LLM, API server) require extras: txtai[pipeline-text], txtai[api], txtai[all]. ↓
fix For RAG/LLM workflows: pip install txtai[all]. For just semantic search: pip install txtai[similarity].
gotcha Default model downloads from Hugging Face Hub on first use — requires internet access and ~100MB download. Fails in air-gapped environments. ↓
fix Pre-download model: embeddings = Embeddings(path='/local/model/path'). Or set HF_HUB_OFFLINE=1 with a cached model.
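An environment-setup sketch for air-gapped runs, assuming the huggingface_hub CLI is available (it is installed alongside txtai's transformers dependency); `app.py` is a placeholder for your own script:

```shell
# One-time pre-download of the default model into the local HF cache
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2

# Later runs: force offline mode so txtai loads only from the cache
export HF_HUB_OFFLINE=1
python app.py  # placeholder for your txtai script
```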
gotcha Agents (added in v8) are built on smolagents framework — requires pip install txtai[agent]. Earlier versions used transformers agents which had different API. ↓
fix pip install txtai[agent] for agent support.
breaking Installation of txtai (and its dependencies that require compilation, such as scikit-learn, hnswlib, annoy, fasttext) may fail in minimal environments (e.g., Alpine Linux) due to missing build tools. These packages often require a C/C++ compiler to build native extensions. ↓
fix Install necessary build tools in the environment before attempting to install txtai. For Alpine Linux, this typically involves `apk add build-base python3-dev`.
Install

pip install txtai[all]
pip install txtai[api]
pip install txtai.py

Install compatibility (stale; last tested: 2026-05-12)
| python | os / libc | variant | wheel | install | import | disk |
|--------|-----------|---------|-------|---------|--------|------|
| 3.9 | alpine (musl) | txtai | - | - | - | - |
| 3.9 | alpine (musl) | txtai.py | - | - | - | - |
| 3.9 | alpine (musl) | all | - | - | - | - |
| 3.9 | alpine (musl) | api | - | - | - | - |
| 3.9 | slim (glibc) | txtai | - | - | - | - |
| 3.9 | slim (glibc) | txtai.py | - | - | - | - |
| 3.9 | slim (glibc) | all | - | - | - | - |
| 3.9 | slim (glibc) | api | - | - | - | - |
| 3.10 | alpine (musl) | txtai | - | - | - | - |
| 3.10 | alpine (musl) | txtai.py | - | - | - | - |
| 3.10 | alpine (musl) | all | - | - | - | - |
| 3.10 | alpine (musl) | api | - | - | - | - |
| 3.10 | slim (glibc) | txtai | - | - | 20.05s | 4.8G |
| 3.10 | slim (glibc) | txtai.py | - | - | - | - |
| 3.10 | slim (glibc) | all | - | - | - | - |
| 3.10 | slim (glibc) | api | - | - | - | - |
| 3.11 | alpine (musl) | txtai | - | - | - | - |
| 3.11 | alpine (musl) | txtai.py | - | - | - | - |
| 3.11 | alpine (musl) | all | - | - | - | - |
| 3.11 | alpine (musl) | api | - | - | - | - |
| 3.11 | slim (glibc) | txtai | - | - | 22.76s | 4.9G |
| 3.11 | slim (glibc) | txtai.py | - | - | - | - |
| 3.11 | slim (glibc) | all | - | - | - | - |
| 3.11 | slim (glibc) | api | - | - | 22.84s | 5.0G |
| 3.12 | alpine (musl) | txtai | - | - | - | - |
| 3.12 | alpine (musl) | txtai.py | - | - | - | - |
| 3.12 | alpine (musl) | all | - | - | - | - |
| 3.12 | alpine (musl) | api | - | - | - | - |
| 3.12 | slim (glibc) | txtai | - | - | 24.01s | 4.9G |
| 3.12 | slim (glibc) | txtai.py | - | - | - | - |
| 3.12 | slim (glibc) | all | - | - | - | - |
| 3.12 | slim (glibc) | api | - | - | 25.87s | 5.0G |
| 3.13 | alpine (musl) | txtai | - | - | - | - |
| 3.13 | alpine (musl) | txtai.py | - | - | - | - |
| 3.13 | alpine (musl) | all | - | - | - | - |
| 3.13 | alpine (musl) | api | - | - | - | - |
| 3.13 | slim (glibc) | txtai | - | - | 21.43s | 4.9G |
| 3.13 | slim (glibc) | txtai.py | - | - | - | - |
| 3.13 | slim (glibc) | all | - | - | - | - |
| 3.13 | slim (glibc) | api | - | - | 23.23s | 5.0G |
Imports
- Embeddings (basic semantic search)

wrong

from txtai import Embeddings

embeddings = Embeddings()
embeddings.index(['doc 1', 'doc 2'])

# Wrong: index() again REPLACES the entire index
embeddings.index(['doc 3'])  # doc 1 and doc 2 are now gone

# Wrong: expecting text back from search without content storage
text = embeddings.search('query', 1)[0][0]  # returns id (int), not text

correct

from txtai import Embeddings

# Default model (all-MiniLM-L6-v2)
embeddings = Embeddings()

# Or specify model explicitly
embeddings = Embeddings(path='sentence-transformers/all-MiniLM-L6-v2')

# index() — builds NEW index, overwrites existing
embeddings.index(['Correct answer', 'Wrong answer', 'Maybe'])

# search returns list of (id, score) tuples
results = embeddings.search('positive', 1)
print(results)  # [(0, 0.298)] — id=0 is 'Correct answer'

- Embeddings with content storage

wrong

# Without content=True, search only returns (id, score)
embeddings = Embeddings()
embeddings.index(['text 1', 'text 2'])
results = embeddings.search('query', 1)
print(results[0]['text'])  # KeyError — no text in result

correct

from txtai import Embeddings

# Enable content storage to retrieve text from search results
embeddings = Embeddings(content=True)

# Index with dict documents
embeddings.index([
    {'id': 0, 'text': 'Python is a programming language'},
    {'id': 1, 'text': 'JavaScript runs in browsers'},
    {'id': 2, 'text': 'Rust is fast and safe'},
])

# Now search returns dicts with text
results = embeddings.search('compiled language', 1)
print(results[0]['text'])  # 'Rust is fast and safe'

# Can also use SQL
results = embeddings.search(
    "SELECT text, score FROM txtai WHERE similar('web language') LIMIT 1"
)
Quickstart (stale; last tested: 2026-04-23)
# pip install txtai
from txtai import Embeddings
# Create embeddings with content storage
embeddings = Embeddings(
path='sentence-transformers/all-MiniLM-L6-v2',
content=True
)
# Index documents
embeddings.index([
    {'id': 0, 'text': 'Python is a programming language created by Guido'},
    {'id': 1, 'text': 'JavaScript is used for web development'},
    {'id': 2, 'text': 'Rust provides memory safety without garbage collection'},
    {'id': 3, 'text': 'Go is designed for cloud infrastructure'},
])
# Semantic search — returns dicts with text
results = embeddings.search('systems programming language', 2)
for r in results:
    print(r['text'], r['score'])
# Upsert — add without rebuilding
embeddings.upsert([{'id': 4, 'text': 'TypeScript adds types to JavaScript'}])
# Save and load
embeddings.save('/tmp/myindex')
embeddings.load('/tmp/myindex')