txtai
All-in-one AI framework: embeddings database, semantic search, LLM orchestration, RAG, pipelines and agents. Current version: 9.7.0 (Mar 2026). TWO packages on PyPI: 'txtai' (full local library) and 'txtai.py' (thin API client for remote txtai server). Most tutorials use the full 'txtai' package. Core API: Embeddings class. index() rebuilds entire index. upsert() adds/updates without full rebuild. Content storage must be enabled for SQL queries and content retrieval.
Warnings
- breaking Two packages on PyPI: 'txtai' (full library) and 'txtai.py' (thin API client). They have different APIs. 'pip install txtai.py' installs a client that connects to a remote txtai server — not the local library.
- breaking index() replaces the entire index. Calling it twice means the first index is gone. LLMs commonly generate code that calls index() multiple times to 'add' documents.
- breaking search() returns (id, score) tuples by default — not document text. Accessing result['text'] raises KeyError without content=True enabled.
- gotcha SQL queries and content retrieval require content=True at index creation time. Cannot be enabled after index is built without re-indexing.
- gotcha Base 'pip install txtai' has minimal deps. Most useful features (pipelines, LLM, API server) require extras: txtai[pipeline-text], txtai[api], txtai[all].
- gotcha Default model downloads from Hugging Face Hub on first use — requires internet access and ~100MB download. Fails in air-gapped environments.
- gotcha Agents (added in v8) are built on smolagents framework — requires pip install txtai[agent]. Earlier versions used transformers agents which had different API.
Install
- pip install txtai
- pip install txtai[all]
- pip install txtai[api]
- pip install txtai.py  (remote API client, not the full library)
Imports
- Embeddings (basic semantic search)
from txtai import Embeddings

# Default model (all-MiniLM-L6-v2)
embeddings = Embeddings()

# Or specify model explicitly
embeddings = Embeddings(path='sentence-transformers/all-MiniLM-L6-v2')

# index() — builds NEW index, overwrites existing
embeddings.index(['Correct answer', 'Wrong answer', 'Maybe'])

# search returns list of (id, score) tuples
results = embeddings.search('positive', 1)
print(results)  # [(0, 0.298)] — id=0 is 'Correct answer'
- Embeddings with content storage
from txtai import Embeddings

# Enable content storage to retrieve text from search results
embeddings = Embeddings(content=True)

# Index with dict documents
embeddings.index([
    {'id': 0, 'text': 'Python is a programming language'},
    {'id': 1, 'text': 'JavaScript runs in browsers'},
    {'id': 2, 'text': 'Rust is fast and safe'},
])

# Now search returns dicts with text
results = embeddings.search('compiled language', 1)
print(results[0]['text'])  # 'Rust is fast and safe'

# Can also use SQL
results = embeddings.search(
    "SELECT text, score FROM txtai WHERE similar('web language') LIMIT 1"
)
Quickstart
# pip install txtai
from txtai import Embeddings
# Create embeddings with content storage
embeddings = Embeddings(
    path='sentence-transformers/all-MiniLM-L6-v2',
    content=True
)
# Index documents
embeddings.index([
    {'id': 0, 'text': 'Python is a programming language created by Guido'},
    {'id': 1, 'text': 'JavaScript is used for web development'},
    {'id': 2, 'text': 'Rust provides memory safety without garbage collection'},
    {'id': 3, 'text': 'Go is designed for cloud infrastructure'},
])
# Semantic search — returns dicts with text
results = embeddings.search('systems programming language', 2)
for r in results:
    print(r['text'], r['score'])
# Upsert — add without rebuilding
embeddings.upsert([{'id': 4, 'text': 'TypeScript adds types to JavaScript'}])
# Save and load
embeddings.save('/tmp/myindex')
embeddings.load('/tmp/myindex')