txtai
All-in-one AI framework: embeddings database, semantic search, LLM orchestration, RAG, pipelines and agents. Current version: 9.7.0 (Mar 2026). TWO packages on PyPI: 'txtai' (full local library) and 'txtai.py' (thin API client for remote txtai server). Most tutorials use the full 'txtai' package. Core API: Embeddings class. index() rebuilds entire index. upsert() adds/updates without full rebuild. Content storage must be enabled for SQL queries and content retrieval.
Warnings
- breaking Two packages on PyPI: 'txtai' (full library) and 'txtai.py' (thin API client). They have different APIs. 'pip install txtai.py' installs a client that connects to a remote txtai server — not the local library.
- breaking index() replaces the entire index. Calling it twice means the first index is gone. LLMs commonly generate code that calls index() multiple times to 'add' documents.
- breaking search() returns (id, score) tuples by default — not document text. Accessing result['text'] raises KeyError without content=True enabled.
- gotcha SQL queries and content retrieval require content=True at index creation time. Cannot be enabled after index is built without re-indexing.
- gotcha Base 'pip install txtai' has minimal deps. Most useful features (pipelines, LLM, API server) require extras: txtai[pipeline-text], txtai[api], txtai[all].
- gotcha Default model downloads from Hugging Face Hub on first use — requires internet access and ~100MB download. Fails in air-gapped environments.
- gotcha Agents (added in v8) are built on smolagents framework — requires pip install txtai[agent]. Earlier versions used transformers agents which had different API.
Install
- pip install txtai
- pip install txtai[all]
- pip install txtai[api]
- pip install txtai.py  (remote API client, not the full library)
Imports
- Embeddings (basic semantic search)
from txtai import Embeddings

# Default model (all-MiniLM-L6-v2)
embeddings = Embeddings()

# Or specify model explicitly
embeddings = Embeddings(path='sentence-transformers/all-MiniLM-L6-v2')

# index() — builds NEW index, overwrites existing
embeddings.index(['Correct answer', 'Wrong answer', 'Maybe'])

# search returns list of (id, score) tuples
results = embeddings.search('positive', 1)
print(results)  # [(0, 0.298)] — id=0 is 'Correct answer'
- Embeddings with content storage
from txtai import Embeddings

# Enable content storage to retrieve text from search results
embeddings = Embeddings(content=True)

# Index with dict documents
embeddings.index([
    {'id': 0, 'text': 'Python is a programming language'},
    {'id': 1, 'text': 'JavaScript runs in browsers'},
    {'id': 2, 'text': 'Rust is fast and safe'},
])

# Now search returns dicts with text
results = embeddings.search('compiled language', 1)
print(results[0]['text'])  # 'Rust is fast and safe'

# Can also use SQL
results = embeddings.search(
    "SELECT text, score FROM txtai WHERE similar('web language') LIMIT 1"
)
Quickstart
# pip install txtai
from txtai import Embeddings
# Create embeddings with content storage
embeddings = Embeddings(
    path='sentence-transformers/all-MiniLM-L6-v2',
    content=True
)
# Index documents
embeddings.index([
    {'id': 0, 'text': 'Python is a programming language created by Guido'},
    {'id': 1, 'text': 'JavaScript is used for web development'},
    {'id': 2, 'text': 'Rust provides memory safety without garbage collection'},
    {'id': 3, 'text': 'Go is designed for cloud infrastructure'},
])
# Semantic search — returns dicts with text
results = embeddings.search('systems programming language', 2)
for r in results:
    print(r['text'], r['score'])
# Upsert — add without rebuilding
embeddings.upsert([{'id': 4, 'text': 'TypeScript adds types to JavaScript'}])
# Save and load
embeddings.save('/tmp/myindex')
embeddings.load('/tmp/myindex')