Tantivy Python Bindings

0.25.1 · active · verified Fri Apr 10

Tantivy-py is the official Python binding for Tantivy, a high-performance full-text search engine library written in Rust and inspired by Apache Lucene. It exposes Tantivy's indexing and search to Python code. The current version is 0.25.1, and the project is actively developed, with minor releases typically a few months apart.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to define a schema, create an in-memory Tantivy index, add documents to it, and perform a basic search. It also shows how to retrieve the full document content from search hits.

import tantivy

# 1. Declare the schema
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True, tokenizer_name="default")
schema_builder.add_text_field("body", stored=True, tokenizer_name="default")
schema_builder.add_integer_field("doc_id", stored=True, indexed=True)
schema = schema_builder.build()

# 2. Create an in-memory index
# For a persistent index, pass the path of an existing directory instead:
# index = tantivy.Index(schema, path="/tmp/my_index")
index = tantivy.Index(schema)

# 3. Get an index writer and add documents
writer = index.writer(50_000_000) # 50MB memory arena
writer.add_document(tantivy.Document(title=["The Old Man and the Sea"], body=["He was an old man who fished alone in a skiff."], doc_id=[1]))
writer.add_document(tantivy.Document(title=["The Great Gatsby"], body=["In my younger and more vulnerable years my father gave me some advice."], doc_id=[2]))
writer.commit()

# 4. Reload the index so the commit is visible, then get a searcher
index.reload()
searcher = index.searcher()

# 5. Parse and execute a query; unqualified terms search "title" and "body"
query = index.parse_query("old man", ["title", "body"])
hits = searcher.search(query, 10).hits  # list of (score, doc_address) pairs

# 6. Retrieve documents
print("Search results:")
for score, doc_address in hits:
    retrieved_doc = searcher.doc(doc_address)
    print(f"  Score: {score:.2f}, Doc ID: {retrieved_doc['doc_id'][0]}, Title: {retrieved_doc['title'][0]}")

# Example of looking up a field the document does not have.
# get_all() returns an empty list for an unknown field name.
if hits:
    missing_field = retrieved_doc.get_all("non_existent_field")
    print(f"  Non-existent field for last doc: {missing_field}")  # Expected: []
