Elasticsearch DSL
Elasticsearch DSL is a high-level Python library, built on top of the official `elasticsearch-py` client, that provides a declarative, object-oriented way to write and run queries against Elasticsearch. Document mappings are defined as Python classes, and search queries and aggregations are composed from Python objects. As of version 8.18.0, the functionality of the `elasticsearch-dsl` package is integrated directly into `elasticsearch-py` under the `elasticsearch.dsl` namespace. The `elasticsearch-dsl` package still exists for compatibility, but active development continues in the main `elasticsearch-py` project. Releases generally track Elasticsearch major/minor versions or feature additions.
Warnings
- breaking As of v8.18.0, the `elasticsearch-dsl` package has been integrated into `elasticsearch-py` as the `elasticsearch.dsl` namespace. While `pip install elasticsearch-dsl` still works and provides a compatibility layer (re-exporting `elasticsearch.dsl`), the recommended long-term approach for new projects or migrations is to `pip install elasticsearch` and use `from elasticsearch.dsl import ...` directly. The `elasticsearch-dsl` GitHub repository is now largely archived, with development continuing in the `elasticsearch-py` repository.
- breaking Migrating from `elasticsearch-dsl` 7.x to 8.x involves significant breaking changes. Key changes include the removal of `Document.create()` (replaced by `Document.save(op_type='create')`), changes in how `connections` are configured (`connections.create_connection()` is now preferred over `connections.configure()`), and updates to default serializers for `datetime` objects.
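- gotcha Using `Search`, `Document.save()`, or `Document.init()` before a connection has been registered fails at the first request. A missing `default` connection alias typically surfaces as a `KeyError`, while an unreachable or misconfigured host surfaces as `ConnectionError` or `ImproperlyConfigured`. Configure the `connections` object with at least host information before any other call.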
- gotcha Field types determine how a value can be queried. A field mapped only as `Keyword()` is stored as a single untokenized term, so it cannot be used for full-text search; use `Text()` for content you intend to analyze. Conversely, a `Text()` field without a `Keyword()` sub-field makes exact-match queries, sorting, and aggregations on the raw string difficult.
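To make the last point concrete, here is a minimal sketch of the raw mapping that a `Text` field with a `Keyword` sub-field corresponds to, written as plain dictionaries so it runs without a cluster. The field name `title` and the helper `field_for` are illustrative assumptions, not part of the library.

```python
# Raw Elasticsearch mapping for a multi-field: analyzed text plus an exact-match sub-field.
# This mirrors what Text(fields={'keyword': Keyword()}) describes; `title` is an example name.
title_mapping = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword"}},
}


def field_for(purpose, name="title", mapping=title_mapping):
    """Hypothetical helper: pick the field path to query for a given purpose."""
    if purpose == "full_text":
        # Analyzed queries such as `match` target the text field itself.
        return name
    if purpose in ("exact", "aggregate", "sort"):
        # Exact matches, aggregations, and sorting need the untokenized sub-field.
        if "keyword" not in mapping.get("fields", {}):
            raise ValueError(f"{name!r} has no keyword sub-field")
        return f"{name}.keyword"
    raise ValueError(f"unknown purpose: {purpose!r}")


print(field_for("full_text"))
print(field_for("exact"))
```

In practice this means a `match` query would target `title`, while a `term` query or a `terms` aggregation would target `title.keyword`.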
Install
- pip install elasticsearch-dsl
- pip install elasticsearch
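Because either package may be installed, it can help to check which namespace is importable before choosing an import style. The following is a small sketch using only the standard library; the function name `available_dsl_namespace` is a hypothetical helper, not something either package provides.

```python
import importlib.util


def available_dsl_namespace():
    """Return the first importable DSL module name, or None if neither is installed."""
    # Prefer the integrated namespace, then fall back to the standalone package.
    for name in ("elasticsearch.dsl", "elasticsearch_dsl"):
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            # Raised when the parent package (here `elasticsearch`) is not installed.
            continue
    return None


print(available_dsl_namespace())
```

A result of `None` indicates that neither `pip install` command above has been run in the current environment.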
Imports
- Search
from elasticsearch_dsl import Search
- Document
from elasticsearch_dsl import Document
- connections
from elasticsearch_dsl import connections
- Text, Keyword, Integer, Date
from elasticsearch_dsl import Text, Keyword, Integer, Date
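None of these imports can reach a cluster until `connections` has been given host information. As a rough sketch, the keyword arguments for `connections.create_connection()` could be assembled from environment-style settings as below. The helper `connection_kwargs` is hypothetical, the variable names follow the Quickstart, and `basic_auth` assumes an 8.x client.

```python
def connection_kwargs(env):
    """Hypothetical helper: map environment-style settings to create_connection() kwargs."""
    # Cloud deployments: a cloud ID plus an API key replaces the host list.
    if env.get("ES_CLOUD_ID") and env.get("ES_API_KEY"):
        return {"cloud_id": env["ES_CLOUD_ID"], "api_key": env["ES_API_KEY"]}

    # Self-managed clusters: at minimum a host is required.
    kwargs = {"hosts": [env.get("ES_HOST", "http://localhost:9200")]}

    # Optional basic authentication (assumes the 8.x `basic_auth` parameter).
    if env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        kwargs["basic_auth"] = (env["ES_USERNAME"], env["ES_PASSWORD"])
    return kwargs


# Usage (requires the library and a reachable cluster):
#   import os
#   from elasticsearch_dsl import connections
#   connections.create_connection(**connection_kwargs(os.environ))
print(connection_kwargs({"ES_HOST": "http://localhost:9200"}))
```

Keeping the argument selection in one function makes it easy to verify the configuration without opening a connection.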
Quickstart
import os
from elasticsearch_dsl import Date, Document, Integer, Keyword, Search, Text, connections

# Configure connection to Elasticsearch
# Replace with your Elasticsearch host and credentials if necessary
ES_HOST = os.environ.get('ES_HOST', 'http://localhost:9200')
# For cloud deployments or API key auth, uncomment and set these:
# ES_CLOUD_ID = os.environ.get('ES_CLOUD_ID')
# ES_API_KEY = os.environ.get('ES_API_KEY')
# ES_USERNAME = os.environ.get('ES_USERNAME', 'elastic')
# ES_PASSWORD = os.environ.get('ES_PASSWORD', 'changeme')
connections.create_connection(hosts=[ES_HOST])  # Local cluster without authentication
# For basic auth (8.x client): connections.create_connection(hosts=[ES_HOST], basic_auth=(ES_USERNAME, ES_PASSWORD))
# For cloud/API key: connections.create_connection(cloud_id=ES_CLOUD_ID, api_key=ES_API_KEY)

# Define a Document (schema for your data)
class Article(Document):
    # Text for full-text search, with a Keyword sub-field for exact matches and aggregations
    title = Text(fields={'keyword': Keyword()})
    author = Text(fields={'keyword': Keyword()})
    published_date = Date()  # Date rather than Keyword, so date range queries work
    word_count = Integer()   # Integer rather than Keyword, so range filters compare numerically

    class Index:
        name = 'my-articles'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0
        }

# Create the index (if it doesn't exist) based on the Document definition
Article.init()

# Index some data
article1 = Article(
    meta={'id': '1'}, title='Python for AI', author='John Doe',
    published_date='2023-01-01', word_count=2000,
)
article1.save()
article2 = Article(
    meta={'id': '2'}, title='Elasticsearch DSL Basics', author='Jane Smith',
    published_date='2023-03-15', word_count=1500,
)
article2.save()
# Refresh the index to make documents searchable immediately
connections.get_connection().indices.refresh(index='my-articles')
# Perform a search
s = Search(index='my-articles').query("match", title="python")
response = s.execute()
print(f"Found {response.hits.total.value} results for 'python':")
for hit in response:
    print(f"ID: {hit.meta.id}, Title: {hit.title}, Author: {hit.author}")
# Example of a more complex search with filtering
s = Search(index='my-articles') \
    .query("match_all") \
    .filter("range", word_count={"gte": 1500}) \
    .exclude("match", author="john")
response = s.execute()
print(f"\nFound {response.hits.total.value} results for complex query:")
for hit in response:
    print(f"ID: {hit.meta.id}, Title: {hit.title}, Author: {hit.author}")
# Cleanup (optional)
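# The 8.x client deprecates per-call `ignore=`; use `.options(ignore_status=...)` instead.
# connections.get_connection().options(ignore_status=[400, 404]).indices.delete(index='my-articles')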
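For readers more familiar with the JSON query DSL, the chained `filter` and `exclude` calls above express a `bool` query. The sketch below builds a semantically similar request body by hand using only the standard library. It is an approximation for illustration: the exact dictionary returned by `Search.to_dict()` may be nested differently, and `filtered_query_body` is a hypothetical helper.

```python
import json


def filtered_query_body(min_words, excluded_author):
    """Hypothetical helper: hand-written bool query similar to the chained Search above."""
    return {
        "query": {
            "bool": {
                # filter(): non-scoring clause, documents must satisfy it
                "filter": [{"range": {"word_count": {"gte": min_words}}}],
                # exclude(): documents matching this clause are removed
                "must_not": [{"match": {"author": excluded_author}}],
            }
        }
    }


body = filtered_query_body(1500, "john")
print(json.dumps(body, indent=2))
```

Note that the `range` clause only behaves numerically when `word_count` is mapped as a numeric type; on a `keyword` field the comparison would be lexicographic.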