Marqo
Marqo is an open-source vector database and tensor search engine aimed at Retrieval-Augmented Generation (RAG) workloads. It was originally built on OpenSearch; releases from 2.0 onward use Vespa as the backend. It supports multimodal search across text, images, and other data types using embeddings, which simplifies building search and RAG applications. Currently at version 3.18.0, Marqo is actively developed and releases frequently.
Common errors
- urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8882): Max retries exceeded with url: /indexes (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at ...>: Failed to establish a new connection: [Errno 111] Connection refused'))
  cause: The Marqo server (often a Docker container) is not running or is not accessible at the specified URL and port.
  fix: Ensure your Marqo Docker container is running (e.g., `docker ps` to check). Verify that the `url` passed to `marqo.Client()` is correct; for local setups it is usually `http://localhost:8882`.
- marqo.errors.MarqoApiError: Could not find index 'my-non-existent-index'
  cause: Attempted to access, add documents to, or search an index that has not been created or whose name is misspelled.
  fix: Check the index name for typos. If the index should exist, ensure `mq.create_index(index_name='your-index-name')` was called successfully before attempting other operations.
- marqo.errors.MarqoApiError: Invalid model_properties: key model_properties.model. If model_properties.model is not specified, you must either specify it through the `model` param, or use default settings.
  cause: This error typically occurs in Marqo v2.0.0+ when using an older index creation schema where `model` was a top-level parameter, instead of being nested under `model_properties`.
  fix: Update your `create_index` call to use the `model_properties` dictionary for specifying the model. Example: `mq.create_index(index_name, model_properties={'model': 'hf/e5-base'})`.
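The connection-refused case in the first entry can be detected before any client call is made. The following is a minimal sketch that uses only the Python standard library. `marqo_is_reachable` is a hypothetical helper name, not part of the Marqo client, and it only checks that something accepts TCP connections at the URL's host and port, not that the listener is a healthy Marqo server.

```python
import socket
from urllib.parse import urlparse


def marqo_is_reachable(url: str = "http://localhost:8882", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS lookup failures.
        return False


if __name__ == "__main__":
    url = "http://localhost:8882"
    if marqo_is_reachable(url):
        print(f"Something is listening at {url}")
    else:
        print(f"Nothing reachable at {url}; check `docker ps`")
```

Running a check like this first turns a long urllib3 traceback into a short, actionable message.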
Warnings
- breaking: Marqo v2.0.0 introduced significant breaking changes to index settings, particularly regarding `model` and `model_properties`. The `model` parameter in `create_index` was moved into `model_properties`, and `index_defaults` was removed.
- breaking: Marqo v3.0.0 further changed index settings: the `device` parameter (e.g., `device='cuda'`) was moved into `model_properties`.
- gotcha: Marqo relies on an underlying storage backend (OpenSearch in 1.x, Vespa from 2.0 onward), often deployed via Docker. Memory and CPU consumption for Marqo itself and for the embedding models can be high, especially with larger models or heavy indexing and search loads.
- gotcha: When running Marqo locally with Docker, the Marqo container must be running and reachable. Connection errors are usually caused by the container not being started or by an incorrect URL.
Install
- pip install marqo
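Because the breaking changes listed under Warnings are tied to major versions, it can help to confirm which major version of the Python package was actually installed. The sketch below uses the standard library's `importlib.metadata`; `installed_major_version` is a hypothetical helper name. It reports the version of the installed client package only, which is not necessarily the version of the Marqo server it talks to.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major_version(package: str = "marqo") -> Optional[int]:
    """Return the major version of an installed package, or None if it is absent."""
    try:
        raw = version(package)
    except PackageNotFoundError:
        return None
    # Take the leading numeric component, e.g. "3.18.0" -> 3.
    head = raw.split(".", 1)[0]
    return int(head) if head.isdigit() else None


if __name__ == "__main__":
    major = installed_major_version()
    if major is None:
        print("marqo is not installed; run `pip install marqo`")
    else:
        print(f"marqo client major version: {major}")
```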
Imports
- Client
  wrong: from marqo import MarqoClient
  correct: from marqo import Client
Quickstart
import os

import marqo
import marqo.errors

# For local Marqo instances (e.g., via Docker), the URL is often http://localhost:8882
# For cloud instances, set MARQO_URL and optionally MARQO_API_KEY
marqo_url = os.environ.get('MARQO_URL', 'http://localhost:8882')
marqo_api_key = os.environ.get('MARQO_API_KEY')
mq = marqo.Client(url=marqo_url, api_key=marqo_api_key)
index_name = "my-first-marqo-index"

# Create the index if it doesn't exist.
# Which exception class the client raises may depend on its version, so catch both.
try:
    mq.get_index(index_name=index_name)
    print(f"Index '{index_name}' already exists.")
except (marqo.errors.MarqoApiError, marqo.errors.MarqoWebError) as e:
    if "index_not_found" in str(e).lower():
        print(f"Creating index '{index_name}'...")
        mq.create_index(index_name=index_name)
    else:
        raise

# Add documents to the index
docs = [
    {
        "_id": "doc1",
        "title": "The Art of Computer Programming",
        "description": "A series of comprehensive monographs by Donald Knuth covering many topics in computer science."
    },
    {
        "_id": "doc2",
        "title": "Structure and Interpretation of Computer Programs",
        "description": "An influential computer science textbook by Abelson and Sussman, known as SICP."
    }
]
# Document operations live on the index object returned by mq.index(...).
# tensor_fields names the fields that are embedded for vector search.
response_add = mq.index(index_name).add_documents(
    docs, tensor_fields=["title", "description"]
)
print("Added documents:", response_add)

# Perform a search
search_query = "computer science textbooks"
response_search = mq.index(index_name).search(q=search_query)
print(f"\nSearch results for '{search_query}':")
for hit in response_search['hits']:
    print(f"  Title: {hit['title']}, Score: {hit['_score']:.2f}")

# Clean up (optional): delete the index
# mq.delete_index(index_name=index_name)
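
The result loop in the quickstart indexes `hit['title']` directly, which raises `KeyError` for any document that lacks that field. A more tolerant formatter might look like the sketch below. `summarize_hits` is a hypothetical helper, and the response shape (a `hits` list whose items carry `_id`, `_score`, and the document fields) is assumed from the quickstart rather than taken from a schema. The sample data is hand-written and its scores are made up.

```python
from typing import Any, Dict, List


def summarize_hits(response: Dict[str, Any], field: str = "title") -> List[str]:
    """Build one display line per hit, tolerating missing fields."""
    lines = []
    for hit in response.get("hits", []):
        # Prefer the requested field, then the document id, then a placeholder.
        label = hit.get(field, hit.get("_id", "<unknown>"))
        score = hit.get("_score")
        score_text = f"{score:.2f}" if isinstance(score, (int, float)) else "n/a"
        lines.append(f"{label} (score: {score_text})")
    return lines


if __name__ == "__main__":
    # Hand-written stand-in for a search response; not real Marqo output.
    sample = {"hits": [
        {"_id": "doc2", "title": "Structure and Interpretation of Computer Programs", "_score": 0.8},
        {"_id": "doc1", "_score": 0.75},
    ]}
    for line in summarize_hits(sample):
        print(line)
```

Keeping the formatter separate from the client call also makes it testable without a running Marqo instance.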