Embedchain
Embedchain is an open-source Retrieval-Augmented Generation (RAG) framework designed to simplify building and deploying personalized AI applications. It handles the pipeline of loading, chunking, embedding, and storing many types of unstructured data in a vector database for efficient retrieval. The library is actively maintained (version 0.1.128 at the time of writing); its frequent releases reflect a pre-1.0 phase, so expect occasional API changes between versions.
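The load → chunk → embed → store pipeline that Embedchain automates can be illustrated with a minimal sketch of the chunking step in plain Python. This is a hypothetical helper for intuition only, not Embedchain's internal implementation:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks, roughly what a RAG
    loader does before embedding each chunk into the vector store."""
    chunks = []
    start = 0
    step = chunk_size - overlap  # overlap keeps context across chunk edges
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

doc = "word " * 300          # a 1500-character stand-in document
pieces = chunk_text(doc)
print(len(pieces))           # → 4
```

Each chunk (not the whole document) is what gets embedded and later retrieved, which is why chunk size and overlap settings affect answer quality.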
Warnings
- gotcha ChromaDB, the default vector database, can occasionally end up with a corrupted local persistence directory, which surfaces as unexpected errors during otherwise normal operations. Deleting the local database directory and re-indexing usually resolves it.
- gotcha The first query to an Embedchain application can be significantly slower than subsequent ones. This is due to initial loading of models and data into memory.
- gotcha Large document sets can consume a substantial amount of RAM. For example, processing 4GB of documents might require a minimum of 8GB of RAM.
- gotcha Embedchain does not provide a straightforward way to update an individual document. If the source data changes, the previously indexed chunks remain stale until you reset the store and re-add the source.
- gotcha When using OpenAI's API, you might encounter rate limiting issues during intensive embedding or querying operations.
- breaking Embedchain can introduce unexpected transitive dependencies when integrated into projects already using other GenAI frameworks (e.g., `llama-index`, `crewai`), leading to version conflicts and compatibility issues.
- breaking Incorrect configuration of embedding dimensions or incompatible settings between custom embedding models and vector stores (e.g., Qdrant, Ollama) can lead to silent failures or incorrect retrieval, often without clear error messages.
Install
- pip
pip install embedchain
Imports
- App
from embedchain import App
- App.from_config (OpenSourceApp was removed in the 0.1.x line; configure open-source models through App instead)
from embedchain import App
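In 0.1.x the open-source stack is configured declaratively rather than through a separate class. The provider and model names below are illustrative assumptions; check the documentation for the providers your installed version supports:

```yaml
# config.yaml — illustrative open-source setup (providers/models are examples)
llm:
  provider: ollama
  config:
    model: llama3
embedder:
  provider: huggingface
  config:
    model: sentence-transformers/all-MiniLM-L6-v2
```

Load it with `app = App.from_config(config_path="config.yaml")`. Note the breaking-change warning above: mismatched embedding dimensions between embedder and vector store can fail silently, so keep both defined in one config file.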
Quickstart
import os
from embedchain import App
# Set your OpenAI API key. In production, always read it from the
# environment; the placeholder fallback below will not authenticate.
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "sk-YOUR_OPENAI_KEY")
# Create an Embedchain app instance
app = App()
# Add data sources (e.g., URLs, PDFs, YouTube videos, local files)
app.add("https://en.wikipedia.org/wiki/Elon_Musk")
app.add("https://www.forbes.com/profile/elon-musk")
# Query the app
response = app.query("How many companies does Elon Musk run and name those?")
print(response)
# You can also use the chat interface for conversational queries
# app.chat("Tell me more about Tesla.")