Graph Retriever
Graph Retriever is a Python library that combines unstructured similarity search with structured document traversal to enhance Retrieval-Augmented Generation (RAG) applications. It enables traversing relationships between documents to find more relevant context than simple similarity search alone. The current version is 0.8.0, with minor releases occurring frequently, often driven by integration updates with DataStax Astra DB and other components.
Common errors
- ModuleNotFoundError: No module named 'astrapy'
cause The `AstraGraphStore` class, used for integrating with DataStax Astra DB, requires the `astrapy` library, which is not a direct dependency of `graph-retriever`.
fix Install the `astrapy` library: `pip install astrapy`.
- TypeError: 'Id' object is not callable
cause Attempting to use `Id()` objects directly as string IDs or in contexts where a string is expected, particularly for edge identifiers, after `v0.6.0`.
fix Instead of `graph_store.add_edge(Id('doc_a'), Id('doc_b'))`, use `graph_store.add_edge('doc_a', 'doc_b')`. Ensure you are passing string IDs.
- TypeError: BFSTraversalStrategy.__init__() got an unexpected keyword argument 'k'
cause The `k` parameter was temporarily removed in `v0.5.0` for retriever strategies.
fix Upgrade to `graph-retriever` `v0.5.1` or newer, where the `k` parameter was restored. If you must use `v0.5.0`, remove the `k` parameter from the strategy constructor.
- astrapy.exceptions.AstraDBConnectionError: Failed to connect to Astra DB
cause The provided Astra DB application token or API endpoint is incorrect, expired, or improperly formatted.
fix Double-check your `ASTRA_DB_APPLICATION_TOKEN` and `ASTRA_DB_API_ENDPOINT`. Ensure they are correctly set as environment variables or passed explicitly when initializing `AstraDB`.
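Two of the errors above (the missing `astrapy` module and the failed Astra DB connection) can often be caught before any client is created. The following is a minimal sketch using only the standard library; the `preflight` function and the `REQUIRED_ENV_VARS` name are hypothetical helpers for illustration and are not part of `graph-retriever` or `astrapy`. It checks only that the module is importable and that the variables are non-empty, not that the credentials are valid.

```python
import importlib.util
import os

# Environment variable names taken from the error description above.
REQUIRED_ENV_VARS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT")


def preflight(env=None, module_name="astrapy"):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: it does not contact Astra DB, so an expired or
    mistyped token still passes this check.
    """
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append(
            f"{module_name} is not installed; run: pip install {module_name}"
        )
    for name in REQUIRED_ENV_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set or is empty")
    return problems
```

Calling `preflight()` at startup and printing each returned problem gives a clearer message than the raw `ModuleNotFoundError` or connection error.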
Warnings
- breaking The representation of document IDs on edges changed in `v0.6.0`. Previously, `Id()` objects were used; now, string IDs (e.g., `'$id'`) are preferred or required in many contexts.
- breaking The internal design of retriever strategies was significantly refactored in `v0.5.0`, leading to potential changes in constructor parameters for various `RetrieverStrategy` implementations.
- gotcha The `k` parameter, used to specify the number of nodes to retrieve in strategies, was temporarily removed in `v0.5.0` and then restored in `v0.5.1`. Code that passes `k` therefore raises a `TypeError` on `v0.5.0` but works on `v0.5.1` and later.
- gotcha While `graph-retriever` itself has minimal direct dependencies, using the `AstraGraphStore` (a common use case) explicitly requires the `astrapy` library, which is not automatically installed with `graph-retriever`.
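Because the warnings above depend on the installed version, a small version check can make them explicit at startup. This is a sketch under stated assumptions: `parse_version`, `compatibility_notes`, and `installed_version` are hypothetical helpers, the version thresholds are copied from the warnings above, and the distribution name `graph-retriever` is taken from the install command.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Parse the leading 'major.minor.patch' of a version string into ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def compatibility_notes(version_text):
    """Map a version string to the warnings from this page that apply to it."""
    v = parse_version(version_text)
    notes = []
    if v == (0, 5, 0):
        notes.append("k is not accepted by strategy constructors; upgrade to 0.5.1+")
    if v >= (0, 5, 0):
        notes.append("strategy constructors were refactored in 0.5.0")
    if v >= (0, 6, 0):
        notes.append("pass string IDs on edges instead of Id() objects")
    return notes


def installed_version(dist_name="graph-retriever"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

A caller could log `compatibility_notes(installed_version())` when `installed_version()` is not `None`.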
Install
- pip install graph-retriever
Imports
- GraphRetriever
from graph_retriever.graph_retriever import GraphRetriever
- Document
from graph_retriever.document import Document
- Id
from graph_retriever.id_object import Id
from graph_retriever.id import Id
- BFSTraversalStrategy
from graph_retriever.retriever_strategies.graph_traversal import BFSTraversalStrategy
- AstraGraphStore
from graph_retriever.graph_store.astra_graph_store import AstraGraphStore
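The list above gives two module paths for `Id`, and this page does not say which one applies to which release. One cautious option is to try each candidate path in turn. The sketch below uses only `importlib` from the standard library; `import_first` is a hypothetical helper, not part of `graph-retriever`.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module path that provides it.

    Raises ImportError listing every attempted path if none succeeds.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{path}: {exc}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))
```

For example, `Id = import_first("Id", ["graph_retriever.id", "graph_retriever.id_object"])` tries both paths listed above; the order shown is arbitrary.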
Quickstart
import os
from astrapy.db import AstraDB
from graph_retriever.graph_retriever import GraphRetriever
from graph_retriever.retriever_strategies.graph_traversal import BFSTraversalStrategy
from graph_retriever.document import Document
from graph_retriever.graph_store.astra_graph_store import AstraGraphStore
# Initialize AstraDB connection
# Read credentials without placeholder defaults so the check below can fail fast
token = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
api_endpoint = os.environ.get("ASTRA_DB_API_ENDPOINT")
if not token or not api_endpoint:
    raise ValueError("Please set ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_API_ENDPOINT environment variables.")
astra_db = AstraDB(token=token, api_endpoint=api_endpoint)
# Initialize GraphStore (using a test collection)
graph_store = AstraGraphStore(astra_db=astra_db, collection_name="my_rag_collection")
# Example documents and edges
docs = [
    Document(id="doc1", content="Python is a versatile programming language.", metadata={"topic": "programming"}),
    Document(id="doc2", content="Generative AI models are changing software development.", metadata={"topic": "AI"}),
    Document(id="doc3", content="Large Language Models (LLMs) are a type of Generative AI.", metadata={"topic": "AI"}),
]
graph_store.add_documents(docs)
graph_store.add_edge("doc1", "doc2", label="discusses_impact_on")
graph_store.add_edge("doc2", "doc3", label="explains")
# Initialize Retriever Strategy
retriever_strategy = BFSTraversalStrategy(k=2, max_depth=1) # Retrieve 2 nodes, 1 depth
# Initialize GraphRetriever
# embedding_dimension is required for vector search, ensure it matches your embedding model
retriever = GraphRetriever(
    graph_store=graph_store,
    retriever_strategy=retriever_strategy,
    embedding_dimension=1536,  # Example: for OpenAI embeddings
)
# Example query
query_doc = Document(id="query_id", content="What are LLMs?")
retrieved_nodes = retriever.get_relevant_documents(query_doc)
print("\nRetrieved Nodes:")
for node in retrieved_nodes:
    print(f"  ID: {node.id}, Content: {node.content[:50]}...")
# Optional: Clean up the collection (uncomment to run)
# graph_store.clear_collection()
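To make the `k=2, max_depth=1` settings in the quickstart more concrete, here is a plain-Python sketch of a depth-limited breadth-first traversal over the same three document IDs and two edges. It is an illustration of the general technique only, not the library's `BFSTraversalStrategy` implementation. The `bfs_collect` function is hypothetical, and treating edges as directed and `k` as a cap on the number of collected nodes are assumptions.

```python
from collections import deque


def bfs_collect(edges, start_ids, k, max_depth):
    """Breadth-first walk from start_ids along directed edges.

    Follows at most max_depth hops and stops once k node IDs are collected.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    collected = []
    seen = set()
    queue = deque((node_id, 0) for node_id in start_ids)
    while queue and len(collected) < k:
        node_id, depth = queue.popleft()
        if node_id in seen:
            continue
        seen.add(node_id)
        collected.append(node_id)
        # Only expand neighbors while still within the hop limit.
        if depth < max_depth:
            for neighbor in adjacency.get(node_id, []):
                if neighbor not in seen:
                    queue.append((neighbor, depth + 1))
    return collected


# Same edges as the quickstart: doc1 -> doc2 -> doc3.
edges = [("doc1", "doc2"), ("doc2", "doc3")]
print(bfs_collect(edges, start_ids=["doc2"], k=2, max_depth=1))  # ['doc2', 'doc3']
```

In this sketch the starting IDs stand in for the documents a similarity search would return first; starting from `doc1` with `max_depth=1` never reaches `doc3`, because `doc3` is two hops away.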