Graph Retriever

0.8.0 · active · verified Fri Apr 17

Graph Retriever is a Python library that combines unstructured similarity search with structured document traversal to improve Retrieval-Augmented Generation (RAG) applications. By following relationships between documents, it can surface relevant context that similarity search alone would miss. The current version is 0.8.0; minor releases are frequent and are often driven by integration updates with DataStax Astra DB and other components.

Quickstart

This quickstart sets up `GraphRetriever` with `AstraGraphStore`. It initializes an `AstraDB` connection, adds documents and edges to the graph store, defines a retrieval strategy (here, breadth-first search), and then uses `GraphRetriever` to fetch documents relevant to a query, taking both content similarity and graph structure into account. Set the `ASTRA_DB_APPLICATION_TOKEN` and `ASTRA_DB_API_ENDPOINT` environment variables before running it.

import os
from astrapy.db import AstraDB
from graph_retriever.graph_retriever import GraphRetriever
from graph_retriever.retriever_strategies.graph_traversal import BFSTraversalStrategy
from graph_retriever.document import Document
from graph_retriever.graph_store.astra_graph_store import AstraGraphStore

# Initialize AstraDB connection
# Read credentials without placeholder defaults so the check below
# actually fails when a variable is missing.
token = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
api_endpoint = os.environ.get("ASTRA_DB_API_ENDPOINT")

if not token or not api_endpoint:
    raise ValueError("Please set ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_API_ENDPOINT environment variables.")

astra_db = AstraDB(token=token, api_endpoint=api_endpoint)

# Initialize GraphStore (using a test collection)
graph_store = AstraGraphStore(astra_db=astra_db, collection_name="my_rag_collection")

# Example documents and edges
docs = [
    Document(id="doc1", content="Python is a versatile programming language.", metadata={"topic": "programming"}),
    Document(id="doc2", content="Generative AI models are changing software development.", metadata={"topic": "AI"}),
    Document(id="doc3", content="Large Language Models (LLMs) are a type of Generative AI.", metadata={"topic": "AI"})
]
graph_store.add_documents(docs)
graph_store.add_edge("doc1", "doc2", label="discusses_impact_on")
graph_store.add_edge("doc2", "doc3", label="explains")

# Initialize Retriever Strategy
retriever_strategy = BFSTraversalStrategy(k=2, max_depth=1)  # Retrieve 2 nodes, traversing to depth 1

# Initialize GraphRetriever
# embedding_dimension is required for vector search; it must match your embedding model
retriever = GraphRetriever(
    graph_store=graph_store,
    retriever_strategy=retriever_strategy,
    embedding_dimension=1536 # Example: for OpenAI embeddings
)

# Example query
query_doc = Document(id="query_id", content="What are LLMs?")
retrieved_nodes = retriever.get_relevant_documents(query_doc)

print("\nRetrieved Nodes:")
for node in retrieved_nodes:
    print(f"  ID: {node.id}, Content: {node.content[:50]}...")

# Optional: Clean up the collection (uncomment to run)
# graph_store.clear_collection()
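
To make the idea of combining content similarity with graph structure concrete, here is a minimal, library-independent sketch. It does not use Graph Retriever or Astra DB and is not the library's implementation. The names `retrieve`, `overlap_score`, and `tokens` are hypothetical, and word overlap stands in for embedding similarity purely for illustration. It picks the `k` most similar documents as seeds, then expands them breadth-first along directed edges up to `max_depth` hops.

```python
from collections import deque

# Toy corpus and directed edges mirroring the quickstart (illustrative data only).
docs = {
    "doc1": "Python is a versatile programming language.",
    "doc2": "Generative AI models are changing software development.",
    "doc3": "Large Language Models (LLMs) are a type of Generative AI.",
}
edges = {"doc1": ["doc2"], "doc2": ["doc3"], "doc3": []}


def tokens(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,?()").lower() for w in text.split()} - {""}


def overlap_score(query, text):
    """Stand-in for embedding similarity: Jaccard overlap of word sets."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if q | t else 0.0


def retrieve(query, k=1, max_depth=1):
    """Pick the k most similar docs as seeds, then expand them breadth-first."""
    ranked = sorted(docs, key=lambda d: overlap_score(query, docs[d]), reverse=True)
    seeds = ranked[:k]
    visited = list(seeds)  # keeps discovery order: seeds first, then neighbors
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


print(retrieve("versatile Python", k=1, max_depth=1))
print(retrieve("versatile Python", k=1, max_depth=2))
```

In this sketch, a query that matches only `doc1` still reaches `doc2` (and `doc3` at depth 2) through the edges, which is the kind of context plain similarity search would not return. Because the edges here are directed, a query that seeds on `doc3` finds nothing further to expand.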
