LlamaParse Reader for LlamaIndex
The `llama-index-readers-llama-parse` library provides a LlamaIndex reader that integrates with LlamaParse. It parses complex file types (such as PDFs and PPTs) into structured markdown, which LlamaIndex can then ingest and process for RAG and other LLM applications. The current version is 0.6.1. The package is part of the broader LlamaIndex ecosystem, so its releases can be expected to follow LlamaIndex's regular cadence.
Warnings
- gotcha A `LLAMAPARSE_API_KEY` is mandatory for using LlamaParseReader. This key must be obtained from LlamaIndex and provided either directly during initialization or via the environment variable `LLAMAPARSE_API_KEY`.
- gotcha LlamaParse is a commercial service. While a free tier may be available, extensive usage or specific features might incur costs or be subject to rate limits. Be aware of your LlamaParse plan and associated usage policies.
- gotcha The `llama-parse` package is a separate dependency and must be explicitly installed alongside `llama-index-readers-llama-parse`. Failing to install `llama-parse` will result in runtime errors.
- gotcha Parsing large or complex documents with LlamaParse can be time-consuming. By default, `load_data()` blocks and polls the LlamaParse API until the job is complete, so synchronous calls can run for a long time. Consider the asynchronous counterpart, `aload_data()`, for non-blocking operation in production environments.
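The blocking behaviour described in the last warning can be kept off an event loop with the standard library alone. The following is a minimal sketch, not the library's own async API: `fake_blocking_parse` is a hypothetical stand-in for a blocking `load_data(path)` call, and `parse_many` is an illustrative helper name. It assumes Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio
import time
from typing import Callable, List


def fake_blocking_parse(path: str) -> List[str]:
    """Hypothetical stand-in for a blocking parser.load_data(path) call."""
    time.sleep(0.1)  # simulates waiting on the remote parsing job
    return [f"parsed:{path}"]


async def parse_many(
    load: Callable[[str], List[str]], paths: List[str]
) -> List[List[str]]:
    """Run one blocking load per path in worker threads, concurrently."""
    # asyncio.to_thread keeps the event loop free while each call blocks.
    tasks = [asyncio.to_thread(load, path) for path in paths]
    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(parse_many(fake_blocking_parse, ["a.pdf", "b.pdf"]))
    print(results)
```

In real use, the `load` argument would be the reader's bound `load_data` method. Threads only hide the wait; they do not make the remote job itself finish sooner.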
Install
pip install llama-index-readers-llama-parse llama-parse
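Since a missing package only shows up as a runtime error, it can help to check the installation before the first import. The sketch below uses only the standard library. It assumes that the import name for `llama-parse` is `llama_parse`; the helper names `is_installed` and `missing_modules` are illustrative and not part of either package.

```python
import importlib.util
from typing import Iterable, List


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError (an ImportError) if that parent is absent.
        return False


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the subset of module names that cannot be found."""
    return [name for name in module_names if not is_installed(name)]


if __name__ == "__main__":
    # Assumed import names for the two pip packages installed above.
    required = ["llama_parse", "llama_index.readers.llama_parse"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```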
Imports
- LlamaParseReader
from llama_index.readers.llama_parse import LlamaParseReader
Quickstart
import os
from llama_index.readers.llama_parse import LlamaParseReader
# Ensure you have your LlamaParse API key set as an environment variable
# os.environ["LLAMAPARSE_API_KEY"] = "your-api-key"
api_key = os.environ.get('LLAMAPARSE_API_KEY', '')
if not api_key:
    raise ValueError("LLAMAPARSE_API_KEY environment variable not set.")
# Initialize the LlamaParse reader
# For advanced options, see LlamaParseReader documentation (e.g., result_type='markdown')
parser = LlamaParseReader(api_key=api_key, verbose=True)
# Load data from a file (replace 'path/to/your/document.pdf' with an actual file)
# LlamaParse supports various file types like PDF, PPTX, DOCX, TXT, CSV, JSON, XML
# Note: parsing runs as a remote job and may take time to complete.
# The load_data method blocks and polls LlamaParse until the parsing is complete.
try:
    documents = parser.load_data("path/to/your/document.pdf")
    print(f"Successfully parsed {len(documents)} document(s).")
    for doc in documents:
        print(f"Document ID: {doc.id_}")
        print(f"First 200 chars: {doc.text[:200]}...")
except Exception as e:
    print(f"Error parsing document: {e}")
    print("Make sure 'path/to/your/document.pdf' exists and your API key is valid.")
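The quickstart reads the key from the environment only, while the warnings note that it can also be passed directly. A small helper can make that choice explicit. This is a sketch under assumptions: `resolve_api_key` is a hypothetical name, and the precedence shown (explicit argument first, then the environment variable) is a design choice of this example, not documented library behaviour.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LLAMAPARSE_API_KEY"  # variable name as given in this document


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the API key: explicit argument first, then the environment."""
    env = os.environ if env is None else env
    for candidate in (explicit, env.get(ENV_VAR)):
        # Skip None, empty, and whitespace-only values.
        if candidate and candidate.strip():
            return candidate.strip()
    raise ValueError(f"No API key: pass one explicitly or set {ENV_VAR}.")


if __name__ == "__main__":
    try:
        key = resolve_api_key()
        print(f"Using an API key of length {len(key)}.")
    except ValueError as exc:
        print(exc)
```

Passing `env` as a plain dictionary keeps the helper easy to test without touching the real process environment.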