LlamaParse
GenAI-native cloud document parser by LlamaIndex for RAG-optimized output. Parses PDFs, PPTX, DOCX, XLSX, HTML and more into markdown, text, or structured JSON with accurate table extraction and multimodal support. Cloud API service — requires an API key from cloud.llamaindex.ai. NOT a local/offline tool. CRITICAL: The llama-parse package (and its successor llama-cloud-services) are DEPRECATED as of early 2026. The replacement is 'llama-cloud' (pip install llama-cloud), which targets LlamaParse API v2. The old packages are maintained until May 1, 2026 only.
Warnings
- breaking llama-parse and llama-cloud-services are DEPRECATED. Both packages will receive no new features and are maintained only until May 1, 2026. New LlamaParse API v2 features are only available in the 'llama-cloud' package.
- breaking LlamaParse API v2 changed target_pages from 0-based indexing to 1-based indexing. Code using target_pages='0,1,2' (v1) must be updated to target_pages='1,2,3' (v2). Silent wrong results if not updated.
- breaking v2 API removed save_images and take_screenshot boolean flags. Replaced by images_to_save parameter. In v1, save_images defaulted to True; in v2, images are NOT saved by default.
- breaking parsing_instruction parameter (v1) is deprecated. Replaced by system_prompt + user_prompt combination in v2. Old parsing_instruction values are silently ignored in v2.
- gotcha LlamaParse is a cloud API — it is not a local parser. All documents are sent to LlamaIndex servers. Not suitable for sensitive/private documents without a VPC or on-prem enterprise agreement.
- gotcha nest_asyncio.apply() is required in Jupyter notebooks and environments that already have a running event loop (e.g., FastAPI startup). Without it, calling sync methods like load_data() raises 'This event loop is already running'.
- gotcha llama-parser (note: singular, no 'e') is a completely different, unmaintained package on PyPI released in 2024. pip install llama-parser installs the wrong package. The correct package names are llama-parse (deprecated) or llama-cloud (current).
- gotcha Parsing tiers in v2 are: fast, cost_effective, agentic, agentic_plus. Using any other string (e.g. old v1 mode names) returns: 'Unsupported tier: must be one of: fast, cost_effective, agentic, agentic_plus'.
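The v1→v2 breaking changes above (1-based target_pages, images_to_save replacing the boolean flags, system_prompt/user_prompt replacing parsing_instruction) can be handled with a small migration shim. This is a sketch, not an official utility: the kwarg names follow the warnings above, and the "images"/"screenshots" values passed to images_to_save are assumptions, not confirmed API values.

```python
def migrate_v1_kwargs(v1: dict) -> dict:
    """Best-effort translation of v1 LlamaParse kwargs to v2 (llama-cloud).

    A sketch based on the documented breaking changes; verify the
    resulting kwargs against the current llama-cloud API before use.
    """
    v2 = dict(v1)
    # parsing_instruction is silently ignored by v2; map it to user_prompt
    if "parsing_instruction" in v2:
        v2["user_prompt"] = v2.pop("parsing_instruction")
    # save_images / take_screenshot booleans -> images_to_save list
    images = []
    if v2.pop("save_images", False):
        images.append("images")       # assumed value name
    if v2.pop("take_screenshot", False):
        images.append("screenshots")  # assumed value name
    if images:
        v2["images_to_save"] = images
    # target_pages: shift every page (and range bound) from 0-based to 1-based
    if "target_pages" in v2:
        v2["target_pages"] = ",".join(
            "-".join(str(int(n) + 1) for n in part.split("-"))
            for part in str(v2["target_pages"]).split(",")
        )
    return v2
```

For example, `migrate_v1_kwargs({"target_pages": "0,1,2", "parsing_instruction": "extract tables"})` yields `{"target_pages": "1,2,3", "user_prompt": "extract tables"}`, avoiding the silent wrong-page results described above.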
Install
- pip install llama-cloud (current, targets API v2)
- pip install llama-parse (deprecated, maintained until May 1, 2026)
- pip install llama-cloud-services (deprecated, maintained until May 1, 2026)
Imports
- LlamaParse (new — llama-cloud)
from llama_cloud.services.parse import LlamaParse
- LlamaParse (old — llama-parse / llama-cloud-services)
from llama_parse import LlamaParse
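During the migration window, code may need to run against either package. A hedged fallback import like the following prefers the new llama-cloud path and falls back to the deprecated one; both import paths are taken from the lines above.

```python
# Prefer the current package, fall back to the deprecated one if absent.
try:
    from llama_cloud.services.parse import LlamaParse  # new (API v2)
except ImportError:
    try:
        from llama_parse import LlamaParse  # deprecated (API v1)
    except ImportError:
        LlamaParse = None  # neither package installed
```

Note the two APIs take different constructor kwargs (see Warnings), so a fallback import alone does not make code portable across them.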
Quickstart
# NEW API (llama-cloud, v2) — recommended
# pip install llama-cloud
import os
from llama_cloud.services.parse import LlamaParse
parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    tier="cost_effective",  # fast | cost_effective | agentic | agentic_plus
    result_type="markdown",
)
documents = parser.load_data("./my_file.pdf")
print(documents[0].text[:500])
# ---
# OLD API (llama-parse, v1) — deprecated, works until May 2026
# pip install llama-parse
import os
import nest_asyncio
nest_asyncio.apply()  # required in notebooks / already-running event loops
from llama_parse import LlamaParse
parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    result_type="markdown",
    num_workers=4,  # parallel requests for batch parsing
    verbose=True,
)
documents = parser.load_data("./my_file.pdf")
documents_batch = parser.load_data(["./file1.pdf", "./file2.pdf"])
documents_async = await parser.aload_data("./my_file.pdf")  # must run inside an async function or notebook cell