LlamaParse

0.6.94 · verified Tue May 12 · auth: no · python install: stale · quickstart: stale · deprecated

GenAI-native cloud document parser by LlamaIndex for RAG-optimized output. Parses PDFs, PPTX, DOCX, XLSX, HTML, and more into markdown, text, or structured JSON, with accurate table extraction and multimodal support. Cloud API service — requires an API key from cloud.llamaindex.ai. NOT a local/offline tool. CRITICAL: the llama-parse package and its successor llama-cloud-services are DEPRECATED as of early 2026. The replacement is 'llama-cloud' (pip install llama-cloud), which targets LlamaParse API v2. The old packages are maintained only until May 1, 2026.

pip install llama-cloud
error Invalid API Key
cause The `LLAMA_CLOUD_API_KEY` environment variable is not set, or the provided API key is incorrect, revoked, or used with the wrong regional endpoint.
fix
Set the LLAMA_CLOUD_API_KEY environment variable to a valid key obtained from cloud.llamaindex.ai, or pass the key directly to the LlamaParse constructor via api_key="llx-...". Confirm the key is active and belongs to the correct LlamaCloud project and region.
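A minimal pre-flight check can surface a missing or malformed key before any request is sent. This is an illustrative sketch (the helper name is made up); the `llx-` prefix check reflects the key format shown above:

```python
import os

def get_llama_cloud_key() -> str:
    """Resolve the LlamaCloud API key, failing early with a clear message."""
    key = os.environ.get("LLAMA_CLOUD_API_KEY", "")
    if not key:
        raise RuntimeError(
            "LLAMA_CLOUD_API_KEY is not set; create a key at cloud.llamaindex.ai"
        )
    if not key.startswith("llx-"):
        raise RuntimeError("Key does not look like a LlamaCloud key (expected 'llx-...')")
    return key
```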
error ModuleNotFoundError: No module named 'llama_parse'
cause The `llama-parse` package is not installed, installed into a different environment, or shadowed by conflicting package versions. The error also occurs after migrating to the newer `llama-cloud` package while keeping old `from llama_parse import ...` statements.
fix
Ensure llama-parse is installed with pip install -U llama-parse. Given its deprecation, it is highly recommended to migrate to pip install llama-cloud and update your imports and code accordingly to target LlamaParse API v2.
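If a codebase must run against either SDK during the migration window, it can detect which one is installed at runtime. An illustrative sketch (not an official compatibility shim):

```python
import importlib.util
from typing import Optional

def installed_parse_sdk() -> Optional[str]:
    # Prefer the current 'llama_cloud' SDK over the deprecated 'llama_parse'.
    for name in ("llama_cloud", "llama_parse"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None  # neither SDK is installed
```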
error TypeError: 'type' object is not subscriptable
cause This error typically arises when running `llama-parse` (or dependent libraries) with a Python version older than 3.9. Python 3.8 and earlier do not natively support the `list[dict]` type hint syntax used in the library's code.
fix
Upgrade your Python environment to version 3.9 or newer to support the modern type hint syntax used by the library.
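A startup guard makes the requirement explicit instead of failing with an opaque TypeError deep inside the library (builtin generics like `list[dict]` are PEP 585, Python 3.9+):

```python
import sys

def require_python(minimum=(3, 9)) -> None:
    # list[dict] / dict[str, int] builtin generics need Python 3.9+ (PEP 585).
    if sys.version_info < minimum:
        raise SystemExit(
            f"Python {minimum[0]}.{minimum[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
```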
error AttributeError: 'LlamaParse' object has no attribute 'infer_schema'
cause The `infer_schema` method does not exist on the `LlamaParse` object in the installed version of the library. It may have been removed or renamed, or it may never have existed in the public API; the error usually appears when following outdated examples.
fix
Consult the official llama-parse or llama-cloud documentation for the correct methods to achieve schema extraction or parsing, such as parse_file, parse_obj, schema, or schema_json, as infer_schema is not a recognized method.
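When following older examples, a defensive lookup gives a clearer failure than the raw AttributeError. The helper below is purely illustrative and not part of any SDK:

```python
def call_method(obj, name, *args, **kwargs):
    # Check that the method actually exists on this SDK version before calling it.
    fn = getattr(obj, name, None)
    if not callable(fn):
        available = [a for a in dir(obj) if not a.startswith("_")]
        raise AttributeError(
            f"{type(obj).__name__} has no method {name!r}; available: {available}"
        )
    return fn(*args, **kwargs)
```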
breaking llama-parse and llama-cloud-services are DEPRECATED. Both packages will receive no new features and are maintained only until May 1, 2026. New LlamaParse API v2 features are only available in the 'llama-cloud' package.
fix Migrate to: pip install llama-cloud. New import: from llama_cloud.services.parse import LlamaParse. Review the v1→v2 migration guide at developers.llamaindex.ai.
breaking LlamaParse API v2 changed target_pages from 0-based indexing to 1-based indexing. Code using target_pages='0,1,2' (v1) must be updated to target_pages='1,2,3' (v2). Silent wrong results if not updated.
fix Add 1 to all target_pages values when migrating from v1 to v2.
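The off-by-one migration can be mechanized for simple comma-separated page lists. A hypothetical helper; it does not handle range syntax (e.g. '0-4'), which would need extra parsing:

```python
def migrate_target_pages(v1_pages: str) -> str:
    # v1 page indices were 0-based; v2 expects 1-based, so shift each by +1.
    return ",".join(str(int(p) + 1) for p in v1_pages.split(","))
```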
breaking v2 API removed save_images and take_screenshot boolean flags. Replaced by images_to_save parameter. In v1, save_images defaulted to True; in v2, images are NOT saved by default.
fix Explicitly set images_to_save in v2 if you need images extracted. Do not assume v1 image defaults carry over.
breaking parsing_instruction parameter (v1) is deprecated. Replaced by system_prompt + user_prompt combination in v2. Old parsing_instruction values are silently ignored in v2.
fix Rewrite parsing_instruction content as system_prompt and/or user_prompt in v2 configuration.
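A small config-migration sketch shows the mapping. Treating the old parsing_instruction verbatim as the v2 system_prompt is an assumption here; check the migration guide for whether your instructions belong in system_prompt or user_prompt:

```python
def migrate_v1_config(v1: dict) -> dict:
    # Drop the v1-only key and carry its value into v2's system_prompt.
    v2 = {k: v for k, v in v1.items() if k != "parsing_instruction"}
    if "parsing_instruction" in v1:
        v2["system_prompt"] = v1["parsing_instruction"]
    return v2
```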
gotcha LlamaParse is a cloud API — it is not a local parser. All documents are sent to LlamaIndex servers. Not suitable for sensitive/private documents without a VPC or on-prem enterprise agreement.
fix For on-prem or data-sensitive use cases, contact LlamaIndex for enterprise/VPC options. There is no self-hosted OSS equivalent.
gotcha nest_asyncio.apply() is required in Jupyter notebooks and environments that already have a running event loop (e.g., FastAPI startup). Without it, calling sync methods like load_data() raises 'This event loop is already running'.
fix Add import nest_asyncio; nest_asyncio.apply() at the top of any notebook or async-host environment. New llama-cloud SDK has improved sync/async handling.
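You can detect whether you are inside a running event loop (the situation that requires nest_asyncio) before choosing the sync or async entry point:

```python
import asyncio

def inside_running_loop() -> bool:
    # True in Jupyter cells / FastAPI handlers, False in a plain script.
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False
```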
gotcha llama-parser (note: singular, no 'e') is a completely different, unmaintained package on PyPI released in 2024. pip install llama-parser installs the wrong package. The correct package names are llama-parse (deprecated) or llama-cloud (current).
fix Always install: pip install llama-cloud (new) or pip install llama-parse (old/deprecated). Never llama-parser.
gotcha Parsing tiers in v2 are: fast, cost_effective, agentic, agentic_plus. Using any other string (e.g. old v1 mode names) returns: 'Unsupported tier: must be one of: fast, cost_effective, agentic, agentic_plus'.
fix Use only the four valid tier strings. 'fast' is text-only spatial extraction. 'agentic_plus' is highest fidelity for complex layouts.
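Validating the tier string locally fails fast instead of waiting for the API's rejection; the set of valid values mirrors the list above:

```python
VALID_TIERS = ("fast", "cost_effective", "agentic", "agentic_plus")

def check_tier(tier: str) -> str:
    # Reject old v1 mode names (or typos) before a request is ever sent.
    if tier not in VALID_TIERS:
        raise ValueError(
            f"Unsupported tier {tier!r}: must be one of: {', '.join(VALID_TIERS)}"
        )
    return tier
```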
gotcha The await keyword is only valid inside an async def function. Using await at module top level or in a regular function raises SyntaxError: 'await' outside function (plain scripts only — Jupyter allows top-level await).
fix Wrap asynchronous calls (e.g., parser.aload_data()) in an async def function and run it with asyncio.run(your_async_function()). In interactive environments like Jupyter, apply nest_asyncio.apply() first if you also need asyncio.run() or sync wrappers inside the already-running loop.
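The wrapping pattern looks like this; the coroutine body is a stand-in for a real parser.aload_data() call, so the example runs without network access:

```python
import asyncio

async def parse_documents():
    # Stand-in for: return await parser.aload_data("./my_file.pdf")
    await asyncio.sleep(0)
    return ["doc"]

def main():
    # asyncio.run creates the event loop and drives the coroutine to completion.
    return asyncio.run(parse_documents())
```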
pip install llama-parse
pip install llama-cloud-services
python os / libc variant status wheel install import disk
3.9 alpine (musl) llama-cloud - - - -
3.9 alpine (musl) llama-cloud-services - - - -
3.9 alpine (musl) llama-parse - - - -
3.9 slim (glibc) llama-cloud - - - -
3.9 slim (glibc) llama-cloud-services - - - -
3.9 slim (glibc) llama-parse - - - -
3.10 alpine (musl) llama-cloud - - - -
3.10 alpine (musl) llama-cloud-services - - - -
3.10 alpine (musl) llama-parse - - - -
3.10 slim (glibc) llama-cloud - - - -
3.10 slim (glibc) llama-cloud-services - - - -
3.10 slim (glibc) llama-parse - - - -
3.11 alpine (musl) llama-cloud - - - -
3.11 alpine (musl) llama-cloud-services - - - -
3.11 alpine (musl) llama-parse - - - -
3.11 slim (glibc) llama-cloud - - - -
3.11 slim (glibc) llama-cloud-services - - - -
3.11 slim (glibc) llama-parse - - - -
3.12 alpine (musl) llama-cloud - - - -
3.12 alpine (musl) llama-cloud-services - - - -
3.12 alpine (musl) llama-parse - - - -
3.12 slim (glibc) llama-cloud - - - -
3.12 slim (glibc) llama-cloud-services - - - -
3.12 slim (glibc) llama-parse - - - -
3.13 alpine (musl) llama-cloud - - - -
3.13 alpine (musl) llama-cloud-services - - - -
3.13 alpine (musl) llama-parse - - - -
3.13 slim (glibc) llama-cloud - - - -
3.13 slim (glibc) llama-cloud-services - - - -
3.13 slim (glibc) llama-parse - - - -

Cloud API — requires internet and valid API key. Free tier available. For notebooks, call nest_asyncio.apply() before using sync methods or use aload_data() for async.

# NEW API (llama-cloud, v2) — recommended
# pip install llama-cloud
import os
from llama_cloud.services.parse import LlamaParse

parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    tier="cost_effective",  # fast | cost_effective | agentic | agentic_plus
    result_type="markdown",
)

documents = parser.load_data("./my_file.pdf")
print(documents[0].text[:500])

# ---
# OLD API (llama-parse, v1) — deprecated, works until May 2026
# pip install llama-parse
import os

import nest_asyncio
nest_asyncio.apply()  # required in notebooks

from llama_parse import LlamaParse

parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    result_type="markdown",
    num_workers=4,
    verbose=True,
)

documents = parser.load_data("./my_file.pdf")
documents_batch = parser.load_data(["./file1.pdf", "./file2.pdf"])
documents_async = await parser.aload_data("./my_file.pdf")  # inside an async def (or notebook top-level)