# llama-cloud

The official Python library for the llama-cloud API.
The Llama Cloud Python library provides convenient access to the Llama Cloud REST API from any Python 3.9+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients powered by httpx.
It is generated with Stainless.
## MCP Server
Use the Llama Cloud MCP Server to enable AI assistants to interact with this API, allowing them to explore endpoints, make test requests, and use documentation to help integrate this SDK into your application.
Note: You may need to set environment variables in your MCP client.
## Documentation
The REST API documentation can be found on developers.llamaindex.ai. The full API of this library can be found in api.md.
## Installation
```sh
# install from PyPI
pip install llama_cloud
```
## Usage
The full API of this library can be found in api.md.
```python
import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)

parsing = client.parsing.create(
    tier="agentic",
    version="latest",
    file_id="abc1234",
)
print(parsing.id)
```
While you can provide an `api_key` keyword argument, we recommend using python-dotenv to add `LLAMA_CLOUD_API_KEY="My API Key"` to your `.env` file so that your API key is not stored in source control.
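For example, a minimal sketch using python-dotenv (the `.env` contents and file location are placeholders):

```python
# Requires `pip install python-dotenv`; assumes .env contains
# LLAMA_CLOUD_API_KEY="My API Key" in the working directory.
from dotenv import load_dotenv

from llama_cloud import LlamaCloud

load_dotenv()  # reads .env and populates os.environ

# No api_key argument needed: the client reads LLAMA_CLOUD_API_KEY
# from the environment by default.
client = LlamaCloud()
```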
## Async usage
Simply import `AsyncLlamaCloud` instead of `LlamaCloud` and use `await` with each API call:
```python
import os
import asyncio
from llama_cloud import AsyncLlamaCloud

client = AsyncLlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)


async def main() -> None:
    parsing = await client.parsing.create(
        tier="agentic",
        version="latest",
        file_id="abc1234",
    )
    print(parsing.id)


asyncio.run(main())
```
Functionality between the synchronous and asynchronous clients is otherwise identical.
### With aiohttp
By default, the async client uses `httpx` for HTTP requests. However, for improved concurrency performance you may also use `aiohttp` as the HTTP backend.
You can enable this by installing aiohttp:
```sh
# install from PyPI
pip install llama_cloud[aiohttp]
```
Then you can enable it by instantiating the client with `http_client=DefaultAioHttpClient()`:
```python
import os
import asyncio
from llama_cloud import DefaultAioHttpClient
from llama_cloud import AsyncLlamaCloud


async def main() -> None:
    async with AsyncLlamaCloud(
        api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
        http_client=DefaultAioHttpClient(),
    ) as client:
        parsing = await client.parsing.create(
            tier="agentic",
            version="latest",
            file_id="abc1234",
        )
        print(parsing.id)


asyncio.run(main())
```
## Using types
Nested request parameters are TypedDicts. Responses are Pydantic models which also provide helper methods for things like:

- Serializing back into JSON, `model.to_json()`
- Converting to a dictionary, `model.to_dict()`
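For example, reusing the parsing call from the Usage section (a sketch; the exact output shape depends on the response model):

```python
from llama_cloud import LlamaCloud

client = LlamaCloud()

parsing = client.parsing.create(
    tier="agentic",
    version="latest",
    file_id="abc1234",
)

# Responses are Pydantic models with serialization helpers.
print(parsing.to_json())  # JSON string representation
print(parsing.to_dict())  # plain Python dictionary
```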
Typed requests and responses provide autocomplete and documentation within your editor. If you would like to see type errors in VS Code to help catch bugs earlier, set `python.analysis.typeCheckingMode` to `basic`.
## Pagination
List methods in the Llama Cloud API are paginated. This library provides auto-paginating iterators with each list response, so you do not have to request successive pages manually:
```python
from llama_cloud import LlamaCloud

client = LlamaCloud()

all_runs = []
# Automatically fetches more pages as needed.
for run in client.extraction.runs.list(
    extraction_agent_id="30988414-9163-4a0b-a7e0-35dd760109d7",
    limit=20,
    skip=0,
):
    # Do something with run here
    all_runs.append(run)
print(all_runs)
```
Or, asynchronously:
```python
import asyncio
from llama_cloud import AsyncLlamaCloud

client = AsyncLlamaCloud()


async def main() -> None:
    all_runs = []
    # Iterate through items across all pages, issuing requests as needed.
    async for run in client.extraction.runs.list(
        extraction_agent_id="30988414-9163-4a0b-a7e0-35dd760109d7",
        limit=20,
        skip=0,
    ):
        all_runs.append(run)
    print(all_runs)


asyncio.run(main())
```
Alternatively, you can use the `.has_next_page()`, `.next_page_info()`, or `.get_next_page()` methods for more granular control when working with pages:
```python
first_page = await client.extraction.runs.list(
    extraction_agent_id="30988414-9163-4a0b-a7e0-35dd760109d7",
    limit=20,
    skip=0,
)
if first_page.has_next_page():
    print(f"will fetch next page using these details: {first_page.next_page_info()}")
    next_page = await first_page.get_next_page()
    print(f"number of items we just fetched: {len(next_page.items)}")

# Remove `await` for non-async usage.
```
Or just work directly with the returned data:
```python
first_page = await client.extraction.runs.list(
    extraction_agent_id="30988414-9163-4a0b-a7e0-35dd760109d7",
    limit=20,
    skip=0,
)
print(
    f"the current start offset for this page: {first_page.skip}"
)  # => "the current start offset for this page: 1"
for run in first_page.items:
    print(run.id)

# Remove `await` for non-async usage.
```
## Nested params
Nested parameters are dictionaries, typed using `TypedDict`, for example:
```python
from llama_cloud import LlamaCloud

client = LlamaCloud()

parsing = client.parsing.create(
    tier="fast",
    version="2026-01-08",
    agentic_options={},
)
print(parsing.agentic_options)
```
## File uploads
Request parameters that correspond to file uploads can be passed as `bytes`, a `PathLike` instance, or a tuple of `(filename, contents, media type)`.
```python
from pathlib import Path
from llama_cloud import LlamaCloud

client = LlamaCloud()

client.files.create(
    file=Path("/path/to/file"),
    purpose="purpose",
)
```
The async client uses the exact same interface. If you pass a `PathLike` instance, the file contents will be read asynchronously automatically.
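For example, the tuple form lets you set the filename and media type explicitly; a sketch, with the path, filename, and media type as placeholder values:

```python
from pathlib import Path

from llama_cloud import LlamaCloud

client = LlamaCloud()

# Tuple form (filename, contents, media type); contents passed as raw bytes.
client.files.create(
    file=("report.pdf", Path("/path/to/report.pdf").read_bytes(), "application/pdf"),
    purpose="purpose",
)
```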
## Handling errors
When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of `llama_cloud.APIConnectionError` is raised.
When the API returns a non-success status code (that is, a 4xx or 5xx response), a subclass of `llama_cloud.APIStatusError` is raised, containing `status_code` and `response` properties.

All errors inherit from `llama_cloud.APIError`.
```python
import llama_cloud
from llama_cloud import LlamaCloud

client = LlamaCloud()

try:
    client.pipelines.list(
        project_id="my-project-id",
    )
except llama_cloud.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except llama_cloud.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except llama_cloud.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
```
Error codes are as follows:
| Status Code | Error Type                 |
| ----------- | -------------------------- |
| 400         | `BadRequestError`          |
| 401         | `AuthenticationError`      |
| 403         | `PermissionDeniedError`    |
| 404         | `NotFoundError`            |
| 422         | `UnprocessableEntityError` |
| 429         | `RateLimitError`           |
| >=500       | `InternalServerError`      |
| N/A         | `APIConnectionError`       |
## Retries
Certain errors are automatically retried 2 times by default, with a short exponential backoff.
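If this SDK follows the usual Stainless client options, the retry count can be adjusted globally or per request via `max_retries`; a sketch, assuming that option is exposed here (check api.md to confirm):

```python
from llama_cloud import LlamaCloud

# Assumed option: configure the default retry count for all requests.
client = LlamaCloud(max_retries=0)  # disable automatic retries

# Assumed option: override the retry count for a single request.
client.with_options(max_retries=5).pipelines.list(
    project_id="my-project-id",
)
```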