pypdfium2

Python bindings to PDFium

<!-- SPDX-FileCopyrightText: 2026 geisserml <geisserml@gmail.com> --> <!-- SPDX-License-Identifier: CC-BY-4.0 -->

pypdfium2 is an ABI-level Python 3 binding to PDFium, a powerful and liberal-licensed library for PDF rendering, inspection, manipulation and creation.

It is built with ctypesgen and external PDFium binaries. The custom setup infrastructure provides a seamless packaging and installation process. A wide range of platforms is supported with pre-built packages.

pypdfium2 includes helpers to simplify common use cases, while the raw PDFium API (ctypes) remains accessible as well.
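As a minimal sketch of the helper layer, the function below opens a document and renders one page to a PIL image. It assumes pypdfium2 and Pillow are installed. The function name `render_page_to_pil` and its default arguments are illustrative choices, not part of the library.

```python
def render_page_to_pil(pdf_path, index=0, scale=2.0):
    """Render one page of a PDF to a PIL image using pypdfium2's helpers."""
    # Imported inside the function so this module still loads where
    # pypdfium2 is not installed. to_pil() additionally needs Pillow.
    import pypdfium2 as pdfium

    pdf = pdfium.PdfDocument(pdf_path)   # open the document from a file path
    page = pdf[index]                    # pages are accessed by zero-based index
    bitmap = page.render(scale=scale)    # higher scale means more pixels per PDF unit
    return bitmap.to_pil()               # convert the raw bitmap to a PIL image
```

A caller might then do `render_page_to_pil("document.pdf").save("page1.png")`, where both file names are placeholders.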

Installation

From PyPI (recommended)

python -m pip install -U pypdfium2

If a pre-built wheel is available for your platform, pip will use it, which is the easiest way to install pypdfium2. Otherwise, setup code will run. If your platform is not covered by pre-built binaries either, setup will look for a system pdfium, or attempt to build pdfium from source.

JavaScript/XFA builds

pdfium-binaries also offer V8 (JavaScript) / XFA enabled builds. If you need them, do e.g.:

PDFIUM_PLATFORM=auto-v8 pip install -v pypdfium2 --no-binary pypdfium2

This will bypass wheels and run setup, while requesting use of V8 builds through the PDFIUM_PLATFORM=auto-v8 environment setting. See below for more info.
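The `VAR=value command` form above is POSIX shell syntax. Where that is not available (e.g. in `cmd.exe`), the same request can be made from Python by passing a modified environment to pip. This is only a sketch: the helper name `v8_install_command` is made up for illustration, and it builds the command without running it.

```python
import os
import sys

def v8_install_command(platform_spec="auto-v8"):
    """Return (command, environment) for a wheel-bypassing pypdfium2 install."""
    # Copy the current environment and add the setup directive from the text above.
    env = dict(os.environ, PDFIUM_PLATFORM=platform_spec)
    cmd = [
        sys.executable, "-m", "pip", "install", "-v",
        "pypdfium2", "--no-binary", "pypdfium2",
    ]
    return cmd, env

# To actually run the install:
#   import subprocess
#   cmd, env = v8_install_command()
#   subprocess.run(cmd, env=env, check=True)
```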

Optional runtime dependencies

As of this writing, pypdfium2 does not require any mandatory runtime dependencies, apart from Python and PDFium itself (which is commonly bundled).

However, some optional support model / CLI features need additional packages:

  • Pillow (module PIL) is a popular imaging library for Python. pypdfium2 provides convenience adapters to translate between raw bitmap buffers and PIL images. It also uses PIL for some command-line functionality (e.g. image saving).
  • NumPy is a library for scientific computing. As with Pillow, pypdfium2 provides helpers to get a numpy array view of a raw bitmap.
  • opencv-python (module cv2) is an imaging library built around numpy arrays. It can be used in the rendering CLI to save with pypdfium2's numpy adapter.

pypdfium2 tries to defer imports of optional dependencies until they are actually needed, so there should be no startup overhead if you don't use them.

From the repository / With setup

Note that, unlike the helpers, pypdfium2's setup is not bound by API stability promises, so it may change at any time.

Setup Dependencies

System

  • C pre-processor (gcc/clang – alternatively, specify the command to invoke via $CPP)
  • git (Used e.g. to determine the latest pdfium-binaries version, to get git describe info, or to check out pdfium on sourcebuild. Might be optional on default setup.)
  • gh >= 2.47.0 (optional; used to verify pdfium-binaries build attestations)

Python

Python dependencies should be installed automatically, unless --no-build-isolation is passed to pip.

[!NOTE] pypdfium2 and its ctypesgen fork are developed in sync, i.e. each pypdfium2 commit ought to be coupled with the then HEAD of pypdfium2-ctypesgen. Our release sdists, and latest pypdfium2 from git, will automatically use matching ctypesgen. However, when using a non-latest commit, you'll have to set up the right ctypesgen version on your own, and install pypdfium2 without build isolation.

Get the code

git clone "https://github.com/pypdfium2-team/pypdfium2.git"
cd pypdfium2/

Default setup

# In the pypdfium2/ directory
python -m pip install -v .

This will invoke pypdfium2's setup.py. Typically, this means a binary will be downloaded from pdfium-binaries and bundled into pypdfium2, and ctypesgen will be called on pdfium headers to produce the bindings interface.

pdfium-binaries offer GitHub build provenance attestations, so it is highly recommended that you install the gh CLI for our setup to verify authenticity of the binaries.

If no pre-built binaries are available for your platform, setup will look for system pdfium, or attempt to build pdfium from source.

pip options of interest
  • -v: Verbose logging output. Useful for debugging.
  • -e: Install in editable mode, so the installation points to the source tree. This way, changes directly take effect without needing to re-install. Recommended for development.
  • --no-build-isolation: Do not isolate setup in a virtual env; use the main env instead. This renders pyproject.toml [build-system] inactive, so setup deps must be prepared by caller. Useful to install custom versions of setup deps, or as speedup when installing repeatedly.
  • --no-binary pypdfium2: Do not use binary wheels when installing from PyPI – instead, use the sdist and run setup. Note, this option is improperly named, as pypdfium2's setup will attempt to use binaries all the same. If you want to prevent that, set e.g. PDFIUM_PLATFORM=fallback to achieve the same behavior as if there were no pdfium-binaries for the host. Or if you just want to package a source distribution, set PDFIUM_PLATFORM=sdist.
  • --pre: Install a beta release, if available.

With system pdfium

PDFIUM_PLATFORM="system-search" python -m pip install -v .

Look for a system-provided pdfium shared library, and bind against it.

The standard, portable ctypes.util.find_library() function is used to probe for system pdfium at setup time, and the result is hardcoded into the bindings. Alternatively, set $PDFIUM_BINARY to the path of the out-of-tree DLL to use.
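A rough sketch of such a probe is shown here. It is not pypdfium2's setup code: the function name `locate_pdfium` and the choice to let $PDFIUM_BINARY take precedence over the search are assumptions made for illustration.

```python
import ctypes.util
import os

def locate_pdfium(env=None):
    """Return a path or name for a pdfium shared library, or None if not found."""
    env = os.environ if env is None else env
    # Assumption: an explicit $PDFIUM_BINARY wins over the automatic search.
    override = env.get("PDFIUM_BINARY")
    if override:
        return override
    # find_library() takes the name without "lib" prefix or file extension.
    # Depending on the platform it returns a file name or a full path.
    return ctypes.util.find_library("pdfium")
```

The returned value could then be handed to `ctypes.CDLL(...)` to load the library.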

If system pdfium was found, we will look for pdfium headers from which to generate the bindings (e.g. in /usr/include). If the headers are in a location not recognized by our code, set $PDFIUM_HEADERS to the directory in question.

Also, we try to determine the pdfium version, either from the library filename itself, or via pkg-config. If this fails, you can pass the version alongside the setup target, e.g. PDFIUM_PLATFORM=system-search:XXXX, where XXXX is the pdfium build version. If the version is not known in the end, NaN placeholders will be set.
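To illustrate the filename route, the sketch below pulls a four-part version out of a versioned library name such as libpdfium.so.140.0.7269.0. The regular expression and the function name `version_from_filename` are assumptions, not the heuristics pypdfium2's setup actually uses.

```python
import re

# Four dot-separated integers, read as major.minor.build.patch.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")

def version_from_filename(filename):
    """Return (major, minor, build, patch) parsed from a file name, or None."""
    match = VERSION_RE.search(filename)
    if match is None:
        return None  # unversioned name, e.g. plain "libpdfium.so"
    return tuple(int(part) for part in match.groups())
```

An unversioned name yields `None`, which corresponds to the case where the version has to come from pkg-config or be passed in by the caller.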

If the version is known but no headers were found, they will be downloaded from upstream. If neither headers nor version are known (or ctypesgen is not installed), the reference bindings will be used as a last resort. This is ABI-unsafe and thus discouraged.

If find_library() failed to find pdfium, we may perform an additional, custom search, such as checking for a pdfium shared library included with LibreOffice and, if available, determining its version. Our search heuristics currently expect a Linux-like filesystem hierarchy (e.g. /usr), but contributions for other systems are welcome.

[!IMPORTANT] When pypdfium2 is installed with system pdfium, the bindings ought to be re-generated with the new headers whenever the out-of-tree pdfium DLL is updated, for ABI safety reasons (see footnote 1). For distributors, we highly recommend the use of versioned libraries (e.g. libpdfium.so.140.0.7269.0) or similar concepts that enforce binary/bindings version match, so outdated bindings will safely stop working with a meaningful error, rather than silently continue unsafely, at risk of hard crashes.

[!TIP] If you mind pypdfium2's setup making a web request to resolve the full version, you may pass it in manually via GIVEN_FULLVER=$major.$minor.$build.$patch (colon-separated if there are multiple versions), or less ideally, set IGNORE_FULLVER=1 to use NaN placeholders. This applies to other setup targets as well. For distributors, we recommend that you use the full version in binary filename or pkgconfig info, so pypdfium2's setup will not need to resolve it in the first place.
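The GIVEN_FULLVER format described above can be parsed as in the sketch below. The function name `parse_given_fullver` and the dictionary layout are hypothetical. How setup goes on to use multiple versions is not covered here.

```python
FIELDS = ("major", "minor", "build", "patch")

def parse_given_fullver(value):
    """Parse 'major.minor.build.patch[:major.minor.build.patch...]' into dicts."""
    versions = []
    for entry in value.split(":"):      # colon separates multiple versions
        parts = entry.split(".")
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected major.minor.build.patch, got {entry!r}")
        # int() also raises ValueError if a component is not numeric.
        versions.append(dict(zip(FIELDS, map(int, parts))))
    return versions
```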

Related targets

There is also a system-generate:$VERSION target, to produce system pdfium bindings in a host-independent fashion. This will call find_library() at runtime, and may be useful for packaging.

Further, you can set just system to consume pre-generated files from the data/system staging directory. See the section on caller-provided data files for more info.

With self-built pdfium

You can also install pypdfium2 with a self-compiled pdfium shared library, by placing it in data/sourcebuild/ along with a bindings interface and version info, and setting the PDFIUM_PLATFORM="sourcebuild" directive to use these files on setup.

This project comes with two scripts to automate the build process: build_toolchained.py and build_native.py (in setupsrc/).

  • build_toolchained is based on the build instructions in pdfium's Readme, and uses Google's toolchain (this means foreign binaries and s

Footnotes

  1. Luckily, upstream tend to be careful not to change the ABI of existing stable APIs, but they don't mind ABI-breaking changes to APIs that have not been promoted to stable tier yet, and pypdfium2 uses many of them, so it is still prudent to care about downstream ABI safety as well (it always is). You can read more about upstream's policy here.