gepa

A framework for optimizing textual system components (AI prompts, code snippets, etc.) using LLM-based reflection and Pareto-efficient evolutionary search.

Rank: #1987 · Downloads: 3,783,128 (30 days) · Stars: 2,627 · Forks: 228

Description

<p align="center"> <img src="https://raw.githubusercontent.com/gepa-ai/gepa/refs/heads/main/assets/gepa_logo_with_text.svg" alt="GEPA Logo" width="450"> </p> <p align="center"> <strong>Optimize any text parameter — prompts, code, agent architectures, configurations — using LLM-based reflection and Pareto-efficient evolutionary search.</strong> </p> <p align="center"> <a href="https://gepa-ai.github.io/gepa/"><strong>Website</strong></a> &ensp;|&ensp; <a href="https://gepa-ai.github.io/gepa/guides/quickstart/"><strong>Quick Start</strong></a> &ensp;|&ensp; <a href="https://arxiv.org/abs/2507.19457"><strong>Paper</strong></a> &ensp;|&ensp; <a href="https://gepa-ai.github.io/gepa/blog/"><strong>Blog</strong></a> &ensp;|&ensp; <a href="https://discord.gg/WXFSeVGdbW"><strong>Discord</strong></a> </p> <p align="center"> <a href="https://pypi.org/project/gepa/"><img src="https://img.shields.io/pypi/v/gepa?logo=python&logoColor=white&color=3776ab" alt="PyPI"></a> <a href="https://pepy.tech/projects/gepa"><img src="https://static.pepy.tech/badge/gepa" alt="Downloads"></a> <a href="https://github.com/gepa-ai/gepa"><img src="https://img.shields.io/github/stars/gepa-ai/gepa?style=flat&logo=github&color=181717" alt="GitHub stars"></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green?style=flat" alt="License"></a> </p> <p align="center"> <a href="https://join.slack.com/t/gepa-ai/shared_invite/zt-3o352xhyf-QZDfwmMpiQjsvoSYo7M1_w"><img src="https://badgen.net/badge/icon/Slack?icon=slack&label&color=4A154B" alt="Slack"></a> <a href="https://discord.gg/WXFSeVGdbW"><img src="https://dcbadge.limes.pink/api/server/https://discord.gg/WXFSeVGdbW?style=flat" alt="Discord"></a> </p>

What is GEPA?

GEPA (Genetic-Pareto) is a framework for optimizing any system with textual parameters against any evaluation metric. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes. Through iterative reflection, mutation, and Pareto-aware selection, GEPA evolves high-performing variants with minimal evaluations.

If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more.

Key Results

| Result | Detail |
| --- | --- |
| 90x cheaper | Open-source models + GEPA beat Claude Opus 4.1 at Databricks |
| 35x faster than RL | 100–500 evaluations vs. 5,000–25,000+ for GRPO (paper) |
| 32% → 89% | ARC-AGI agent accuracy via architecture discovery |
| 40.2% cost savings | Cloud scheduling policy discovered by GEPA, beating expert heuristics |
| 55% → 82% | Coding agent resolve rate on Jinja via auto-learned skills |
| 50+ production uses | Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, Comet ML, and more |

> "Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world"
>
> *Tobi Lutke, CEO, Shopify*


Installation

pip install gepa

To install the latest from main:

pip install git+https://github.com/gepa-ai/gepa.git

Quick Start

Simple Prompt Optimization

Optimize a system prompt for math problems from the AIME benchmark in a few lines of code (full tutorial):

import gepa

trainset, valset, _ = gepa.examples.aime.init_dataset()

seed_prompt = {
    "system_prompt": "You are a helpful assistant. Answer the question. "
                     "Put your final answer in the format '### <answer>'"
}

result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4.1-mini",
    max_metric_calls=150,
    reflection_lm="openai/gpt-5",
)

print("Optimized prompt:", result.best_candidate['system_prompt'])

Result: GPT-4.1 Mini goes from 46.6% → 56.6% on AIME 2025 (+10 percentage points).

With DSPy (Recommended for AI Pipelines)

The most powerful way to use GEPA for prompt optimization is within DSPy, where it's available as dspy.GEPA. See dspy.GEPA tutorials for executable notebooks.

import dspy

optimizer = dspy.GEPA(
    metric=your_metric,
    max_metric_calls=150,
    reflection_lm="openai/gpt-5",
)
optimized_program = optimizer.compile(student=MyProgram(), trainset=trainset, valset=valset)

optimize_anything: Beyond Prompts

The optimize_anything API optimizes any text artifact — code, agent architectures, configurations, SVGs — not just prompts. You provide an evaluator; the system handles the search.

import gepa.optimize_anything as oa
from gepa.optimize_anything import optimize_anything, GEPAConfig, EngineConfig

def evaluate(candidate: str) -> float:
    result = run_my_system(candidate)
    oa.log(f"Output: {result.output}")      # Actionable Side Information
    oa.log(f"Error: {result.error}")         # feeds back into reflection
    return result.score

result = optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    objective="Describe what you want to optimize for.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=100)),
)

How It Works

Traditional optimizers know that a candidate failed but not why. GEPA takes a different approach:

  1. Select a candidate from the Pareto frontier (candidates excelling on different task subsets)
  2. Execute on a minibatch, capturing full execution traces
  3. Reflect — an LLM reads the traces (error messages, profiler output, reasoning logs) and diagnoses failures
  4. Mutate — generate an improved candidate informed by accumulated lessons from all ancestors
  5. Accept — add to the pool if improved, update the Pareto front
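The Pareto-aware selection in step 1 can be sketched generically: a candidate stays on the frontier if no other candidate scores at least as well on every task and strictly better on at least one. The snippet below is a minimal illustration of that dominance rule, not GEPA's actual implementation; the candidate names and scores are made up:

```python
def pareto_front(scores: dict[str, list[float]]) -> set[str]:
    """Return the candidates not dominated by any other.

    `scores` maps candidate name -> per-task scores (same task order).
    A dominates B if A >= B on every task and A > B on at least one.
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return {
        name
        for name, s in scores.items()
        if not any(dominates(other, s)
                   for o_name, other in scores.items() if o_name != name)
    }

# Three hypothetical candidates scored on three tasks:
scores = {
    "seed":    [0.4, 0.5, 0.3],
    "mutant1": [0.9, 0.5, 0.3],  # strictly better than seed on task 1
    "mutant2": [0.4, 0.5, 0.8],  # strictly better than seed on task 3
}
print(pareto_front(scores))  # seed is dominated; both mutants survive
```

Because selection keeps every non-dominated candidate rather than a single best scalar, variants that excel on different task subsets all remain available for mutation and merging.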

GEPA also supports system-aware merge — combining strengths of two Pareto-optimal candidates excelling on different tasks. The key concept is Actionable Side Information (ASI): diagnostic feedback returned by evaluators that serves as the text-optimization analogue of a gradient.
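To make ASI concrete: a scalar-only evaluator tells the optimizer *that* a candidate failed, while an ASI-style evaluator also returns diagnostic text that lets the reflection LLM see *why* and target the fix. A schematic contrast, using Python's built-in `compile` as a stand-in evaluation task (the function names here are illustrative, not GEPA API):

```python
def scalar_evaluator(candidate: str) -> float:
    # Reward-only feedback: the optimizer learns only that it failed.
    try:
        compile(candidate, "<candidate>", "exec")
    except SyntaxError:
        return 0.0
    return 1.0

def asi_evaluator(candidate: str) -> tuple[float, str]:
    # Score plus actionable side information: the reflection LLM
    # also learns why it failed.
    try:
        compile(candidate, "<candidate>", "exec")
    except SyntaxError as e:
        return 0.0, f"SyntaxError at line {e.lineno}: {e.msg}"
    return 1.0, "compiled cleanly"
```

In the `optimize_anything` example above, the same role is played by the `oa.log(...)` calls inside the evaluator.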

For details, see the paper and the documentation.


Adapters: Plug GEPA into Any System

GEPA connects to your system via the GEPAAdapter interface — implement evaluate and make_reflective_dataset, and GEPA handles the rest.
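In outline, a custom adapter supplies those two hooks: `evaluate` runs a batch and keeps raw traces, and `make_reflective_dataset` turns those traces into text for the reflection LLM. The class below is only an illustrative sketch of that shape, not the real `GEPAAdapter` base class (whose exact signatures are in the adapters guide); `call_my_llm` is a hypothetical stand-in for your own model call:

```python
def call_my_llm(system_prompt: str, text: str) -> str:
    """Hypothetical stand-in for your model call."""
    return text.upper()  # placeholder behavior

class MyAdapter:
    """Sketch of the GEPAAdapter shape (not the real base class)."""

    def evaluate(self, candidate: dict, batch: list[dict]) -> list[dict]:
        # Run each example through the system using the candidate's
        # text components, keeping the raw trace for later reflection.
        results = []
        for example in batch:
            output = call_my_llm(candidate["system_prompt"], example["input"])
            results.append({
                "score": float(output == example["answer"]),
                "trace": {"input": example["input"], "output": output},
            })
        return results

    def make_reflective_dataset(self, candidate: dict, results: list[dict]) -> list[dict]:
        # Turn raw traces into records the reflection LLM can read.
        return [
            {"inputs": r["trace"]["input"],
             "outputs": r["trace"]["output"],
             "score": r["score"]}
            for r in results
        ]
```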

Built-in adapters:

| Adapter | Description |
| --- | --- |
| DefaultAdapter | System prompt optimization for single-turn LLM tasks |
| DSPy Full Program | Evolves entire DSPy programs (signatures, modules, control flow). 67% → 93% on MATH. |
| Generic RAG | Vector store-agnostic RAG optimization (ChromaDB, Weaviate, Qdrant, Pinecone) |
| MCP Adapter | Optimize MCP tool descriptions and system prompts |
| TerminalBench | Optimize the Terminus terminal-use agent |
| AnyMaths | Mathematical problem-solving and reasoning tasks |

See the adapters guide for how to build your own, and DSPy's adapter as a reference.


Integrations

GEPA is integrated into several major frameworks:

  • DSPy: `dspy.GEPA` for optimizing DSPy programs. Tutorials.
  • MLflow: `mlflow.genai.optimize_prompts()` for automatic prompt improvement.
  • Comet ML Opik: Core optimization algorithm in Opik Agent Optimizer.
  • Pydantic: Prompt optimization for Pydantic AI.
  • OpenAI Cookbook: Self-evolving agents with GEPA.
  • HuggingFace Cookbook: Prompt optimization guide.
  • Google ADK: Optimizing Google Agent Development Kit agents.

Example Optimized Prompts

GEPA can be thought of as precomputing reasoning during optimization to produce a plan for future task instances. Here are examples of the detailed prompts GEPA discovers:

<table> <tr> <td colspan="2" align="center">Example GEPA Prompts</td> </tr> <tr> <td align="center">HotpotQA (multi-hop QA) Prompt</td> <td align="center">AIME Prompt</td> </tr> <tr> <td width="52%" valign="top"> <img src="https://raw.githubusercontent.com/gepa-ai/gepa/refs/heads/main/assets/gepa_prompt_hotpotqa.png" alt="HotpotQA Prompt" width="1400"> <!-- <td> --> <details> <summary><mark>Click to view full HotpotQA prompt</mark><