json-repair

A package to repair broken json strings

Rank: #1006 | Downloads: 13,445,262 (30 days) | Stars: 4,557 | Forks: 174

Description

This simple package can be used to fix an invalid JSON string. For the full list of cases it handles, check out the unit tests.


Think about sponsoring this library!

This library is free for everyone and is maintained and developed as a side project, so if you find it useful for your work, consider becoming a sponsor via this link: https://github.com/sponsors/mangiucugna


Demo

If you are unsure if this library will fix your specific problem, or simply want your json validated online, you can visit the demo site on GitHub pages: https://mangiucugna.github.io/json_repair/

Or listen to an audio deep dive generated by Google's NotebookLM for an introduction to the module.


Motivation

Some LLMs are a bit iffy when it comes to returning well-formed JSON data: sometimes they skip a parenthesis and sometimes they add some words to it, because that's what an LLM does. Luckily, the mistakes LLMs make are simple enough to be fixed without destroying the content.

I searched for a lightweight Python package that was able to reliably fix this problem but couldn't find any.

So I wrote one.

Supported use cases

Fixing Syntax Errors in JSON

  • Missing quotes, misplaced commas, unescaped characters, and incomplete key-value pairs.
  • Missing quotation marks, improperly formatted values (true, false, null), and corrupted key-value structures.

Repairing Malformed JSON Arrays and Objects

  • Repairs incomplete or broken arrays/objects by adding the necessary elements (e.g., commas, brackets) or default values (null, "").
  • The library can process JSON that includes extra non-JSON characters like comments or improperly placed characters, cleaning them up while maintaining valid structure.

Auto-Completion for Missing JSON Values

  • Automatically completes missing values in JSON fields with reasonable defaults (like empty strings or null), ensuring validity.

How to use

Install the library with pip

pip install json-repair

Then you can use it in your code like this:

from json_repair import repair_json

good_json_string = repair_json(bad_json_string)
# If the string was super broken this will return an empty string

You can use this library to completely replace json.loads():

import json_repair

decoded_object = json_repair.loads(json_string)

or just

import json_repair

decoded_object = json_repair.repair_json(json_string, return_objects=True)

Avoid this antipattern

Some users of this library adopt the following pattern:

obj = {}
try:
    obj = json.loads(string)
except json.JSONDecodeError as e:
    obj = json_repair.loads(string)
    ...

This is wasteful because json_repair already checks whether the JSON is valid for you. If you still want to keep that pattern, add skip_json_loads=True to the json_repair call, as explained in the performance considerations section below.

Read json from a file or file descriptor

json_repair also provides a drop-in replacement for json.load():

import json_repair

try:
    file_descriptor = open(fname, 'rb')
except OSError:
    ...

with file_descriptor:
    decoded_object = json_repair.load(file_descriptor)

and another method to read from a file:

import json_repair

try:
    decoded_object = json_repair.from_file(json_file)
except OSError:
    # In Python 3, IOError is an alias of OSError, so one handler covers both
    ...

Keep in mind that the library will not catch any IO-related exceptions; those need to be handled by you.

Non-Latin characters

When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass ensure_ascii=False to repair_json() in order to preserve the non-Latin characters in the output.

Here's an example using Chinese characters:

repair_json("{'test_chinese_ascii':'统一码'}")

will return

{"test_chinese_ascii": "\u7edf\u4e00\u7801"}

Passing ensure_ascii=False instead:

repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)

will return

{"test_chinese_ascii": "统一码"}

JSON dumps parameters

More generally, repair_json accepts all the parameters that json.dumps accepts and passes them through (for example, indent).
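
Since these parameters are forwarded to json.dumps, their effect can be previewed with the standard library alone. The sketch below does not call json_repair; it only shows what ensure_ascii and indent do to serialized output, assuming the pass-through described above.

```python
import json

obj = {"test_chinese_ascii": "统一码"}

# Default: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(obj)

# ensure_ascii=False keeps the original characters.
readable = json.dumps(obj, ensure_ascii=False)

# indent pretty-prints the output across several lines.
pretty = json.dumps(obj, ensure_ascii=False, indent=2)

print(escaped)
print(readable)
print(pretty)
```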

Performance considerations

If you find this library too slow because it is using json.loads(), you can skip that step by passing skip_json_loads=True to repair_json, like this:

from json_repair import repair_json

good_json_string = repair_json(bad_json_string, skip_json_loads=True)

I made a choice of not using any fast json library to avoid having any external dependency, so that anybody can use it regardless of their stack.

Some rules of thumb:

  • Setting return_objects=True will always be faster because the parser already returns an object and doesn't have to serialize that object to JSON.
  • skip_json_loads is faster only if you know for certain that the string is not valid JSON.
  • If you are having issues with escaping, pass the string as a raw string, like: r"string with escaping\""
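
The raw-string advice is about Python literals rather than the library itself, so it can be shown with the standard library only. In this sketch (the payload is made up), the non-raw literal loses its backslashes before any JSON parser sees the text, while the raw literal keeps them.

```python
import json

# Non-raw literal: Python turns \" into a bare quote, so the escaping is lost.
normal = '{"quote": "she said \"hi\""}'
# Raw literal: the backslashes survive, so the inner quotes stay escaped.
raw = r'{"quote": "she said \"hi\""}'

print(normal)
print(raw)

try:
    json.loads(normal)
    normal_is_valid = True
except json.JSONDecodeError:
    normal_is_valid = False

print(normal_is_valid)
print(json.loads(raw))
```

If the string comes from a file, a network response, or an LLM client rather than a literal in your source code, this issue does not apply.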

Strict mode

By default json_repair does its best to “fix” input, even when the JSON is far from valid.
In some scenarios you want the opposite behavior and need the parser to error out instead of repairing; pass strict=True to repair_json, loads, load, or from_file to enable that mode:

from json_repair import repair_json

repair_json(bad_json_string, strict=True)

The CLI exposes the same behavior with json_repair --strict input.json (or piping data via stdin).

In strict mode the parser raises ValueError as soon as it encounters structural issues such as duplicate keys, missing : separators, empty keys/values introduced by stray commas, multiple top-level elements, or other ambiguous constructs. This is useful when you just need validation with friendlier error messages while still benefiting from json_repair’s resilience elsewhere in your stack.

Strict mode still honors skip_json_loads=True; combining them lets you skip the initial json.loads check but still enforce strict parsing rules.

Schema-guided repairs

Schema-guided repairs are currently considered in beta. Bugs are to be expected.

You can guide repairs with a JSON Schema (or a Pydantic v2 model). When enabled, the parser will:

  • Fill missing values (defaults, required fields).
  • Coerce scalars where safe (e.g., "1" -> 1 for integer fields, and "yes"/"no"/1/0 for booleans).
  • Drop properties/items that the schema disallows.

Schema mode can be selected with schema_repair_mode:

  • standard (default): existing schema-guided behavior.
  • salvage: includes standard and also:
    • drops invalid array items when individual items cannot be repaired;
    • maps arrays to objects by property order when schema/object shape is unambiguous;
    • unwraps a root single-item array to an object when the root schema expects an object ([{...}] -> {...});
    • fills missing required fields only when a safe value can be inferred (default, const, first enum, or empty array/object when allowed by schema constraints).

This is especially useful when you need deterministic, schema-valid outputs for downstream validation, storage, or typed processing. If the input cannot be repaired to satisfy the schema, json_repair raises ValueError.

Install the optional dependencies:

pip install 'json-repair[schema]'

(For CLI usage, you can also use pipx install 'json-repair[schema]'.)

When schema is provided, schema guidance is always applied (for both valid and invalid JSON). Schema guidance is mutually exclusive with strict=True.

from json_repair import repair_json

schema = {
    "type": "object",
    "properties": {"value": {"type": "integer"}},
    "required": ["value"],
}

repair_json('{"value": "1"}', schema=schema, return_objects=True)

repair_json(
    '{"items":[{"id":1,"score":85.6},{"id":2,"score":"N/A"}]}',
    schema={
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"id": {"type": "integer"}, "score": {"type": "number"}},
                    "required": ["id", "score"],
                },
            }
        },
        "required": ["items"],
    },
    schema_repair_mode="salvage",
    return_objects=True,
)

Pydantic v2 model example:

from pydantic import BaseModel, Field
from json_repair import repair_json


class Payload(BaseModel):
    value: int
    tags: list[str] = Field(default_factory=list)


repair_json(
    '{"value": "1", "tags": }',
    schema=Payload,
    skip_json_loads=True,
    return_objects=True,
)

Use json_repair with streaming

Sometimes you are streaming data and want to repair the JSON as it arrives. Normally this won't work, but you can pass stream_stable=True to repair_json() or loads() to make it work:

stream_output = repair_json(stream_input, stream_stable=True)

Use json_repair fr