Fast iterable JSON parser.

jiter

This is a standalone version of the JSON parser used in pydantic-core. The recommendation is to only use this package directly if you do not use pydantic.

The API is extremely minimal:

def from_json(
    json_data: bytes,
    /,
    *,
    allow_inf_nan: bool = True,
    cache_mode: Literal[True, False, "all", "keys", "none"] = "all",
    partial_mode: Literal[True, False, "off", "on", "trailing-strings"] = False,
    catch_duplicate_keys: bool = False,
    float_mode: Literal["float", "decimal", "lossless-float"] = "float",
) -> Any:
    """
    Parse input bytes into a JSON object.

    Arguments:
        json_data: The JSON data to parse
        allow_inf_nan: Whether to allow infinity (`Infinity` and `-Infinity`) and `NaN` values in float fields.
            Defaults to True.
        cache_mode: cache Python strings to improve performance at the cost of some memory usage
            - True / 'all' - cache all strings
            - 'keys' - cache only object keys
            - False / 'none' - cache nothing
        partial_mode: How to handle incomplete strings:
            - False / 'off' - raise an exception if the input is incomplete
            - True / 'on' - allow incomplete JSON but discard the last string if it is incomplete
            - 'trailing-strings' - allow incomplete JSON, and include the last incomplete string in the output
        catch_duplicate_keys: if True, raise an exception if objects contain the same key multiple times
        float_mode: How to return floats: as a `float`, `Decimal` or `LosslessFloat`

    Returns:
        Python object built from the JSON input.
    """

def cache_clear() -> None:
    """
    Reset the string cache.
    """

def cache_usage() -> int:
    """
    Get the size of the string cache.

    Returns:
        Size of the string cache in bytes.
    """

Examples

The main function provided by jiter is from_json(), which accepts a bytes object containing JSON and returns the corresponding Python value: a dictionary, list, string, number, boolean or None.

import jiter

json_data = b'{"name": "John", "age": 30}'
parsed_data = jiter.from_json(json_data)
print(parsed_data)  # Output: {'name': 'John', 'age': 30}

Handling Partial JSON

Incomplete JSON objects can be parsed using the partial_mode= parameter.

import jiter

partial_json = b'{"name": "John", "age": 30, "city": "New Yor'

# Raise error on incomplete JSON
try:
    jiter.from_json(partial_json, partial_mode=False)
except ValueError as e:
    print(f"Error: {e}")

# Parse incomplete JSON, discarding incomplete last field
result = jiter.from_json(partial_json, partial_mode=True)
print(result)  # Output: {'name': 'John', 'age': 30}

# Parse incomplete JSON, including incomplete last field
result = jiter.from_json(partial_json, partial_mode='trailing-strings')
print(result)  # Output: {'name': 'John', 'age': 30, 'city': 'New Yor'}

Catching Duplicate Keys

The catch_duplicate_keys=True option can be used to raise a ValueError if an object contains duplicate keys.

import jiter

json_with_dupes = b'{"foo": 1, "foo": 2}'

# Default behavior (last value wins)
result = jiter.from_json(json_with_dupes)
print(result)  # Output: {'foo': 2}

# Catch duplicate keys
try:
    jiter.from_json(json_with_dupes, catch_duplicate_keys=True)
except ValueError as e:
    print(f"Error: {e}")
