duper

A library for building fast, reusable copying factories for Python objects.

It aims to fill the gaps in performance and obscurity between copy, pickle, json, and other serialization libraries, and to become the go-to library for copying objects within the same Python process.


Why?

It is challenging and fun, of course.

But seriously, copy.deepcopy is extremely slow, and there's no alternative that is both faster and able to replace it in all cases.

Key points

  • Generates a cookbook for reconstructing a given object
  • On subsequent calls, follows the optimized instructions to produce a new object
  • Handles immutable types and flat collections much faster
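The key points above can be illustrated with a small pure-Python sketch. This is not duper's implementation: `build_factory` and `is_immutable` are hypothetical names invented for this example, and the sketch only supports a few built-in types, with no handling of cycles or shared references. It shows the general idea of inspecting an object once and then reusing the resulting recipe for every copy.

```python
from typing import Any, Callable

# Exact types treated as immutable atoms; copies may share them safely.
_ATOMS = {int, float, complex, bool, str, bytes, type(None)}


def is_immutable(obj: Any) -> bool:
    """Return True for atoms and for tuples/frozensets holding only immutables."""
    if type(obj) in _ATOMS:
        return True
    if type(obj) in (tuple, frozenset):
        return all(is_immutable(item) for item in obj)
    return False


def build_factory(obj: Any) -> Callable[[], Any]:
    """Inspect obj once and return a zero-argument callable that rebuilds it.

    Hypothetical helper for illustration only; not part of duper's API.
    """
    if is_immutable(obj):
        return lambda: obj  # immutable: hand back the same object
    if type(obj) is list:
        if all(is_immutable(item) for item in obj):
            return list(obj).copy  # flat list: snapshot now, shallow-copy later
        item_factories = [build_factory(item) for item in obj]
        return lambda: [make() for make in item_factories]
    if type(obj) is tuple:
        item_factories = [build_factory(item) for item in obj]
        return lambda: tuple(make() for make in item_factories)
    if type(obj) is dict:
        # Keys are hashable and assumed immutable here, so they are reused as-is.
        if all(is_immutable(value) for value in obj.values()):
            return dict(obj).copy  # flat dict: snapshot now, shallow-copy later
        pairs = [(key, build_factory(value)) for key, value in obj.items()]
        return lambda: {key: make() for key, make in pairs}
    raise TypeError(f"unsupported type: {type(obj).__name__}")


original = {"a": 1, "b": [(1, 2, 3), (4, 5, 6)], "c": []}
make_copy = build_factory(original)  # the one-time "cookbook" step
clone = make_copy()                  # cheap to repeat afterwards
```

The type checks run only while the factory is built. Each later call just executes the prepared closures, and immutable values such as the inner tuples are shared rather than copied.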

How fast?

Generally 20-50 times faster than copy.deepcopy() on nested objects.

import duper
import copy
from timesup import timesup


@timesup(number=100000, repeats=3)
def reconstruction():
    x = {"a": 1, "b": [(1, 2, 3), (4, 5, 6)], "c": []}

    copy.deepcopy(x)      # ~0.00605 ms (deepcopy)
    dup = duper.Duper(x)  # ~0.00009 ms (duper_init):
    dup.deep()            # ~0.00014 ms (duper_dup): 42.22 times faster than deepcopy
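For readers who prefer not to install timesup, a comparable measurement can be sketched with the standard library's timeit. The helper name `per_call_ms` is an assumption made for this example, and the hand-written lambda is only a stand-in for a copying factory; no timings are claimed here, since results depend on the machine and the Python version.

```python
import copy
import timeit
from typing import Any, Callable


def per_call_ms(func: Callable[[], Any], number: int = 100_000, repeats: int = 3) -> float:
    """Best-of-`repeats` average time per call, in milliseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeats))
    return best / number * 1000


x = {"a": 1, "b": [(1, 2, 3), (4, 5, 6)], "c": []}

# Baseline: a full deepcopy on every call.
deepcopy_ms = per_call_ms(lambda: copy.deepcopy(x), number=10_000)

# Stand-in for a prebuilt factory: a reconstruction written by hand for this
# exact shape (tuples are immutable, so they are shared, not copied).
manual_ms = per_call_ms(
    lambda: {"a": x["a"], "b": list(x["b"]), "c": []}, number=10_000
)

print(f"deepcopy: ~{deepcopy_ms:.5f} ms, manual: ~{manual_ms:.5f} ms")
```

With duper installed, the bound method `duper.Duper(x).deep` from the snippet above could be passed to `per_call_ms` in place of the hand-written lambda.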

Real use case

Pydantic

Models definition
from datetime import datetime
from functools import wraps

import duper
from pydantic import BaseModel, Field
from pydantic.fields import FieldInfo
from timesup import timesup


class User(BaseModel):
    id: int
    name: str = "John Doe"
    signup_ts: datetime | None = None
    friends: list[int] = []
    skills: dict[str, int] = {
        "foo": {"count": 4, "size": None},
        "bars": [
            {"apple": "x1", "banana": "y"},
            {"apple": "x2", "banana": "y"},
        ],
    }



@wraps(Field)
def FastField(default, *args, **kwargs):
    """
    Wraps Field so that the default value is produced by a prebuilt
    duper copying factory, passed to pydantic as default_factory.
    """
    default_factory = duper.Duper(default, prepare=True).deep
    field_info: FieldInfo = Field(*args, default_factory=default_factory, **kwargs)
    return field_info


class FastUser(BaseModel):
    id: int
    name: str = FastField("John Doe")
    signup_ts: datetime | None = FastField(None)
    friends: list[int] = FastField([])
    skills: dict[str, int] = FastField(
        {
            "foo": {"count": 4, "size": None},
            "bars": [
                {"apple": "x1", "banana": "y"},
                {"apple": "x2", "banana": "y"},
            ],
        }
    )


@timesup(number=100000, repeats=3)
def pydantic_defaults():
    User(id=42)        # ~0.00935 ms (with_deepcopy)
    FastUser(id=1337)  # ~0.00292 ms (with_duper): 3.20 times faster than with_deepcopy

🚧 Status

Although the library is at an early stage of development, it already outperforms every other solution I've found for copying objects.

I am completing the implementation, exploring new ideas, and validating existing ones to improve performance.

My current priority is to speed up the initial build of the copying factory. It is currently slightly slower than deepcopy in most cases.

If you're interested in this project, you can contact me via bobronium@gmail.com or Telegram.
