
Useful functions, classes and tools for handling and interacting with dataframes.



Motivation

The dataframing package provides useful functions for working with dataframes.

Data transformation

The main goal is to let you transform one dataframe structure into another in a way that is easy to use, makes clear how the two structures are connected, and works with type annotations.

It assumes that you have a Protocol defining the columns of your dataframes (which, by the way, is a convenient thing to do!). For example:

>>> from typing import Protocol
>>> import dataframing as dfr
>>>
>>> class Original(Protocol):
...     last_name: str
...     first_name: str
>>>
>>> class Modified(Protocol):
...     full_name: str
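
Since a Protocol is an ordinary class with annotations, the columns it declares can be read back with plain Python. The sketch below does that and uses it for a small check on a record; declared_columns and missing_columns are hypothetical helpers for illustration, not part of dataframing:

```python
from typing import Protocol


class Original(Protocol):
    last_name: str
    first_name: str


def declared_columns(proto: type) -> dict[str, type]:
    """Return the column names and types declared on a Protocol class."""
    # __annotations__ holds only the annotations defined on this class itself.
    return dict(proto.__annotations__)


def missing_columns(record: dict, proto: type) -> list[str]:
    """Hypothetical helper: list the declared columns absent from a record."""
    return [name for name in declared_columns(proto) if name not in record]


print(declared_columns(Original))
print(missing_columns({"last_name": "Cleese"}, Original))  # ['first_name']
```

If your Protocols inherit columns from other Protocols, typing.get_type_hints also collects the inherited annotations.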

Now we build a transformer that connects Original and Modified:

>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...     target.full_name = dfr.wrap("{}, {}".format, source.last_name, source.first_name)

And now it is ready to use!

>>> row = dict(last_name="Cleese", first_name="John")
>>> ori2mod.transform_record(row)
{'full_name': 'Cleese, John'}
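
For comparison, here is a hand-written function producing the same output for a single record. It is only a plain-Python sketch of the mapping that ori2mod encodes, not how dataframing implements it; ori2mod_record is a hypothetical name:

```python
def ori2mod_record(row: dict) -> dict:
    """Plain-Python equivalent of ori2mod for a single record."""
    # Combine the two source columns into the single target column.
    return {"full_name": "{}, {}".format(row["last_name"], row["first_name"])}


row = dict(last_name="Cleese", first_name="John")
print(ori2mod_record(row))  # {'full_name': 'Cleese, John'}
```

With morph, the same rule is declared once against the typed Original and Modified Protocols, through attributes such as source.last_name, instead of through string keys inside ad hoc functions.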

Notice that we are demonstrating this with a dictionary, but it also works with a dataframe row, or with a full dataframe (or an iterable of dicts):

>>> data = [
...   dict(last_name="Cleese", first_name="John"),
...   dict(last_name="Gilliam", first_name="Terry")
...   ]
>>> ori2mod.transform_collection(data)
[{'full_name': 'Cleese, John'}, {'full_name': 'Gilliam, Terry'}]
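
Conceptually, transforming a collection applies the record-level rule to every element. The sketch below shows this in plain Python for an iterable of dicts and for a column-oriented dict standing in for a dataframe. It illustrates the idea only, not dataframing's internals; ori2mod_record, transform_records and transform_columns are hypothetical names:

```python
def ori2mod_record(row: dict) -> dict:
    # Record-level rule: two source columns become one target column.
    return {"full_name": "{}, {}".format(row["last_name"], row["first_name"])}


def transform_records(records) -> list[dict]:
    """Apply the record-level rule to each dict of an iterable."""
    return [ori2mod_record(r) for r in records]


def transform_columns(columns: dict[str, list]) -> dict[str, list]:
    """Column-oriented input (like a dataframe): to rows, transform, back to columns."""
    names = list(columns)
    rows = [dict(zip(names, values)) for values in zip(*columns.values())]
    out = transform_records(rows)
    # An empty table has no rows to read the output column names from.
    keys = list(out[0]) if out else []
    return {key: [r[key] for r in out] for key in keys}


data = [
    dict(last_name="Cleese", first_name="John"),
    dict(last_name="Gilliam", first_name="Terry"),
]
print(transform_records(data))
# [{'full_name': 'Cleese, John'}, {'full_name': 'Gilliam, Terry'}]
print(transform_columns({"last_name": ["Cleese", "Gilliam"], "first_name": ["John", "Terry"]}))
# {'full_name': ['Cleese, John', 'Gilliam, Terry']}
```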

If you are going to use a particular function a lot, you can wrap it once and use it multiple times. This also helps to keep the converter visually clean.

>>> fullnamer = dfr.wrap("{}, {}".format)
>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...     target.full_name = fullnamer(source.last_name, source.first_name)
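
One way to picture what wrapping does is a deferred call: the wrapped function is not run when the transformer is defined, but recorded together with the columns it needs and evaluated later for each row. The sketch below assumes that model; Col, Deferred and wrap_sketch are hypothetical names, and dataframing's actual dfr.wrap may work differently:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a column reference such as source.last_name."""
    name: str


@dataclass(frozen=True)
class Deferred:
    """A recorded function call whose column arguments are resolved per row."""
    func: Callable[..., Any]
    args: tuple

    def evaluate(self, row: dict) -> Any:
        # Swap each column reference for the row's value, then call the function.
        values = [row[a.name] if isinstance(a, Col) else a for a in self.args]
        return self.func(*values)


def wrap_sketch(func: Callable[..., Any]) -> Callable[..., Deferred]:
    """Wrap func once; each later call records a Deferred instead of running func."""
    def builder(*args: Any) -> Deferred:
        return Deferred(func, args)
    return builder


fullnamer = wrap_sketch("{}, {}".format)
expr = fullnamer(Col("last_name"), Col("first_name"))
print(expr.evaluate(dict(last_name="Cleese", first_name="John")))  # Cleese, John
```

Because wrapping happens once, the same builder can be reused in several transformers without repeating the function.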

To showcase how to create two columns from one, we are going to build the reverse transformer.

>>> def splitter(s: str) -> tuple[str, str]:
...     part1, part2 = s.split(",")
...     return part1.strip(), part2.strip()
>>> namesplitter = dfr.wrap(splitter)
>>> with dfr.morph(Modified, Original) as (mod2ori, source, target):
...     target.last_name, target.first_name = namesplitter(source.full_name)
>>>
>>> row = dict(full_name="Cleese, John")
>>> mod2ori.transform_record(row)
{'last_name': 'Cleese', 'first_name': 'John'}
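
The reverse direction can also be written in plain Python, which makes it easy to check that the two transformations undo each other. In this sketch splitter passes maxsplit=1 to str.split, a small change from the version above, so that a comma in the first name ends up in the second part instead of raising ValueError; mod2ori_record and ori2mod_record are hypothetical names:

```python
def splitter(s: str) -> tuple[str, str]:
    # maxsplit=1 keeps any later commas in the second part instead of raising.
    part1, part2 = s.split(",", maxsplit=1)
    return part1.strip(), part2.strip()


def mod2ori_record(row: dict) -> dict:
    # One source value fans out into two target columns via tuple unpacking.
    last_name, first_name = splitter(row["full_name"])
    return {"last_name": last_name, "first_name": first_name}


def ori2mod_record(row: dict) -> dict:
    return {"full_name": "{}, {}".format(row["last_name"], row["first_name"])}


original = dict(last_name="Cleese", first_name="John")
assert mod2ori_record(ori2mod_record(original)) == original  # round trip
print(mod2ori_record(dict(full_name="Gilliam, Terry")))
# {'last_name': 'Gilliam', 'first_name': 'Terry'}
```

The round trip holds as long as the last name contains no comma and the names have no leading or trailing whitespace, since splitter strips it.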

Input/Output

You can also use it to save and load data.

>>> dfr.save(my_dataframe, "example.xlsx") # doctest: +SKIP
>>> df = dfr.load("example.xlsx") # doctest: +SKIP

Why use this instead of the standard pandas DataFrame.to_excel? save does two extra things:

  1. Stores the metadata held in my_dataframe.attrs in a separate sheet (and load reads it back from there).
  2. Calculates a hash of the data and metadata and stores it in the metadata sheet.

When loading, load compares the data content with the stored hash. This behaviour is useful for data validation, but can be disabled with the use_hash keyword argument.
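
The hash check can be sketched with the standard library: serialize data and metadata deterministically, hash the result, and compare when loading. This is an assumption about the general approach, not dataframing's actual hashing scheme; content_hash and hash_matches are hypothetical, and their use_hash parameter only mirrors the keyword named in the paragraph above:

```python
import hashlib
import json


def content_hash(records: list[dict], metadata: dict) -> str:
    """Hash data and metadata together (a sketch, not dataframing's actual scheme)."""
    # sort_keys=True makes the serialization independent of dict key order;
    # default=str turns values JSON cannot encode (e.g. dates) into strings.
    payload = json.dumps(
        {"data": records, "metadata": metadata}, sort_keys=True, default=str
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def hash_matches(records, metadata, stored_hash: str, use_hash: bool = True) -> bool:
    """Return True if the content matches stored_hash, or if checking is disabled."""
    if not use_hash:
        return True
    return content_hash(records, metadata) == stored_hash


records = [dict(last_name="Cleese", first_name="John")]
metadata = {"source": "example"}
stored = content_hash(records, metadata)  # computed at save time

print(hash_matches(records, metadata, stored))  # True
records[0]["first_name"] = "Johnny"  # data edited after saving
print(hash_matches(records, metadata, stored))  # False
print(hash_matches(records, metadata, stored, use_hash=False))  # True
```

Because of sort_keys=True, reordering the columns inside a record does not change the hash, while reordering the rows does.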

Another useful pair of functions is save_many and load_many:

>>> dfr.save_many(dict(raw_data=raw_data, processed_data=processed_data), "example.xlsx") # doctest: +SKIP
>>> dfdict = dfr.load_many("example.xlsx") # doctest: +SKIP

Their input and output are dictionaries mapping names to dataframes, which lets you group multiple dataframes into a single Excel file.
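
The dict-in, dict-out shape can be illustrated with a sketch that stores several named tables in one file. It uses JSON from the standard library instead of Excel so it runs without extra dependencies, and it skips the metadata and hash handling; save_tables and load_tables are hypothetical names:

```python
import json
import os
import tempfile


def save_tables(tables: dict[str, list[dict]], path: str) -> None:
    """Write several named tables (lists of records) into a single JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(tables, fh)


def load_tables(path: str) -> dict[str, list[dict]]:
    """Read the file back as a dict mapping each table name to its records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


tables = {
    "raw_data": [dict(full_name="Cleese, John")],
    "processed_data": [dict(last_name="Cleese", first_name="John")],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    save_tables(tables, path)
    loaded = load_tables(path)

print(loaded == tables)  # True
```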

Installation

Just install it using:

pip install dataframing

Download files

Source distribution: dataframing-0.1rc2.tar.gz (12.9 kB)

Built distribution: dataframing-0.1rc2-py3-none-any.whl (9.7 kB, Python 3)
