Useful functions, classes and tools for handling and interacting with dataframes.

Project description

Motivation

The dataframing package provides useful functions for working with dataframes.

Data transformation

The main goal is to let you transform one dataframe structure into another in a way that is easy to use, makes clear how the two structures are connected, and lets you work with typing.

It assumes that you have a Protocol defining your dataframes (which by the way is a convenient thing to do!). For example:

>>> from typing import Protocol
>>> import dataframing as dfr
>>>
>>> class Original(Protocol):
...     last_name: str
...     first_name: str
>>>
>>> class Modified(Protocol):
...    full_name: str
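
As a small illustration of why a Protocol is convenient here, the column names and types it declares can be read back with plain Python. The helper `declared_columns` below is hypothetical and not part of dataframing; it only sketches the idea that the protocol doubles as a description of the structure.

```python
from typing import Protocol


class Original(Protocol):
    last_name: str
    first_name: str


def declared_columns(schema: type) -> dict[str, type]:
    # Hypothetical helper (not part of dataframing): return the column
    # names and types annotated directly on the protocol class.
    return dict(schema.__annotations__)


columns = declared_columns(Original)
print(list(columns))
```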

Now we build a transformer that connects Original and Modified:

>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...    target.full_name = dfr.wrap("{}, {}".format, source.last_name, source.first_name)

And now it is ready to use!

>>> row = dict(last_name="Cleese", first_name="John")
>>> ori2mod.transform_record(row)
{'full_name': 'Cleese, John'}

Notice that we are demonstrating this with a dictionary, but it also works with a dataframe row, a full dataframe, or an iterable of dicts.

>>> data = [
...   dict(last_name="Cleese", first_name="John"),
...   dict(last_name="Gilliam", first_name="Terry")
...   ]
>>> ori2mod.transform_collection(data)
[{'full_name': 'Cleese, John'}, {'full_name': 'Gilliam, Terry'}]
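
To make clear what the transformer computes, here is a hand-written, plain-Python equivalent of the mapping above. It is only a sketch for comparison: the function names `full_name_record` and `full_name_collection` are hypothetical, and nothing here reflects how dataframing is implemented internally.

```python
from typing import Iterable


def full_name_record(record: dict[str, str]) -> dict[str, str]:
    # Build full_name from last_name and first_name, as ori2mod does.
    return {"full_name": "{}, {}".format(record["last_name"], record["first_name"])}


def full_name_collection(records: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    # Apply the record-level mapping to every item of the collection.
    return [full_name_record(r) for r in records]


data = [
    dict(last_name="Cleese", first_name="John"),
    dict(last_name="Gilliam", first_name="Terry"),
]
result = full_name_collection(data)
print(result)
```

Unlike this version, the morph block keeps the link between source and target fields in one declarative place.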

If you are going to use a particular function a lot, you can wrap it once and use it multiple times. This also helps to keep the converter visually clean.

>>> fullnamer = dfr.wrap("{}, {}".format)
>>> with dfr.morph(Original, Modified) as (ori2mod, source, target):
...    target.full_name = fullnamer(source.last_name, source.first_name)

To showcase how to create two columns from one, we are going to build the reverse transformer.

>>> def splitter(s: str) -> tuple[str, str]:
...     part1, part2 = s.split(",")
...     return part1.strip(), part2.strip()
>>> namesplitter = dfr.wrap(splitter)
>>> with dfr.morph(Modified, Original) as (mod2ori, source, target):
...    target.last_name, target.first_name = namesplitter(source.full_name)
>>>
>>> row = dict(full_name="Cleese, John")
>>> mod2ori.transform_record(row)
{'last_name': 'Cleese', 'first_name': 'John'}
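
Note that the splitter above unpacks exactly two parts, so a value containing more than one comma, or none, raises a ValueError. If that matters for your data, a more tolerant variant could split on the first comma only. The name `tolerant_splitter` is hypothetical; this is a sketch in plain Python, which you would then pass to dfr.wrap in the same way as before.

```python
def tolerant_splitter(s: str) -> tuple[str, str]:
    # Split on the first comma only; any further commas stay in the second part.
    # If there is no comma at all, the second part is an empty string.
    part1, _, part2 = s.partition(",")
    return part1.strip(), part2.strip()


print(tolerant_splitter("Cleese, John"))
print(tolerant_splitter("Cleese, John, Jr."))
```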

Input/Output

You can also use it to save and load data.

>>> dfr.save(my_dataframe, "example.xlsx") # doctest: +SKIP
>>> df = dfr.load("example.xlsx") # doctest: +SKIP

Why use this instead of the standard pandas DataFrame.to_excel? save does two extra things:

Stores the metadata held in my_dataframe.attrs in a separate sheet (and load reads it back from there).
Calculates a hash of the data and metadata and stores it in the metadata sheet.

load compares the data content with the stored hash. This behaviour is useful for data validation, but can be disabled with the use_hash keyword argument.
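
The idea behind the hash check can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: the choice of SHA-256, the JSON serialisation, and the function name `content_hash` are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(records: list[dict], metadata: dict) -> str:
    # Serialise data and metadata deterministically, then hash the bytes.
    # (Assumed scheme for illustration; dataframing may do this differently.)
    payload = json.dumps({"data": records, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


records = [{"last_name": "Cleese", "first_name": "John"}]
metadata = {"source": "example"}

# At save time the hash is computed and stored next to the metadata.
stored = content_hash(records, metadata)

# At load time the hash is recomputed; a mismatch means the content changed.
tampered = [{"last_name": "Cleese", "first_name": "Terry"}]
unchanged_ok = content_hash(records, metadata) == stored
tampered_ok = content_hash(tampered, metadata) == stored
print(unchanged_ok, tampered_ok)
```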

Another useful pair of functions is load_many and save_many:

>>> dfr.save_many(dict(raw_data=raw_data, processed_data=processed_data), "example.xlsx") # doctest: +SKIP
>>> dfdict = dfr.load_many("example.xlsx") # doctest: +SKIP

Here the input and output are dictionaries, which lets you group multiple dataframes into a single Excel file.

Installation

Just install it using:

pip install dataframing

Project details

This version

0.1rc2 pre-release

Nov 30, 2023

Download files

Source Distribution

dataframing-0.1rc2.tar.gz (12.9 kB)

Uploaded Nov 30, 2023 Source

Built Distribution

dataframing-0.1rc2-py3-none-any.whl (9.7 kB)

Uploaded Nov 30, 2023 Python 3

Hashes for dataframing-0.1rc2.tar.gz

Algorithm	Hash digest
SHA256	`338a99a20dea883420c838bfed2e1ebac663bdec426e255d95910a7d81e2f88d`
MD5	`70242603e9392f92dcf572e64d9df6a4`
BLAKE2b-256	`3c52a6c879801147de1f3409afe3572e62d86b122f902814cdca481b30dc96c8`

Hashes for dataframing-0.1rc2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`493bd85eb3d33a50fa5b1bfb8b276c979a65444129527e81362a37bc40c63479`
MD5	`5add8a4d0a0c119ba745fe12976c119e`
BLAKE2b-256	`33f80ab16a4f8ec2a21bfcf10fd2c806bcee6ebb83985be4c657c79c97858ca4`