Entropic

From chaos, information.

Entropic is a data pipeline framework designed to provide scientists with a simple and efficient way to access data from their experiments. This documentation will guide you through the installation, usage, and customization of the Entropic package.

Requirements

Entropic needs Python 3.8+, and relies mostly on:

- Pydantic for data validation.
- Pandas for data analysis.

Installation

You can install Entropic using pip:

```
pip install entropic
```

Usage

Example

The most basic data pipeline that can be created with Entropic consists of a Pipeline subclass that defines the directories containing the experiment results and the function used to read each result file into a pandas DataFrame:

```
import pandas as pd

from entropic.process import Pipeline
from entropic import results


class Process(Pipeline):
    source_paths = ["experiments/iteration_1", "experiments/iteration_2"]
    extract_with = pd.read_csv


p = Process()
p.run()


if __name__ == "__main__":
    for iteration in results.all:
        for sample in iteration.samples:
            print(sample.data.raw.head())
```

The main parts of this example are:

Define your data processing class by inheriting from Pipeline:
```
class Process(Pipeline):
    source_paths = ["experiments/iteration_1", "experiments/iteration_2"]
    extract_with = pd.read_csv
```
The source_paths variable points to the folders that contain the results for each iteration. Within Entropic, an iteration can be thought of as a set of initial conditions for which you performed an experiment and took various samples with various results. extract_with defines the function that reads each sample file and creates a DataFrame from it. In this example I'm using pandas.read_csv, but it can be any function you want; you can even define your own and pass it to extract_with.
Instantiate and run the pipeline:
```
p = Process()
p.run()
```

Access your results using the results API:

```
if __name__ == "__main__":
    for iteration in results.all:
        for sample in iteration.samples:
            print(sample.data.raw.head())
```
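
A custom function for extract_with could look like the following. This is only a minimal sketch: it assumes Entropic calls the function with the path of a sample file as its only argument, which the text above does not confirm. The name read_sample_file and the file layout (semicolon-separated values with "#" comment lines) are hypothetical.

```python
import pandas as pd


def read_sample_file(path):
    """Read one sample file into a DataFrame.

    Hypothetical layout: semicolon-separated columns, with lines
    starting with '#' treated as comments and skipped.
    """
    return pd.read_csv(path, sep=";", comment="#")


# Assumed usage inside the Pipeline subclass from the example:
#     extract_with = read_sample_file
```

Any callable with the same shape (a file path in, a DataFrame out) should fit here in the same way pandas.read_csv does in the example.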

In this example, the results are accessed in the same file that runs the pipeline. However, for performance reasons you might want to split the processing and the analysis into two separate files. In that case you only need to run the processing part once, and your data will be loaded into a JSON-based database.
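
For the analysis file, a helper like the one below is one possible starting point. It is a sketch, not part of Entropic: collect_raw_frames is a hypothetical name, and the function relies only on the attributes used in the example above (iteration.samples and sample.data.raw). The two index columns it adds are an illustrative choice.

```python
import pandas as pd


def collect_raw_frames(iterations):
    """Stack the raw DataFrame of every sample into a single DataFrame.

    Adds two columns (hypothetical names) so each row can be traced
    back to the iteration and sample it came from.
    """
    frames = []
    for iteration_index, iteration in enumerate(iterations):
        for sample_index, sample in enumerate(iteration.samples):
            frame = sample.data.raw.copy()  # leave the stored data untouched
            frame["iteration_index"] = iteration_index
            frame["sample_index"] = sample_index
            frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

In a separate analysis script this could be called as collect_raw_frames(results.all) after `from entropic import results`, assuming the processing script has already been run once.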

Project details

Development Status
- 4 - Beta
Intended Audience
- Education
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering
- Utilities

Release history

- 0.3.0: Dec 7, 2023
- 0.2.0: Dec 4, 2023
- 0.1.1: Nov 28, 2023 (this version)
- 0.1.0: Nov 26, 2023
- 0.0.1a0 (pre-release): Nov 20, 2023

Download files

Source Distribution

entropic-0.1.1.tar.gz (16.0 kB), uploaded Nov 28, 2023

Built Distribution

entropic-0.1.1-py3-none-any.whl (8.8 kB, Python 3), uploaded Nov 28, 2023

Hashes for entropic-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`68b5842de6a151a76fbc590aad29f998c372635f6198f54af307f21394159258`
MD5	`45bb77d09258f7aea082e595dde51a41`
BLAKE2b-256	`6c08711db3172eb4217e1734bdbece60a6a3ae99ed811782bc31cf899f7438da`

Hashes for entropic-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c5a8a6118407d9312f3e7b94e409a8e7690c8aa3ad7a72fd8af1d1b260a78334`
MD5	`c1fc88a4e1db2773f248e8e964eb9d66`
BLAKE2b-256	`568f8ccd857cec08a3677d75cbd98d7e838d8a793c98c1eae5ca3a13b1453f18`